Sakuga-42M Dataset: Scaling Up Cartoon Research

¹University of Alberta, ²Sichuan Conservatory of Music

Sakuga-42M is the first large-scale cartoon animation dataset.

Abstract

The Sakuga-42M Dataset is the first large-scale cartoon animation dataset. It comprises 42 million keyframes spanning a wide range of artistic styles, regions, and years, with comprehensive semantic annotations including video-text description pairs, anime tags, and content taxonomies. We demonstrate the benefits of such a large-scale cartoon dataset on comprehension and generation tasks by fine-tuning contemporary foundation models such as Video CLIP, Video Mamba, and SVD, achieving outstanding performance on cartoon-related tasks.

Our motivation is to bring large-scale data to cartoon research and to foster generalization and robustness in future cartoon applications. The dataset, code, and pretrained models will be publicly available.
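
To make the annotation structure concrete, the following is a minimal sketch of how a single annotated clip could be represented in code. All field names (clip_id, anime_tags, taxonomy, and so on) are illustrative assumptions based on the annotation types listed above, not the dataset's actual schema.

from dataclasses import dataclass, field

@dataclass
class SakugaClip:
    """Hypothetical record layout for one annotated keyframe clip."""
    clip_id: str                    # unique clip identifier (assumed name)
    video_path: str                 # path to the keyframe-only clip
    num_keyframes: int              # clip length in keyframes
    caption: str                    # LLM-enriched video-text description
    anime_tags: list = field(default_factory=list)  # expert-model tags
    taxonomy: str = ""              # content-based category derived from tags
    aesthetic_score: float = 0.0    # aesthetic quality estimate
    dynamic_score: float = 0.0      # motion/dynamics estimate

clip = SakugaClip(
    clip_id="0001",
    video_path="clips/0001.mp4",
    num_keyframes=24,
    caption="A character leaps across rooftops at dusk.",
    anime_tags=["running", "rooftop", "dusk"],
    taxonomy="action",
    aesthetic_score=0.82,
    dynamic_score=0.67,
)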

Pipelines

The automatic preparation pipeline of Sakuga-42M. Cartoon videos are first collected and then split into smaller clips that contain only keyframes. By incorporating domain knowledge from anime expert models and general-domain captioning tools, the pipeline enables an LLM to produce richer descriptions (shown in orange) of the cartoon clips.
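
The sketch below outlines these pipeline stages in Python. Each helper (split_into_keyframe_clips, tag_with_anime_expert, caption_general, enrich_with_llm) is a hypothetical placeholder for the corresponding component described above, not an actual API from the release.

def split_into_keyframe_clips(video_path):
    """Shot-detect the raw video and keep only keyframe segments."""
    raise NotImplementedError  # hypothetical placeholder

def tag_with_anime_expert(clip_path):
    """Anime expert model: domain-specific tags (characters, actions, ...)."""
    raise NotImplementedError  # hypothetical placeholder

def caption_general(clip_path):
    """General-domain captioning tool: a coarse natural-language caption."""
    raise NotImplementedError  # hypothetical placeholder

def enrich_with_llm(coarse_caption, tags):
    """LLM fuses the coarse caption with expert tags into a richer description."""
    raise NotImplementedError  # hypothetical placeholder

def prepare(video_path):
    """End-to-end preparation for one collected cartoon video."""
    records = []
    for clip in split_into_keyframe_clips(video_path):
        tags = tag_with_anime_expert(clip)
        caption = caption_general(clip)
        records.append({
            "clip": clip,
            "tags": tags,
            "description": enrich_with_llm(caption, tags),
        })
    return records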

Composition

Sakuga-42M primarily comprises clips whose keyframe durations are within 96 frames, with a high proportion of clips exhibiting strong aesthetic value and dynamic scores. To better categorize the clips, we provide an additional content-based taxonomy derived from anime tags. Our dataset surpasses the combined size of all previous cartoon datasets, paving the way for large-scale models.
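
As a rough illustration of these composition criteria, the following sketch filters candidate clips by keyframe count and quality scores. The 96-keyframe cap comes from the description above; the score thresholds are assumed values for illustration only.

def keep_clip(num_keyframes, aesthetic, dynamic,
              max_keyframes=96, min_aesthetic=0.5, min_dynamic=0.3):
    """True if a clip meets the composition criteria.

    The 96-keyframe cap is from the dataset description; the score
    thresholds are assumed values, not the paper's.
    """
    return (num_keyframes <= max_keyframes
            and aesthetic >= min_aesthetic
            and dynamic >= min_dynamic)

candidates = [(24, 0.82, 0.67), (120, 0.90, 0.70), (48, 0.30, 0.80)]
kept = [c for c in candidates if keep_clip(*c)]  # -> [(24, 0.82, 0.67)]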


Distribution

Sakuga-42M reveals the difference in data distribution between natural videos and hand-drawn cartoons. While natural datasets largely overlap with one another in feature space, Sakuga-42M forms a distinct cluster, highlighting its unique visual characteristics.
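
One way such a distribution comparison can be visualized is to embed sampled clips from each dataset and project the features to 2-D, as in the sketch below; the random placeholder features and the t-SNE projection are assumptions for illustration, not the paper's exact procedure.

import numpy as np
from sklearn.manifold import TSNE

# Placeholder features: in practice these would come from a video
# encoder applied to sampled clips; random vectors with different
# means just make the sketch runnable and produce separated clusters.
rng = np.random.default_rng(0)
natural = rng.normal(0.0, 1.0, size=(500, 512))  # natural-video features
cartoon = rng.normal(3.0, 1.0, size=(500, 512))  # cartoon (Sakuga-42M) features

features = np.vstack([natural, cartoon])
labels = np.array([0] * 500 + [1] * 500)  # 0 = natural, 1 = cartoon

# Project to 2-D; well-separated point clouds indicate a distribution gap.
xy = TSNE(n_components=2, random_state=0).fit_transform(features)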


BibTeX

@article{sakuga42m2024,
    title   = {Sakuga-42M Dataset: Scaling Up Cartoon Research},
    author  = {Pan, Zhenglin and Zhu, Yu and Mu, Yuxuan},
    journal = {arXiv preprint arXiv:2405.07425},
    year    = {2024}
}