They successfully use food as a universal language to connect the East and the West, making global cultures accessible to millions of viewers.
If you are considering joining the platform (or building a similar channel on YouTube or TikTok under the #Jollyvids trend), here are the defining characteristics:
– Possibly user-generated video sharing, short-form comedy clips, or reposted viral content (similar to early Break or eBaum’s World). No verified traffic or ownership data is publicly available.
Originally filmed in Ollie’s living room, the channel’s success allowed them to hire a small team (including members like Mike and Grace) and produce professional, long-form content specifically optimized for television viewing.

Signature Content Series
We present JollyVids, a curated collection of more than 1.2 million short video clips (average length ≈ 7 seconds) spanning 150 semantic categories, sourced from open‑license platforms. Each clip is paired with high‑quality textual captions, temporally aligned audio transcripts, and fine‑grained action annotations. JollyVids is designed to address three shortcomings of existing video corpora: (1) limited semantic diversity, (2) poor alignment between visual and linguistic modalities, and (3) insufficient scale for training modern transformer‑based video‑language models. We provide extensive baseline experiments on video‑text retrieval, zero‑shot video classification, and video captioning, demonstrating that models pretrained on JollyVids outperform those trained on previous datasets by 4–12% on standard downstream benchmarks.