TikTok (ByteDance)’s New AI Animator Is Mind-Blowing

AI video generators have recently dominated tech headlines, especially since OpenAI announced Sora, its first video model, which can generate jaw-dropping videos from simple text prompts.
Today, ByteDance, the company that made TikTok, is getting in on the action too. They’ve created Boximator, which lets you turn static pictures into videos.
What is Boximator?
The name Boximator combines “box” and “animator” to describe what it does: animate objects in a video using boxes defined by the user. The goal is to give users control over how objects move, using a mix of hard and soft boxes.

Hard boxes precisely define where an object should be, while soft boxes define a looser region the object must stay within, which leaves more freedom in how it moves.
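To make the hard/soft distinction concrete, here is a minimal sketch of how one might check whether an object's box honors each kind of constraint. It is not Boximator's code: the `Box` type, the `satisfies` helper, the normalized coordinates and the 0.8 IoU threshold for hard boxes are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Hypothetical representation: normalized coordinates in [0, 1],
    # (x1, y1) is the top-left corner, (x2, y2) the bottom-right corner.
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    inter_area = inter.area()
    union = a.area() + b.area() - inter_area
    return inter_area / union if union > 0 else 0.0


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    return (outer.x1 <= inner.x1 and outer.y1 <= inner.y1
            and inner.x2 <= outer.x2 and inner.y2 <= outer.y2)


def satisfies(obj: Box, constraint: Box, hard: bool, iou_threshold: float = 0.8) -> bool:
    """Hypothetical constraint check.

    Hard box: the object's box must closely match the constraint (high IoU).
    Soft box: the object's box only has to stay inside the constraint region.
    """
    if hard:
        return iou(obj, constraint) >= iou_threshold
    return contains(constraint, obj)


obj = Box(0.2, 0.2, 0.4, 0.5)
region = Box(0.0, 0.0, 0.6, 0.8)
print(satisfies(obj, obj, hard=True))      # True: identical boxes, IoU = 1.0
print(satisfies(obj, region, hard=True))   # False: IoU is only 0.125
print(satisfies(obj, region, hard=False))  # True: obj lies inside the region
```

The same large box fails as a hard constraint but passes as a soft one, which mirrors the idea that soft boxes give the model room to decide the exact position.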
In the example above, all bounding boxes are projected to the cropped region (white dashed box).
How Boximator works
Here is how the training clips are annotated with boxes:
- For every clip in the dataset, a visual language model generates a description of the first frame.
- Noun chunks such as “young man” or “white shirt” are then extracted from these descriptions.
- These noun chunks are fed as prompts to a pre-trained grounding model, which locates the matching bounding boxes, and an object tracker then propagates those boxes across all frames of the video.
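The article does not say which tracker is used, so as a stand-in here is a minimal greedy IoU tracker that carries first-frame boxes through later frames. Everything here is hypothetical: the `propagate_boxes` function, the tuple box format, the 0.3 threshold, and the assumption that some detector supplies candidate boxes for every later frame.

```python
from typing import Dict, List, Optional, Tuple

BoxT = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)


def iou(a: BoxT, b: BoxT) -> float:
    """Intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def propagate_boxes(
    first_frame: Dict[str, BoxT],
    detections: List[List[BoxT]],
    iou_threshold: float = 0.3,
) -> Dict[str, List[Optional[BoxT]]]:
    """Carry grounded first-frame boxes through the remaining frames.

    first_frame: noun chunk -> box found by the grounding model in frame 0.
    detections: candidate boxes for each later frame (assumed to come from a detector).
    Each track takes the unclaimed candidate with the highest IoU against its
    last known box; if none reaches the threshold, the frame is recorded as None.
    """
    tracks = {label: [box] for label, box in first_frame.items()}
    last_seen = dict(first_frame)
    for frame_boxes in detections:
        claimed = set()  # candidates already assigned in this frame
        for label in tracks:
            best_idx, best_score = None, 0.0
            for idx, cand in enumerate(frame_boxes):
                if idx in claimed:
                    continue
                score = iou(last_seen[label], cand)
                if score >= iou_threshold and score > best_score:
                    best_idx, best_score = idx, score
            if best_idx is None:
                tracks[label].append(None)  # object lost in this frame
            else:
                claimed.add(best_idx)
                last_seen[label] = frame_boxes[best_idx]
                tracks[label].append(frame_boxes[best_idx])
    return tracks


first = {"young man": (0.1, 0.1, 0.3, 0.6), "red ball": (0.6, 0.7, 0.7, 0.8)}
later = [
    [(0.62, 0.7, 0.72, 0.8), (0.12, 0.1, 0.32, 0.6)],  # frame 1: both moved slightly
    [(0.14, 0.1, 0.34, 0.6)],                          # frame 2: ball not detected
]
tracks = propagate_boxes(first, later)
print(tracks["red ball"])  # [(0.6, 0.7, 0.7, 0.8), (0.62, 0.7, 0.72, 0.8), None]
```

Greedy matching is the simplest option; a production tracker would typically use globally optimal assignment and appearance features rather than box overlap alone.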

The full Boximator architecture is illustrated below.

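In every spatial attention block of video diffusion models, there are two stacked attention layers: a spatial self-attention layer and a spatial cross-attention layer. Boximator inserts a new self-attention layer between them, in which box constraints, encoded as control tokens, are processed together with the visual tokens. The base model's original weights stay frozen, and only this control module is trained.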
Full details of how this works are described in the Boximator research paper.
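Here is a toy sketch of the idea of letting box constraints attend alongside visual tokens, assuming each box is turned into an extra "control token" that is concatenated with the visual tokens in an added self-attention layer, after which only the visual outputs are kept. Many simplifications are assumptions, not Boximator's actual design: the raw-feature `encode_box` (a real model would use a learned encoder), single-head attention with no learned projections, and a zero-initialized `tanh` gate on the residual.

```python
import math
from typing import List, Sequence

Vector = List[float]


def encode_box(box: Sequence[float], object_id: int, hard: bool) -> Vector:
    """Hypothetical control token: box coordinates, object ID and a hard/soft flag.

    Raw features are used directly (token width 6) instead of a learned encoder.
    """
    x1, y1, x2, y2 = box
    return [x1, y1, x2, y2, float(object_id), 1.0 if hard else 0.0]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def box_attention_layer(visual: List[Vector], control: List[Vector], gate: float) -> List[Vector]:
    """Visual and control tokens attend jointly; only visual outputs are kept.

    The tanh(gate) residual is an assumed zero-initialized gate: with gate = 0
    the layer returns the visual tokens unchanged, so a frozen base model
    behaves exactly as before until the gate is trained.
    """
    mixed = self_attention(visual + control)[: len(visual)]  # drop control-token outputs
    g = math.tanh(gate)
    return [[v + g * m for v, m in zip(vt, mt)] for vt, mt in zip(visual, mixed)]


visual = [[0.1] * 6, [0.5, 0.2, 0.0, 0.3, 0.0, 0.0]]
control = [encode_box((0.2, 0.2, 0.4, 0.5), object_id=1, hard=True)]
unchanged = box_attention_layer(visual, control, gate=0.0)  # identical to `visual`
steered = box_attention_layer(visual, control, gate=1.0)    # now influenced by the box
```

Dropping the control-token outputs keeps the layer's output shape identical to its input, which is what lets a layer like this slot between existing layers without changing anything else in the network.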
The training dataset
Unlike image datasets, few publicly available video datasets include object-tracking annotations, so the researchers curated their training set from the WebVid-10M dataset.
WebVid-10M is a large-scale dataset of short videos with text captions, sourced from stock footage sites and covering a diverse range of content:
- 10.7 million video-caption pairs.
- 52K total video hours.
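The two figures above also give a rough sense of how short the clips are. This quick calculation uses only those rounded numbers, so the result is an approximation of the average clip length, not an official statistic.

```python
# Rounded WebVid-10M figures quoted above.
video_caption_pairs = 10_700_000
total_hours = 52_000

# Convert hours to seconds and divide by the number of clips.
avg_seconds = total_hours * 3600 / video_caption_pairs
print(f"Average clip length: {avg_seconds:.1f} s")  # Average clip length: 17.5 s
```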
Example videos
Here are some incredible examples:
Left: “The kitten is hiding herself into the cup”
Right: “A dog is chasing a red ball.”


Left: “A young woman is turning her head, revealing her face in profile.”
Right: “A man sitting on a table is drinking a cup of coffee.”


Comparison with other AI video generators
The examples below compare Boximator with two of the most popular AI video generators, Pika 1.0 and Runway Gen-2.
Note: Pika and Gen-2 use image and text conditions; Boximator uses additional box constraints derived from the text prompt.
Prompt: “Adding wine to a glass.”

Prompt: “A handsome man is taking out a rose from his pocket with his right hand and looking at the rose.”

Prompt: “Two raccoons in blue shirts are playing a ball, the left one is jumping up.”

What do you think of these videos?
In these examples, the extra box-based control layer clearly improves the results: the videos generated by Boximator show more motion than those from Pika and Gen-2.
How to try
The demo website is currently not available to the public. According to its creators, it should be available in the next couple of months.
Our demo website is under development and will be available in the next 2–3 months. We will attach the demo link on this website once the demo is ready.
If you really want to try Boximator now, you can email the creators at [email protected] with your input image and text prompt, and they will reply with the generated video.
Final Thoughts
As a tech enthusiast, I'm excited to see tech giants showcase software like Boximator and Sora that could be at our fingertips in the near future.
However, it is important to be aware of the risks associated with this technology. As with any powerful tool, there is the potential for misuse. Deepfakes, for example, could be used to spread misinformation or propaganda.
It is important to consume online media responsibly and to be critical of the information you see.

This story is published on Generative AI. Connect with us on LinkedIn and follow Zeniteq to stay in the loop with the latest AI stories. Let’s shape the future of AI together!







