Meta has introduced Movie Gen, a suite of generative AI models designed to revolutionise media creation across image, video, and audio modalities.

Movie Gen represents Meta's third wave of generative AI research, combining multiple modalities and enabling unprecedented fine-grained control for users. The new AI system boasts four key capabilities: video generation, personalised video generation, precise video editing, and audio generation.

The video generation component, a 30B-parameter transformer model, can create high-quality videos of up to 16 seconds at 16 frames per second (256 frames in total) from text prompts. For personalised video generation, the system can produce videos featuring a specific individual from a single reference image and a text description. The precise video editing function allows targeted modifications to existing videos using text commands. Lastly, the 13B-parameter audio generation model can synthesise high-fidelity audio, including ambient sound and music, for videos up to 45 seconds long.

Meta claims that Movie Gen outperforms similar industry models across these tasks in human evaluations. The company trained the models on a combination of licensed and publicly available datasets.

While acknowledging current limitations, such as long inference times and room for further quality improvements, Meta envisions wide-ranging applications for Movie Gen.

The company has published technical details in a research paper. However, no specific timeline for a public release of Movie Gen has been announced.


