Michael Nuñez at VentureBeat:
ByteDance researchers have developed an AI system that transforms single photographs into realistic videos of people speaking, singing and moving naturally — a breakthrough that could reshape digital entertainment and communications.
The new system, called OmniHuman, generates full-body videos that show people gesturing and moving in ways that match their speech, surpassing previous AI models that could only animate faces or upper bodies.
“End-to-end human animation has undergone notable advancements in recent years,” the ByteDance researchers wrote in a paper published on arXiv. “However, existing methods still struggle to scale up as large general video generation models, limiting their potential in real applications.”
The team trained OmniHuman on more than 18,700 hours of human video data using a novel approach that combines multiple types of inputs — text, audio and body movements. This “omni-conditions” training strategy allows the AI to learn from much larger and more diverse datasets than previous methods.
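To make the idea concrete: in a mixed-condition setup like this, each training clip carries whatever subset of signals (text, audio, body pose) is available, and the trainer randomly keeps or drops conditions per batch so that partially labeled data still contributes. The sketch below is a minimal, speculative illustration of that pattern; the keep-probabilities, dictionary keys, and function names are assumptions for clarity, not details from the OmniHuman paper or ByteDance's code.

```python
import random

# Illustrative keep-probabilities per conditioning signal (assumed values):
# stronger signals like pose are dropped more often than weaker ones like
# text, so clips lacking pose or audio annotations can still be used,
# enlarging the effective training set.
KEEP_PROB = {"text": 0.9, "audio": 0.5, "pose": 0.25}

def sample_conditions(batch):
    """Randomly keep a subset of the available conditions for this batch."""
    return {
        name: batch[name]
        for name, prob in KEEP_PROB.items()
        if name in batch and random.random() < prob
    }

def training_step(model, optimizer, batch):
    """One step of conditional video-generation training (PyTorch-style).

    `model(video, conditions)` is assumed to return a scalar loss; both
    `model` and `optimizer` stand in for whatever the actual system uses.
    """
    conditions = sample_conditions(batch)
    loss = model(batch["video"], conditions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```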
More here.