Video | Mona Lisa raps...but how?

2024-04-25T12:29:04Z
ندى ماهر عبدربه
Content creator

ArabiaWeather - A team of scientists at Microsoft Research Asia has developed a new artificial intelligence model called VASA-1, which turns a still image of a person's face and an audio clip into a synchronized video with accurate, realistic lip movements, facial expressions, and head movements.

In a research paper, the team said it had introduced the VASA framework, which generates lifelike talking faces with appealing visual affective skills from a single image and a speech audio clip. The first model, VASA-1, produces lip movements closely synchronized with the audio and also captures a wide range of facial nuances and natural head movements that contribute to the authenticity and liveliness of the video.

The team claims that its method not only delivers high video quality with realistic face and head dynamics, but also supports online generation of 512 x 512 videos at up to 40 frames per second with almost negligible latency.


A singing Mona Lisa and fears of impersonation

VASA stands for “Visual Affective Skills Animator.” The framework creates videos that realistically mimic human conversational behavior.

Starting from a single static head image, the model generates “realistic talking faces” that mirror conversational behavior through natural facial gestures and eye and head movements.

The team used the VoxCeleb2 dataset, which includes videos of thousands of real-life celebrities, to train their model.

The model can also handle inputs outside its training domain, such as artistic images and non-English speech.

While the model's capabilities raise concerns about impersonation, the scientists stress that their goal is to develop visual affective skills for virtual characters, not to impersonate anyone in the real world.

Microsoft says it currently has no plans to release the code behind the model, and that it wants the technology to be used responsibly and in accordance with appropriate regulations.


Sources:

Interesting Engineering

This article was written originally in Arabic and is translated using a 3rd party automated service. ArabiaWeather is not responsible for any grammatical errors whatsoever.