Want to turn photos into talking, lifelike video? Try this AI platform

AI video platform D-ID converts still images into videos, a huge next step in the future of synthetic media

 Yaniv Levi, VP product marketing at D-ID (photo credit: Eyal Regev)

When people think about artificial intelligence, they rarely imagine the technology being used to sift through complex data sheets, find out how many people buy something because of a billboard, or figure out when a dog has sniffed cancer cells. Yet those are typically the kinds of things AI is being used for these days – and while they’re all cool, the average Dick and Jane probably aren’t getting too hyped up about them.

However, hope is not lost for dreamers wishing for a Bradbury-esque future of machines creating things that are cool, even to the layman. There exists a growing field in AI technology devoted to “synthetic media” – art, content and creative materials that have been produced by an artificially intelligent creator.

The current buzz in synthetic media is centered around AI image generation, with platforms such as DALL-E, CrAIyon and Midjourney leading the pack in the creation of art based on text prompts. Israeli start-up D-ID is the pioneer of a slightly different spin on the idea: taking a still photo of someone and turning it into a talking video.

What does D-ID do?

Initially known for its “Deep Nostalgia” collaboration with MyHeritage a few years back – the companies worked together to offer users lifelike videos using a photo of their deceased relatives – D-ID has launched its proprietary Creative Reality Studio, a self-service video platform that enables users to “bring photos to life and seamlessly generate high-quality and customized presenter-led content from a single image.”

What that means practically is that the company can take any still image of a thing with a face – a human, a statue, even some monkeys – and turn it into a video wherein the subject talks, with either an AI-generated voice using a pre-written script or an audio recording.

(credit: INGIMAGE)

The initial goal of the company is to offer the service to corporations with training programs, enabling them to add a front-facing human element to their lessons. Several companies are already on board, including SkillDora, an e-learning platform whose courses are taught exclusively by AI instructors, and Japanese e-learning company Skill Plus.

“D-ID’s work has already generated more than 100 million videos,” said Gil Perry, CEO and co-founder of D-ID. The new platform enables “larger enterprises, smaller companies and freelancers to produce personalized videos for a range of purposes at a massive scale, with the potential to engage audiences in learning and development, sales training and more,” he said. “Our technology cuts through the headache of corporate video production to effortlessly create high-quality, cost-effective, professional videos in any language at the click of a button.”

Ethics, anyone?

Following the emergence of deepfake technology (a different method of AI video creation that alters a video subject’s face to look like someone else’s), synthetic video technology is often accompanied by questions regarding ethics: “How do we prevent fake news? How do we make sure that these technologies aren’t abused? How do I fact-check this video of deceased professional wrestler Macho Man Randy Savage announcing his plans to trade 18 Slim Jims for a Tesla?”

For the most part, these are fair questions, and an ethical obligation falls on the creators of these technologies to take them into account. In an interview with The Jerusalem Post, D-ID’s vice president for product marketing, Yaniv Levi, elaborated on the lengths to which the company has gone to ensure that its platform can’t be abused – at least not easily.

“To say that we’ve considered it is an understatement,” he said. “First of all, we’ve blocked the use of celebs, famous people, etc., with moderation tools. In addition to that, when talking about corporate/enterprise customers, we have terms of use in the contract that they’re signing, [whereby] they commit to a very long list of things that they are not going to do with our technology. It includes political uses, sexually offensive content – anything related to that is completely forbidden.”

The next step for synthetic media

According to Levi, D-ID is currently focused on the corporate/enterprise space, but there are several exciting avenues to explore for future uses of the company’s technology. It has worked out how to emulate lifelike head movements and, in some cases, hand gestures to complement the supplied audio, and future iterations of the technology could include full-body animation.

While copyright concerns may prevent the company from using that tech to create custom Seinfeld episodes, it still carries a lot of potential for use within the metaverse.

“Maybe it will take three, four or five years, but each one of us will eventually have their own digital representation of themselves in the metaverse,” Levi said. “We see ourselves as the company that will provide super-realistic avatars, like full-body reenactment, based on just a facial photo.”