If you're on social media much these days, you've undoubtedly seen a serious uptick in some fairly superb AI-generated videos popping up, and in the ways creatives are making fun projects and even commercials and short films with these tools. It's truly an exciting time for content creators and producers to engage and help shape the way we utilize these new tools and production workflows.

So now let's take a quick look at an updated feature I'll be sharing from HeyGen's new Avatar IV, an AI Image to Video tool, using my own headshot photo and cloned voice:
Sure, it still has a bit of exaggeration around the mouth movements and body, but it's still a huge improvement over earlier versions and most other Image to Video explainer tools. See more about this technology later in the article.
HeyGen Video Avatars
If you've been following my articles over the past couple of years, you'll know that I've been covering the advancements HeyGen has been making with its AI Video Avatars capabilities.
I created this first test with two different angles against a green screen to create two different "Instant Avatars" in HeyGen (I didn't pay extra for the more detailed pro versions). I used ElevenLabs to produce the voice audio file and used it for both camera angles, then composited each angle separately in Adobe After Effects and rendered the results as two separate files. I then edited them together in Adobe Premiere Pro as I would regularly recorded video footage. I purposely left the dissolves in so you can see how the two angles are synced to the voice audio track.
The first pass is the raw green screen video clips that came straight out of HeyGen, followed by a quick composite in After Effects with the fake "TEDx" stage backgrounds.
HeyGen also offers tons of Stock Avatars you can use, including some with pre-generated expressions built in and some with multiple camera angles.
For this test, I selected an "Expressive" avatar and used a voice audio track created in ElevenLabs, since the default voices offered in HeyGen weren't great for this model.
I have to say that the resulting video is quite believable. I only hope that Custom Avatars can be generated this believably in the near future.
For some more examples of HeyGen's Stock Avatars with 2-cam edits, I ran the text through ElevenLabs AI to get the voiceovers, then put them into the project for both camera views. This gives you quality control of the audio and ensures it's perfectly synced when you edit. That's why I put all the crazy B-cam fades in these examples: to show off the synchronization between each cam and the audio voice tracks.
Some stock avatar models are more believable than others, as you'll see in these examples. You should choose what's right for your productions, of course, as some are quite stiff in their delivery and some are overly expressive and have a harder time aligning with the voice content, such as the last model in this video.
HeyGen Avatar IV
HeyGen has been developing its AI Image to Video capabilities into something truly remarkable and useful.
Using only one of their stock photos, I tried a quick test with an audio script I created in ElevenLabs and imported it into the tool. I was quite blown away by the results!
Selecting the stock image from HeyGen's library options:
Importing the recorded AI voice audio from ElevenLabs:
Here's the resulting video:
I can absolutely see this as a great way to personalize service and sales messaging without the need to record video for the productions or to clone for a Video Avatar. Aside from the exaggerated mouth movements (including mine at the top of the article), I think this comes off far more lifelike than the Video Avatars.
I'm still waiting for more options such as expression and energy cues, but I'm sure that's only a matter of time.
For my next Avatar IV tests, I first created four different images in Midjourney of professional businesswomen from various ethnicities.
Using the same AI-generated script text, I entered it into ElevenLabs and found four different voices that I felt matched each AI-generated image of a businesswoman.
I then imported the images and the audio files from ElevenLabs into HeyGen's new image-to-video tool, Avatar IV. As you can tell, each performer is reading the same script. The details around the hair and backgrounds are what really strike me, along with the body motion and breathing. But as I mentioned in my opening video, the mouth movements are still a bit over-exaggerated.
I find this truly amazing technology: going from a completely AI-generated subject, to an AI script, to an AI-generated voice track, to AI-generated video from those elements. All within a few minutes.
Image to Video Tools
There are several AI Image to Video tools emerging and competing for a niche market segment, but some are also quickly surpassing many AI Video to Video tools in their results. Even in some of the Text to Video tools I've seen remarkable results (see the "car show" video toward the end of this article).
In this test, I used Midjourney to generate a starting image of a small group of people against a green screen. Unfortunately, this was the best result I could get from numerous attempts, but it was something I could work with in Photoshop to clean it up.
I took the image into Photoshop and first did a color correction pass on the subjects to provide some detail for generating a matte.
Using Photoshop's Extract Object tool, I was able to select the background and even out the color to a solid green that would work for extracting the animated subjects, if all works as expected.
The final cleaned image I used as a source for all the following examples:
A couple of variations on the text prompts tried with the source image in each AI tool:
"camera slowly moves around in an arc around the left side of the group that's cheering excitedly!"
"camera slowly tracks in an arc around the left side of the group cheering excitedly!"
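For a comparison test like this, it can help to organize the prompt variants and tools into an explicit run list before heading into each app. Here's a minimal sketch in Python; the prompt strings and tool names come from this article, while the filename and dictionary structure are just illustrative assumptions, and nothing here calls any real API:

```python
from itertools import product

# The two prompt variants tried with the source image in each tool.
prompts = [
    "camera slowly moves around in an arc around the left side "
    "of the group that's cheering excitedly!",
    "camera slowly tracks in an arc around the left side "
    "of the group cheering excitedly!",
]

# The image-to-video tools compared in this test.
tools = ["Sora", "Kling", "Krea", "Runway Gen4", "Hailuo",
         "OpenArt", "Vidu Q1", "VEO 3", "Vidfly", "Adobe Firefly"]

# One entry per (tool, prompt) pairing: 10 tools x 2 prompts = 20 runs.
# "source_image" is a hypothetical filename for the cleaned green screen still.
test_matrix = [
    {"tool": tool, "prompt": prompt, "source_image": "group_greenscreen_clean.png"}
    for tool, prompt in product(tools, prompts)
]

# Print a checklist to work through, one line per run.
for run in test_matrix:
    print(f'{run["tool"]}: "{run["prompt"]}"')
```

Keeping the matrix explicit like this makes it easy to confirm every tool saw both prompt variants with the identical source image.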
The screenshots below show what the prompt was and how each tool handled it.
Runway Gen4
OA Veo2
OA Wan2
Adobe Firefly
Using the same simple text prompts and the same source image file, I tested several of the top "Image to Video" tools and got less than desirable results.
Some of these results are just hilarious! People just standing there, crazy color changes, extra people showing up from off camera, and the spinning… OH MY GOD THE SPINNING!! 😛
Most tools were just frustrating, though. Using simple prompts for camera moves just doesn't seem to work yet. Most shots did some kind of zoom or truck/pan-off while others just had people jumping or flapping their arms. No matter how I tweaked the prompt, it usually just got worse. This would have been a really frustrating process had I needed it to work for a project, but of course, I'd be using more sophisticated prompting in such a case. This was really just a simple comparison test between the available tools.
Test examples from Sora, Kling, Krea, Runway Gen4, Hailuo, OpenArt, Vidu Q1, VEO 3, Vidfly and Adobe Firefly.
Note that what I was looking for was the camera move: to slowly track an arc around the small group of people. I wasn't as interested in what the people were doing, although the results are funny in any case. The absurdity of some of the video clips… there was one clear winner in this test, even though it tracked in the wrong direction. I'll take it.
So this leads right into one of my next articles in the series: Prompting Techniques. We'll be learning together just what works and what doesn't between the tools.
Speaking of prompting: this amazing AI video of a non-existent car show, created entirely from text prompting (video and audio clips) in Veo3 by artist László Gaál, has been making waves in the industry this week. It shows just how close we're getting to complete Text to Video productions.
Description from the YouTube post: "Before you ask: yes, everything is AI here. The video and sound both come from a single text prompt per clip using #Veo3 by Google DeepMind, and then these clips are edited together. Whoever is cooking the model, let him cook! Congrats to Matthieu Lorrain and the team for the Google I/O live stream and the new Veo website!"
I'll be exploring more with prompt engineering and digging into Text to Video tools and workflows in my future articles. They really go hand in hand for the best results.
As we explore these tools further, I'll continue to highlight some of the best productions I find, and I always welcome your input and insight on the technology and its impact on the film & video industry.
Stay tuned…