For the lip-sync talking clips I use digen.ai/create: just upload an image, upload an audio file, and hit generate video.
The settings I use are Real Motion 2.6 and GEN-3; this combo usually gives me the best results.
Currently GEN-3 is limited to a maximum of 15 seconds of audio. They have other models that accept longer audio, but the quality drops and the results aren't that great.
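Because of that 15-second cap, a longer TTS file has to be trimmed or split before upload. Below is a minimal sketch using only Python's standard `wave` module, assuming the TTS output is saved as an uncompressed WAV file (convert other formats first). The helpers `wav_duration` and `split_wav` and the `_partN.wav` naming are hypothetical, not part of digen.ai, and generating one clip per chunk is just one possible workaround, not something this post describes.

```python
import sys
import wave

# GEN-3 audio limit as stated in this post; the site may change it.
MAX_SECONDS = 15.0


def wav_duration(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def split_wav(path: str, max_seconds: float = MAX_SECONDS) -> list[str]:
    """Split a WAV file into consecutive chunks no longer than max_seconds.

    Chunks are written next to the input as <name>_part0.wav, <name>_part1.wav, ...
    Cuts happen at fixed frame offsets, so a cut may land mid-word.
    """
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(max_seconds * src.getframerate())
        stem = path[:-4] if path.lower().endswith(".wav") else path
        out_paths = []
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{stem}_part{index}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels/sample width/rate; the frame count in the
                # header is corrected automatically when dst is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths


if __name__ == "__main__":
    # Usage: python split_tts.py voice.wav [more.wav ...]
    for wav_path in sys.argv[1:]:
        seconds = wav_duration(wav_path)
        if seconds <= MAX_SECONDS:
            print(f"{wav_path}: {seconds:.1f}s, OK to upload as-is")
        else:
            parts = split_wav(wav_path)
            print(f"{wav_path}: {seconds:.1f}s, split into {len(parts)} parts")
```

Fixed-length cuts keep the sketch simple; in practice it is usually nicer to regenerate the TTS in shorter sentences so each clip ends on a natural pause.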
They also have a Video Prompt box where you can describe the actions you want in your video. Most of the time I leave it blank.
For the audio file (text to speech), I usually use aistudio.google.com/generate… It's free from Google, but it does hit rate limits, and then you have to wait a few hours for them to reset.
Or, if I want to try others, I go to huggingface.co/spaces?catego… They have dozens of TTS models to try out. Many are OK, but most are of no use to me. I bookmark my favorites, but it is a dynamic list and the models change regularly. Most also have rate limits of about 5 to 10 runs, which is enough for me; then I switch to another one.