Are you still manually stitching together LLM scripts, generated images, and TTS audio just to build a basic AI video pipeline?
The story-flicks repository unifies these fragmented multi-modal tasks into a single automated Python pipeline. It uses an LLM to generate structured scene scripts, parallelizes requests to diffusion and text-to-speech models, and relies on MoviePy to sync audio timestamps with subtitles and stitch image frames.
✅ Vendor-agnostic model inference with native support for OpenAI, DeepSeek, and local LLMs via Ollama
✅ Parallelized generation of image assets and synthesized audio tracks per narrative scene
✅ Automated subtitle synchronization using explicit audio timestamps during the assembly phase
✅ Modular design built on the adapter pattern, so the underlying AI providers can be swapped without touching the rest of the pipeline
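To make the adapter and parallelization ideas above concrete, here is a minimal sketch. It is not taken from the story-flicks codebase: every name in it (`ImageProvider`, `SpeechProvider`, `FakeImageProvider`, `FakeSpeechProvider`, `build_scene`, `build_all`, `run_demo`) is hypothetical, and the fake providers return placeholder bytes instead of calling a real model.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class ImageProvider(Protocol):
    """Hypothetical adapter interface for any image backend."""
    async def generate(self, prompt: str) -> bytes: ...


class SpeechProvider(Protocol):
    """Hypothetical adapter interface for any text-to-speech backend."""
    async def synthesize(self, text: str) -> bytes: ...


@dataclass
class SceneAssets:
    image: bytes
    audio: bytes


class FakeImageProvider:
    """Stand-in for a diffusion model; returns placeholder bytes."""
    async def generate(self, prompt: str) -> bytes:
        await asyncio.sleep(0.01)  # simulate network latency
        return f"image:{prompt}".encode()


class FakeSpeechProvider:
    """Stand-in for a TTS model; returns placeholder bytes."""
    async def synthesize(self, text: str) -> bytes:
        await asyncio.sleep(0.01)  # simulate network latency
        return f"audio:{text}".encode()


async def build_scene(scene: str, images: ImageProvider, speech: SpeechProvider) -> SceneAssets:
    # The image and audio requests for one scene run concurrently.
    image, audio = await asyncio.gather(images.generate(scene), speech.synthesize(scene))
    return SceneAssets(image=image, audio=audio)


async def build_all(scenes: list[str], images: ImageProvider, speech: SpeechProvider) -> list[SceneAssets]:
    # All scenes are processed concurrently; gather preserves input order.
    return await asyncio.gather(*(build_scene(s, images, speech) for s in scenes))


def run_demo(scenes: list[str]) -> list[SceneAssets]:
    return asyncio.run(build_all(scenes, FakeImageProvider(), FakeSpeechProvider()))
```

Swapping a provider then means passing a different object that satisfies the same interface. `build_scene` and `build_all` stay unchanged.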
This architecture serves as production-ready scaffolding for asynchronous, programmatic video generation. One caveat for developers: MoviePy renders on the CPU, which can become a bottleneck when processing high-definition media.
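The subtitle synchronization step described above boils down to simple arithmetic on audio durations. A minimal sketch, assuming each scene's narration length in seconds is already known (with MoviePy it could be read from an audio clip's `duration` attribute). The `Cue` class and `build_cues` function are hypothetical names, not the repository's API.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    text: str
    start: float  # seconds from the start of the video
    end: float


def build_cues(scenes: list[tuple[str, float]]) -> list[Cue]:
    """Turn (subtitle_text, audio_duration_seconds) pairs into timed cues.

    Each subtitle starts where the previous scene's audio ended, so the
    text stays aligned with the narration once the clips are concatenated.
    """
    cues = []
    cursor = 0.0
    for text, duration in scenes:
        cues.append(Cue(text=text, start=cursor, end=cursor + duration))
        cursor += duration
    return cues


cues = build_cues([("Once upon a time", 2.5), ("The end", 1.5)])
```

Here the second cue starts at 2.5 seconds, exactly when the first scene's narration finishes.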
This open-source engine has already earned over 2,345 stars on GitHub, a sign of its appeal to developers who want an alternative to the paywalls and token limits of closed SaaS solutions.
Repo 👇