For the past couple of weeks, I’ve been working on a Proof of Concept (PoC) and personal research project to assess the current boundaries of open-source LLMs for coding and advanced product building.
My focus? Building a design web app and editor for text-to-presentation generation.
Recently there was a lot of buzz with the launch of Claude Design in this space.
Other industry giants like Adobe, Canva, and Google, along with dozens of start-ups, have highly optimized generative AI slide deck builders. However, the tradeoff usually falls between slow premium models with high token costs and fast, heavily templated alternatives that lack true design finesse.
So my initial product brief was simple: build an application that generates great-looking, creatively diverse presentations in under 2 minutes, using a fraction of the token budget of frontier models and running on open-source LLM infrastructure. A second goal: use all available open-source and frontier models on the project to assess their cost and viability.
The results from the initial PoC exceeded my expectations.
By wrapping the models in a custom orchestration layer, a constraint engine I call “the harness” and the project’s “secret sauce”, I was able to get the app running on just 50k tokens using serverless cloud-hosted models like MiniMax 2.7, generating high-quality, fully editable presentations in under 120 seconds.
To test the true limits of vibe coding and local infrastructure, I also ran a local setup on an RTX 4090 (24 GB VRAM) using llama.cpp and the opencode web UI. Qwen 3.6 27B handled the bulk of the early architecture, running at 40 tokens/s with a 200K context window made possible by Turboquant.
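The post does not describe how the harness works internally, so the following is only a minimal sketch of one thing such a layer could do: enforce a hard token ceiling across pipeline stages. The class name `TokenBudget`, the exception `BudgetExceeded`, and the stage names are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a stage would push usage past the total budget."""


@dataclass
class TokenBudget:
    # Hypothetical guard: tracks token usage against a fixed ceiling.
    total: int
    spent: int = 0
    log: list = field(default_factory=list)

    @property
    def remaining(self) -> int:
        return self.total - self.spent

    def charge(self, stage: str, tokens: int) -> None:
        """Record usage for a stage, refusing anything over budget."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.spent + tokens > self.total:
            raise BudgetExceeded(
                f"{stage} needs {tokens} tokens, only {self.remaining} left"
            )
        self.spent += tokens
        self.log.append((stage, tokens))


if __name__ == "__main__":
    budget = TokenBudget(total=50_000)  # ceiling taken from the 50k figure above
    budget.charge("outline", 5_000)     # stage names are illustrative
    budget.charge("slides", 30_000)
    print(budget.remaining)
```

A guard like this fails fast instead of silently overspending, which makes cost comparisons between models easier to keep honest.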
The beauty of this architecture is that the entire generation pipeline can run locally with 100% privacy on high-end GPUs.
I treated this as a rigorous multi-model research experiment on development efficiency (cost vs performance):
Orchestration & Codebase: Primarily built with open-source models like DeepSeek V4 Pro/Flash, GLM, Qwen, and MiniMax via Ollama.
Frontier Model Comparison: Used Claude Code with Opus 4.7 and Gemini 3.1 Pro for roughly 10-15% of the codebase to see how these models compare with the open-source ones.
The whole journey has been a massive learning experience, and it is incredibly fun to build things in a way that was never possible before.
Once the core generation engine was stable, I put my Product Manager hat on to move toward a launch-ready MVP.
Over the past week, I focused heavily on system resilience, security, and user data flows.
Brand Guidelines via Vision
Users can upload reference images or documents to establish custom branding guidelines and automated asset summaries.
WYSIWYG & Canvas State
A workspace management system where past decks can be stored, shared via public links, exported to disk, or tweaked directly in a live editor.
System Hardening & Security Audit
Built comprehensive guardrails against the attack vectors inherent to prompt-driven apps, including aggressive prompt injection and various forms of code injection.
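As a rough illustration of the two attack classes mentioned here, the sketch below flags instruction-like phrases in user text and escapes markup before it reaches a rendered slide. The function names and the pattern list are assumptions made for this example, not the app’s actual guardrails, and pattern matching alone is not a complete defense against prompt injection.

```python
import html
import re

# Illustrative patterns only; a real deny-list would be broader and
# would be combined with other defenses.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]


def flag_prompt_injection(text: str) -> bool:
    """Return True if the text matches a known instruction-override phrase."""
    return any(pattern.search(text) for pattern in SUSPICIOUS_PATTERNS)


def sanitize_for_slide(text: str) -> str:
    """Escape HTML special characters so user text cannot inject markup."""
    return html.escape(text, quote=True)


if __name__ == "__main__":
    print(flag_prompt_injection("Please ignore previous instructions."))
    print(sanitize_for_slide("<script>alert(1)</script>"))
```

Escaping at render time covers the code injection side regardless of what the model outputs, while the phrase check is only a cheap first filter.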
Scalable Cloud Architecture
Architected a 3-tier setup that fully decouples the presentation canvas, the orchestration logic, and the data layer. To ensure high availability and keep database transactions fast, I separated the primary metadata database from the S3 object storage bucket used for heavy slide assets. This is paired with a serverless routing model for LLM calls that does not retain prompt inputs, to preserve user privacy.
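The split between metadata and heavy assets can be sketched as follows. This is an in-memory stand-in written for illustration: `InMemoryObjectStore` plays the role of the S3 bucket, `DeckRecord` plays the role of a metadata row, and every name here is hypothetical rather than the project’s real schema.

```python
import hashlib
from dataclasses import dataclass, field


class InMemoryObjectStore:
    """Stand-in for an object storage bucket holding large binary assets."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key, so identical assets are stored once.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


@dataclass
class DeckRecord:
    """Small metadata row: holds references to assets, never the bytes."""
    deck_id: str
    title: str
    asset_keys: list[str] = field(default_factory=list)


def attach_asset(deck: DeckRecord, store: InMemoryObjectStore, data: bytes) -> str:
    """Upload the asset, then record only its key in the metadata."""
    key = store.put(data)
    deck.asset_keys.append(key)
    return key


if __name__ == "__main__":
    store = InMemoryObjectStore()
    deck = DeckRecord(deck_id="d1", title="Demo deck")
    key = attach_asset(deck, store, b"fake-image-bytes")
    print(deck.asset_keys == [key], len(store.get(key)))
```

Because the metadata row carries only short keys, listing or updating decks never has to touch the large binaries.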
The most important takeaway from this sprint? In just a few weeks I covered a lot of ground in app building. My main goal was to see how far I could go in developing a full-fledged working app using mainly open-source models.
This project showed me that we are entering a world where local LLMs rise to compete head to head with frontier models.
They are surely at least 95% of the way there.