🦈 Shark Tank — but the sharks are multimodal AI.
At last weekend's SCU Hack-A-Stack Sprint, Rishit Shiramshetti, Abraham Bhatti & Edrick Chang built Pitch Tank and won our "Best Use of Tencent Cloud" track. 🏆
Pitch Tank drops founders into a live session with AI-powered VC judges that can see you, hear you, and interrupt you in real time. A mood processor reads webcam frames for confidence signals — posture, eye contact, body language — and feeds them straight into the agent's context. Nervous? You'll get tougher follow-ups. Confident? Expect to be cut off with a sharper question. 🎯
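A rough sketch of how confidence signals could be turned into agent context. This is our illustration, not the team's code: the MoodSignals fields, the simple averaging, the 0.6 threshold, and the judge_context helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MoodSignals:
    """Hypothetical per-frame confidence scores, each between 0.0 and 1.0."""
    posture: float
    eye_contact: float
    body_language: float


def confidence_score(signals: MoodSignals) -> float:
    """Average the three signals into one score (an assumed, equal weighting)."""
    return (signals.posture + signals.eye_contact + signals.body_language) / 3


def judge_context(signals: MoodSignals, threshold: float = 0.6) -> str:
    """Build the text that would be appended to the agent's context for the next turn."""
    score = confidence_score(signals)
    if score >= threshold:
        style = "The founder seems confident. Interrupt with a sharper question."
    else:
        style = "The founder seems nervous. Ask tougher follow-up questions."
    return f"[mood] confidence={score:.2f}. {style}"


# Example: a steady pitch and a shaky one produce different instructions.
print(judge_context(MoodSignals(posture=0.9, eye_contact=0.8, body_language=0.7)))
print(judge_context(MoodSignals(posture=0.2, eye_contact=0.3, body_language=0.1)))
```

The point of the sketch is that the judge's behavior changes through text added to the context, not through a different model.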
Under the hood: a FastAPI backend managing WebSocket communication, @TencentRTC handling real-time audio/video transport, and @visionagents_ai driving the intelligence layer.
📖 We dug into how the team synchronized two independent pipelines, why they chose prompt-level persona switching over multi-model orchestration, and what this integration unlocks for developers. Full breakdown in the article.
Pitch Tank also showcases our partnership with Stream: Tencent RTC is now an officially supported edge transport plugin for Vision Agents, their open-source framework for real-time multimodal AI agents. Developers who plug it in get access to our global backbone (3,200 nodes, sub-300ms latency worldwide) without changing a single line of their agent logic. Every LLM, STT, TTS, and vision plugin in the Vision Agents ecosystem works the same way.
Live coaching, gaming copilots, video-call avatars, training sims, drone control — if it needs to see, hear, and respond in real time, this stack is for you.
👇 Full story, architecture diagram, and code walkthrough in the article below.