Aakash Gupta

Aakash Gupta

13 Photos and videos

Tweets

Michael Pappas retweeted

Aakash Gupta

@aakashgupta

Jun 3

Most voice pipelines still look like this: audio → transcribe → text → model → act The problem is step one. The second you turn audio into text, you throw away tone, hesitation, sarcasm, stress. The signal that told you what the person actually meant. So everything you built after that ran on the transcript. The actual conversation was already gone. @modulate_ai trained Velma on 550M hours of raw audio to skip that step entirely. One model that listens to the audio instead of reading a summary of it. #1 on the conversation understanding benchmark. 10x cheaper than running it through an LLM. The part that makes this more interesting: Velma has already been running in production inside Call of Duty, GTA Online, and Fortune 500 contact centers. Now the API is open to everyone.

Modulate

@modulate_ai

Jun 3

The world's first audio-native AI model is now available as an API. Velma listens and understands like a human — emotions, tone, intent, rhythm, vocal stress. Already analyzed 550M hours of conversation for Fortune 500s. Now open to developers. 🧵

2:08

12,270

Michael Pappas

Michael Pappas

@mpappas74

Jun 2

Hot take: most voice AI isn't actually understanding speech. It's reading a transcript of it. There's a meaningful difference. And it's why voice pipelines keep failing at the moments that matter most. Dropping something tomorrow that takes a very different approach 👀

Michael Pappas

Michael Pappas

@mpappas74

Jun 2

You've spent hours tuning your voice pipeline. Better STT model. Cleaner NLP. More labels. And it still misses the call where a customer was clearly about to churn. The problem isn't your implementation. It's the architecture. Something different is coming @modulate_ai 👀

112

Modulate

Michael Pappas retweeted

Modulate

@modulate_ai

May 26

95% of enterprise AI deployments fail. That’s not a tooling problem. It’s a design problem. In this clip, our CEO @mpappas74 breaks down why most AI products never make it past the demo stage - and what businesses actually need instead. Not another “AI employee” to manage. Tools that fit into real workflows and solve specific problems. At @modulate_ai, that’s the lens we build through.

0:54

152

Michael Pappas

Michael Pappas

@mpappas74

May 26

x.com/i/article/205918467427…

Michael Pappas

Michael Pappas

@mpappas74

May 25

do you remember when you joined X? i do! #MyXAnniversary

There's An AI For That

Michael Pappas retweeted

There's An AI For That @theresanaiforit

May 11

There’s an AI for transcription. 🦾 taaft.co/modulate if you’re building with voice, details matter: Grok STT: $0.10/hr → transcripts only @Modulate’s Velma: $0.03/hr → 14.9% WER → emotion detection → accent detection → PII redaction Test it yourself...👇

8,690

Michael Pappas

Michael Pappas

@mpappas74

May 20

Voice fraud isn’t just a security problem. It’s a massive, ongoing cost center. In this video, I break down what it’s *actually* costing businesses today - and it’s more than most people realize 👀 There are two layers to it: 1. Direct losses $$$ When voice fraud hits, the damage can be immediate - and in some cases, reach hundreds of millions. Often unrecoverable. 2. The cost of trying to prevent it $$$$ Even if you’re never breached, you’re still paying: - Added authentication friction that slows down users - Frustrated customers - and lost revenue - Teams tied up auditing calls, running investigations, and handling compliance All of that adds up to tens of millions in ongoing operational cost. So the real question isn’t “what happens if we get hit?” It’s “how much are we already spending because this risk exists?”

1:01

Michael Pappas

Michael Pappas

@mpappas74

May 20

If you’re dealing with this today, test how we're approaching a better way to solve this problem: modulate.ai/deepfake-detect?…

Michael Pappas

Michael Pappas

@mpappas74

May 14

“Can’t my LLM provider just solve this too?” I hear this all the time - and I think it’s the wrong question. Because in practice, the more general a system tries to be, the harder it is to trust for any specific task. We’ve seen this before with software. Specialization wins when reliability matters 🧵

0:20

Michael Pappas

Michael Pappas

@mpappas74

May 14

So here’s what I’m curious about: Are you betting on generalist AI to handle critical workflows? Or are you moving toward more specialized systems you can actually control and rely on? I share what I think in the video - and why we’ve taken a different approach at @modulate_ai

Modulate

Michael Pappas retweeted

Modulate

@modulate_ai

May 12

AI regulation is solving the wrong problem. Right now, most policies are built around generative AI: models like ChatGPT that create content (and yes, can hallucinate) But that’s only half the picture. There’s another category: analytic AI. Systems designed to understand what’s happening and return fixed, verifiable answers - no guessing, no hallucinations. In this clip, our CEO @mpappas74 breaks down why treating both the same is a mistake - and how current regulations are unintentionally slowing down tools that don’t carry the same risks. At Modulate, this distinction is core to how we build. Because not all AI should be regulated like it makes things up 👀

1:00

399

Modulate

Michael Pappas retweeted

Modulate

@modulate_ai

May 7

The AI playbook says: more data, more compute, bigger models. We don’t buy it. At Modulate, this is how we think how we build. In this clip, our CEO @mpappas74 breaks down why focused data real insight beats brute force. We’re a team of ~40, and that approach has led to: - Transcription models outperforming @OpenAI on accuracy - Deepfake detection models topping the @huggingface speech arena leaderboard Not by hoovering the internet. By using the right data. Because better > bigger.

0:50

164

Michael Pappas

Michael Pappas

@mpappas74

May 5

Excited to share this! @hackernoon

Modulate

@modulate_ai

May 5

We’re @hackernoon Company of the Week Voice AI breaks down when things get real: messy audio, emotion, overlap, intent. So we built Velma - the first Ensemble Listening Model, trained on 550M hours of real-world audio, designed to understand speech as it actually happens (not sanitized benchmarks). And ToxMod - real-time voice moderation that detects how something is said, not just the words. This isn’t research. It’s deployed at scale across Fortune 500 platforms today 🌐 Voice is the hardest problem in AI. It’s also the most human. We’re building the infrastructure to make it work -safely, accurately, in real time.

0:08

Modulate

Michael Pappas retweeted

Modulate

@modulate_ai

Apr 23

Deepfakes are getting cheaper to create. Detection shouldn’t be 100x more expensive. We just flipped the equation. Stop deepfakes for $0.25/hr - now at cost parity with creation ⚖️

480

Michael Pappas

Michael Pappas

@mpappas74

Apr 23

Everyone wants a bigger model. That’s the wrong direction. We keep getting asked how @modulate_ai compares to LLMs like ChatGPT. Honest answer: we’re solving a different problem. For real-world business use, one giant “black box” model creates more problems than it solves: – hard to tune – tough to make compliant – inconsistent when it matters most Businesses don’t need better conversation. They need precision - numbers, probabilities, outcomes they can trust. So we built differently: → 100 specialized models → each trained for a specific signal → working together to analyze conversations in a structured, reliable way Less flashy than a single massive model. Way more predictable. Way more useful. Breakdown in the video 👇

1:20

197

Michael Pappas

Michael Pappas

@mpappas74

Apr 16

this is the pattern we’re going to see everywhere: phase 1: “wow it runs itself” phase 2: “wait why is it doing that” phase 3: add constraints, memory, and oversight until it resembles… a well-instrumented system agents aren’t magic. they’re just very fast optimizers with incomplete context. the winners won’t be the ones who deploy agents first - they’ll be the ones who define the right objectives and catch failures early.

Om Patel

@om_patel5

Apr 15

SOMEONE PUT AN OPENCLAW-RUN VENDING MACHINE IN SAN FRANCISCO an AI agent is running an actual physical vending machine OpenClaw decides what to sell, how to name the products, how to price them, creates the ads, and tracks all the sales you can even see a dashboard of all the sales that the AI vending machine made the vending machine hardware does the dispensing. the AI does everything else, and of course inventory is supplied by the guy who runs it it's installed at Frontier Tower in SF which is a building packed with AI and robotics startup founders the agent forgot things, hallucinated, and at one point raised prices way too high. then tried to justify it because people were still buying we are now living in a simulation.

7:11

222

Carter Huffman

Michael Pappas retweeted

Carter Huffman

@whuffman

Apr 14

Something didn’t add up… so we ran an experiment. Deepgram claims 5.26% WER. We got 28.1% on real-world audio. Not a rounding error. A reality check. Most ASR models are tested on clean, perfect data. But real conversations are messy. So we tested where it actually matters. @modulate_ai came out ~2x more accurate. Full breakdown ↓ bit.ly/4mpyNFG

How Deepgram and Modulate Benchmark Against Real-World Audio | HackerNoon

We benchmarked Modulate and Deepgram on AMI, VoxPopuli, and Earnings-22. On the hardest dataset, Modulate had half the error rate at a fraction of the cost.

hackernoon.com

315

Michael Pappas

Michael Pappas

@mpappas74

Apr 3

x.com/i/article/204007641308…

843,911

Modulate

Michael Pappas retweeted

Modulate

@modulate_ai

Apr 2

Paying $$$$/hr to catch voice deepfakes? Cute. Meet Velma Deepfake Detect - Modulate’s synthetic voice detection API. 98.9% accuracy, 🥇 ranked top on @huggingface Speech Arena, and 120× cheaper than next best model providers. Time to stop getting ripped off ↓

13,891