Joined October 2018
Pinned Tweet
Jun 8
Excited to have meaningfully contributed to bringing a more powerful and personal Siri to life by working on inference for Apple Foundation Models on Private Cloud Compute, and the search stack for broad world knowledge.
Asmit retweeted
I've been using the new Siri on Mac by way of the Terminal Command workaround (still not off the waitlist). My take so far: Apple nailed it. It's awesome for a beta 1. (Hopefully this gets me off the waitlist but it is true).
May 18
Presented our work from Apple at EuroMLSys 2026 in Edinburgh: “Asynchronous Verified Semantic Caching for Tiered LLM Architectures” The core idea: semantic caches are usually conservative because false positives are catastrophic. Instead of pushing more verification into the serving path, we moved verification off the critical path entirely. Near-miss cache interactions asynchronously trigger an LLM judge verifier. Verified pairs are promoted into the dynamic tier over time, increasing effective cache coverage while preserving the latency behavior of the original system. Interesting systems tension: you want higher cache hit quality without paying synchronous verification costs on user requests. Paper: machinelearning.apple.com/re… #MLSys #LLM #Inference #Caching #EuroSys
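A rough sketch of the idea in the post above, not the paper's implementation: a lookup returns a cached answer only on a high-confidence match, queues near misses for later checking, and a separate verifier pass promotes approved pairs into a dynamic tier. The class name, both thresholds, the toy bag-of-words embedding, and the `judge` callback (standing in for the LLM judge) are all assumptions made for illustration, and the "asynchronous" part is reduced to a queue that a background worker would drain.

```python
import math
from collections import Counter, deque


def embed(text):
    """Toy bag-of-words embedding (assumption; a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TieredSemanticCache:
    """Hypothetical two-tier cache: a fixed static tier plus a growing dynamic tier."""

    def __init__(self, static_entries, judge, hit_threshold=0.9, near_threshold=0.5):
        self.static = [(embed(q), q, a) for q, a in static_entries.items()]
        self.dynamic = []          # entries promoted after verification
        self.pending = deque()     # near misses awaiting the verifier
        self.judge = judge         # stand-in for the LLM judge: (query, cached_query) -> bool
        self.hit_threshold = hit_threshold
        self.near_threshold = near_threshold

    def _best_match(self, vec):
        """Return (score, cached_query, answer) for the closest entry in either tier."""
        best = (0.0, None, None)
        for entry_vec, cached_query, answer in self.static + self.dynamic:
            score = cosine(vec, entry_vec)
            if score > best[0]:
                best = (score, cached_query, answer)
        return best

    def lookup(self, query):
        """Serving path: no verifier call happens here, only a similarity check."""
        score, cached_query, answer = self._best_match(embed(query))
        if score >= self.hit_threshold:
            return answer
        if score >= self.near_threshold:
            # Near miss: record it for off-path verification, still report a miss.
            self.pending.append((query, cached_query, answer))
        return None  # caller falls through to the LLM

    def run_verifier(self):
        """Off the critical path: judge queued near misses, promote approved pairs."""
        promoted = 0
        while self.pending:
            query, cached_query, answer = self.pending.popleft()
            if self.judge(query, cached_query):
                self.dynamic.append((embed(query), query, answer))
                promoted += 1
        return promoted
```

In this sketch a near-miss query still gets a miss the first time it is seen; only after `run_verifier()` approves the pair does a repeat of that query hit the dynamic tier, which is how coverage grows without adding judge latency to `lookup`.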
Asmit retweeted
Incredibly excited to release GPT-5.1 Pro to the world! It’s great at tackling the hardest, messiest problems with clearer, more comprehensive responses. I’m eager to see you throw your toughest problems at it. Please try it and share what you think :)
19 Nov 2025
GPT-5.1 Pro is rolling out today to all Pro users. It delivers clearer, more capable answers for complex work, with strong gains in writing help, data science, and business tasks.
Asmit retweeted
The bitter lesson of AI infra: The hardest part about building faster LLM inference systems is not designing the systems, but rather it is evaluating if the system is actually faster! 🤔 This graph from a recent top systems venue paper about long-context serving shows average normalized input token latency for a trace with both short and 100K token requests. System X looks like a clear win: lower normalized latency and higher request rates. But normalized metrics can obscure the actual user experience: at those rates, long inputs see >2hr delays to the first token! Let’s do the math!🧮
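The thread's own numbers are not included here, so the following is only a back-of-the-envelope illustration of the point. It assumes "normalized input token latency" means time to first token divided by the number of input tokens; the function name and the short-request figures are hypothetical, while the 100K-token input and the 2-hour delay come from the post.

```python
def normalized_latency(ttft_seconds: float, input_tokens: int) -> float:
    """Seconds of time-to-first-token per input token (assumed definition)."""
    return ttft_seconds / input_tokens


# Long request from the post: 100K input tokens, 2 hours to the first token.
long_request = normalized_latency(2 * 60 * 60, 100_000)

# Hypothetical short request: 1,000 input tokens, 72 seconds to the first token.
short_request = normalized_latency(72, 1_000)

print(f"long:  {long_request:.3f} s/token")
print(f"short: {short_request:.3f} s/token")
```

Under that assumed definition both requests come out at 0.072 s/token, even though one user waited about a minute and the other waited two hours, which is why a per-token average can look healthy while long requests are effectively starved.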
Asmit retweeted
Super long-context models with context windows spanning millions of tokens are becoming commonplace (@GoogleDeepMind Gemini, @xai Grok 3, @Alibaba_Qwen Qwen2.5). But efficiently serving these models is tough, especially alongside short requests. Head-of-Line (HOL) blocking becomes a major issue, hurting latency for everyone. We present Medha, a system designed to handle this mix efficiently, achieving 30x lower latency and 5x higher throughput than the state of the art. Full paper: arxiv.org/pdf/2409.17264. 🧵
2 Jan 2025
AGI will be the ultimate monument to human pride, and simultaneously its ultimate undoing
21 Dec 2024
99.99% of people cannot comprehend how insane FrontierMath is. The problems are crafted by Math profs and not in any training data. Math legend Terry Tao said "These are extremely challenging. I think they will resist AIs for several years at least." OpenAI o3 did 25% on THIS.
9 Nov 2024
This Apple Intelligence feature hits the spot for inquisitive but impatient types like me. Information snacking.
13 Oct 2024
Twisted minds, genius designs @MIT
13 Oct 2024
Komorebi (木漏れ日)
Asmit retweeted
🚀 Introducing Metron: Redefining LLM Serving Benchmarks! 📊 Tired of misleading metrics for LLM performance? Our new paper introduces a holistic framework that captures what really matters - the user experience! 🧠💬 github.com/project-metron/me… #LLM #AI #Benchmark
Asmit retweeted
1/ LLM inference systems are like high-performance engines ⚙️—complex, powerful, and full of intricate settings. Efficiently deploying them to maximize GPU performance is a challenge typically tackled by experts at orgs like @OpenAI and @AIatMeta 🚀. 🧵
8 Dec 2023
AI really be taking over
Asmit retweeted
5 Jun 2023
Join us for a fireside chat with @OpenAI's CEO, Sam Altman @sama. Registration is open to the public and media on a first-come, first-served basis. Date & Time: Thursday, 8th June 2023, 3 to 4 PM. Venue: IIIT-Delhi Campus. Register here: eventbrite.com/e/a-conversat… #IIITD #OpenAI #artificialintelligence
Asmit retweeted
Our third paper on the action on Twitter during the 2022 UP elections. TL;DR: 1. Modi is more important than Yogi for BJP politicians and the party on Twitter since Feb 2022. 2. Literacy and urbanization are strong predictors of politicians' tweeting. 3. A list of who leads where on Twitter. Thread: