Co-founder @basetenco working on ML model performance

Joined September 2011
Pankaj Gupta retweeted
Apr 20
Kimi K2.6 has landed, and it is live on Baseten! We have baked in multiple inference optimizations so that you can leverage Kimi K2.6 in production right away. To run Kimi K2.6, Baseten uses:
-> The Baseten Inference Stack with advanced optimizations, including KV-aware routing
-> NVFP4 weights to unlock maximum performance on NVIDIA Blackwell GPUs
-> Multimodal hierarchical caching for low-latency vision input
-> Prefill-decode disaggregation for LLM inference optimization

Try it now at: baseten.co/library/kimi-k26
Pankaj Gupta retweeted
OpenEvidence has become the default medical knowledge platform for over 40% of U.S. physicians; it's relied on daily for the highest-stakes decisions in medicine. Baseten is honored to power the inference behind it.
Pankaj Gupta retweeted
Over 1 million clinical questions hit OpenEvidence every day. More than half the practicing physicians in the US rely on us at the point of care, mid-decision, with a patient in front of them. Downtime in that moment has real consequences. We partner with @baseten for our inference infrastructure to make sure answers are always there when physicians need them. They stopped by our office to talk about what that looks like under the hood.
Pankaj Gupta retweeted
- 230 training runs
- 1,623 GPU hours (67 B200 days)
- 76 TB of training data
- a 2x faster model

Every paper said it can't be done. Quantization Aware Distillation made it possible.
Had such a blast!
Earlier this month, we hosted our biannual company-wide offsite and gathered 180 teammates in Austin, TX. Highlights included:
> talent show
> a chat with @saranormous about the evolution of the inference market
> fireside chat with @EvidenceOpen
> hackathon
> a Texas ranch experience

Within the last year, Baseten has moved faster than ever before. With 4X team growth, 12X revenue growth, and 3 separate fundraises, it's hard to believe how far we've come. At that pace, alignment doesn’t just happen. Our offsites enable us to celebrate wins, strengthen relationships across teams, and align on the next few months.

And we're just getting started. If this sounds exciting to you, join us! baseten.co/careers
Pankaj Gupta retweeted
We painted San Francisco green and pink, and the message is clear — you need to own your inference. If you spot us around the city, share a picture with us. We’ll send you something!
Pankaj Gupta retweeted
Nice drop from @philipkiely and @baseten. 📗 Inference Engineering maps the stack behind modern AI inference — runtimes, infrastructure, and tooling — and digs into the practical details of serving LLMs on NVIDIA GPUs with TensorRT LLM and Dynamo. ICYMI — worth the read. 👇
Inference Engineering launches today. baseten.com/inference-engine…
Pankaj Gupta retweeted
We’re building foundational world models to power the next era of 3D. From robotics to gaming, spatial intelligence unlocks entirely new worlds. Powered by inference at scale – shoutout to Baseten.

Pankaj Gupta retweeted
the bar has been raised for book printing thanks @philipkiely for the copy!
Inference is hard to learn because there are so many moving pieces. Now, you can see the whole stack in one place
Pankaj Gupta retweeted
Inference Engineering launches today. baseten.com/inference-engine…
Pankaj Gupta retweeted
Feb 19
Generational AI companies are powered by Baseten. Why? We obsess over the milliseconds, so they can ship the future. Focus on what actually differentiates you. Leave the inference to us.
Pankaj Gupta retweeted
we quantized the best open-source diffusion model on the market
4 bits
huge speedup
(almost) no quality loss
this is a full explanation of the trillion dollar industry's oldest trick
Pankaj Gupta retweeted
Feb 10
Introducing Kimi K2.5 on Baseten’s Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an alarmingly large number of tool calls. Get the good stuff here: baseten.co/library/kimi-k25/
RT @tuhinone: The biggest hurdle to widespread AI adoption isn't just model capability, it's the cost and speed of inference. At Baseten, o…
