Pankaj Gupta

Tweets

Pinned Tweet

Pankaj Gupta @defpan

9 Jan 2025

Driving model performance optimization: 2024 highlights baseten.co/blog/driving-mode… via @baseten

Baseten's model performance team works to optimize customer models for latency, throughput, quality, cost, features, and developer efficiency.

baseten.co

Pankaj Gupta retweeted

Yikai Zhu

@YikaiZhu98

May 29

x.com/i/article/206040169234…

Pankaj Gupta retweeted

Faraz Shahsavan

@Faraz9877

May 18

x.com/i/article/205645541126…

Pankaj Gupta retweeted

Charlie O'Neill

@oneill_c

May 13

x.com/i/article/205460318733…

Pankaj Gupta retweeted

Tuhin Srivastava

@tuhinone

May 13

x.com/i/article/205459751009…

Pankaj Gupta retweeted

Aaryam Sharma

@AaryamSharma

May 8

x.com/i/article/205281547071…

Pankaj Gupta retweeted

Baseten

@baseten

Apr 20

Kimi K2.6 has landed, and it is live on Baseten! We have baked in multiple inference optimizations so that you can leverage Kimi K2.6 in production right away.

To run Kimi K2.6, Baseten uses:
-> The Baseten Inference Stack with advanced optimizations, including KV-aware routing
-> NVFP4 weights to unlock maximum performance on NVIDIA Blackwell GPUs
-> Multimodal hierarchical caching for low-latency vision input
-> Prefill-decode disaggregation for LLM inference optimization.

Try it now at: baseten.co/library/kimi-k26

Pankaj Gupta retweeted

Tuhin Srivastava

@tuhinone

Apr 3

OpenEvidence has become the default medical knowledge platform for over 40% of U.S. physicians; it's relied on daily for the highest-stakes decisions in medicine. Baseten is honored to power the inference behind it.

Pankaj Gupta retweeted

OpenEvidence

@EvidenceOpen

Apr 3

Over 1 million clinical questions hit OpenEvidence every day. More than half the practicing physicians in the US rely on us at the point of care, mid-decision, with a patient in front of them. Downtime in that moment has real consequences. We partner with @baseten for our inference infrastructure to make sure answers are always there when physicians need them. They stopped by our office to talk about what that looks like under the hood.

Pankaj Gupta retweeted

waterloo intern

@waterloo_intern

Mar 7

- 230 training runs
- 1,623 GPU hours (67 B200 days)
- 76 TB of training data
- a 2x faster model

Every paper said it can't be done. Quantization Aware Distillation made it possible.

waterloo intern

@waterloo_intern

Mar 7

x.com/i/article/202980121700…

Pankaj Gupta @defpan

Mar 4

Had such a blast!

Baseten

@baseten

Mar 4

Earlier this month, we hosted our biannual company-wide offsite and gathered 180 teammates in Austin, TX.

Highlights included:
> talent show
> a chat with @saranormous about the evolution of the inference market
> fireside chat with @EvidenceOpen
> hackathon
> a Texas ranch experience

Within the last year, Baseten has moved faster than ever before. With 4X team growth, 12X revenue growth, and 3 separate fundraises, it's hard to believe how far we've come. At that pace, alignment doesn’t just happen. Our offsites enable us to celebrate wins, strengthen relationships across teams, and align on the next few months.

And we're just getting started. If this sounds exciting to you, join us! baseten.co/careers

Pankaj Gupta retweeted

Baseten

@baseten

Mar 2

We painted San Francisco green and pink, and the message is clear — you need to own your inference. If you spot us around the city, share a picture with us. We’ll send you something!

Pankaj Gupta retweeted

NVIDIA AI Developer

@NVIDIAAIDev

Feb 27

Nice drop from @philipkiely and @baseten. 📗 Inference Engineering maps the stack behind modern AI inference — runtimes, infrastructure, and tooling — and digs into the practical details of serving LLMs on NVIDIA GPUs with TensorRT LLM and Dynamo. ICYMI — worth the read. 👇

Philip Kiely

@philipkiely

Feb 23

Inference Engineering launches today. baseten.com/inference-engine…

Pankaj Gupta retweeted

World Labs

@theworldlabs

Feb 25

We’re building foundational world models to power the next era of 3D. From robotics to gaming, spatial intelligence unlocks entirely new worlds. Powered by inference at scale – shoutout to Baseten.

Pankaj Gupta retweeted

Jeff Huber

@jeffreyhuber

Feb 20

the bar has been raised for book printing. thanks @philipkiely for the copy!

Pankaj Gupta @defpan

Feb 23

Inference is hard to learn because there are so many moving pieces. Now, you can see the whole stack in one place.

Pankaj Gupta retweeted

Philip Kiely

@philipkiely

Feb 23

Inference Engineering launches today. baseten.com/inference-engine…

Pankaj Gupta retweeted

Baseten

@baseten

Feb 19

Generational AI companies are powered by Baseten. Why? We obsess over the milliseconds, so they can ship the future. Focus on what actually differentiates you. Leave the inference to us.

Pankaj Gupta retweeted

waterloo intern

@waterloo_intern

Feb 19

we quantized the best open-source diffusion model on the market

4 bits
huge speedup
(almost) no quality loss

this is a full explanation of the trillion dollar industry's oldest trick

Pankaj Gupta retweeted

Baseten

@baseten

Feb 10

Introducing Kimi K2.5 on Baseten’s Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an alarmingly large number of tool calls. Get the good stuff here: baseten.co/library/kimi-k25/

Pankaj Gupta @defpan

Jan 27

RT @tuhinone: The biggest hurdle to widespread AI adoption isn't just model capability, it's the cost and speed of inference. At Baseten, o…