Baseten

Tweets

Pinned Tweet

Baseten

@baseten

May 13

Intelligence should be defined by the people closest to the work. Intelligence should be owned by all of us. Let’s build a many model future!

Tuhin Srivastava

@tuhinone

May 13

x.com/i/article/205459751009…

12,495

Baseten

@baseten

Jun 12

The new AgentPerf benchmark by @ArtificialAnlys shows that @NVIDIAAI Blackwell delivers best performance for demanding agentic workloads. With NVIDIA, we're continuously investing in making your coding agents run fast, scale seamlessly, and cost less. blogs.nvidia.com/blog/nvidia…

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

New AgentPerf results from Artificial Analysis show how accelerated computing systems handle real-world agentic workloads, with NVIDIA GB300 NVL72 running up to 20x more agents per megawatt than...

blogs.nvidia.com

704

Baseten

@baseten

Jun 12

We're thrilled to be working with the Harvey team to push open models to frontier-level performance for legal AI. Shout out to @gabepereyra for the great article. LAB was key to our joint work post-training open-weight models for legal agents.

Gabe Pereyra

@gabepereyra

Jun 12

x.com/i/article/206143772165…

1,914

Baseten

@baseten

Jun 12

Congrats to the MiniMax team on the open-source launch of M3! There are very few <500bn parameter models that can tackle coding, agentic workloads, and multimodal, all with a 1M-token context window, but M3 does it all. Dig in here: baseten.co/library/minimax-m…

20,225

Baseten

@baseten

Jun 12

Join Baseten, Lovable, and ElevenLabs to hack on the future of healthcare.

Alex Ker 🔭

@thealexker

Jun 12

Most AI demos built for healthcare don't survive in real clinical or operational environments. The data is messy, the workflows are fragmented, margin for error is near zero. That's why I'm stoked to host a 1.5-day Healthcare x AI Hackathon with @HealthcareAIGuy in NYC: a small group of engineers, founders, PMs, and builders who are serious about applying AI agents and tools to healthcare problems in production. Stack: @Baseten, @Lovable, @ElevenLabs Cash prizes. And a few surprises. 📍 NYC | June 26–27 Space is limited and application-based. Apply by June 17th at 11:59 pm ET → link below

670

Baseten

@baseten

Jun 12

We've heard from customers that they ship model updates >50% more often with rolling deploys than their previous solutions. No downtime, parallel GPU fleet, or off-hours babysitting. Rolling deploys are autoscaling-aware, and you can pause, inspect, or roll back at any step.

Sid Shanker

@sidpshanker

Jun 12

x.com/i/article/206546281141…

3,208

Baseten

@baseten

Jun 11

Great to see @Baseten’s own @oneill_c and @part_harry_ sitting down with @cursor_ai’s @sjwhitmore to talk about the many things their 128(!) agents are doing (and occasionally arguing about), compaction, and the future.

Sam Whitmore

@sjwhitmore

Jun 11

We're trying a new experiment at @cursor_ai - interviewing devs we admire. I chatted with @oneill_c & @part_harry_ from @baseten about how they use coding agents. We discussed their current dev workflows & some predictions for the future. Check it out below!

41:57

2,456

Baseten

@baseten

Jun 11

We are excited to announce that we have partnered with @_inception_ai to make Mercury 2 available on Baseten. This makes us the first inference platform to bring Inception’s diffusion LLM to production. Inception’s dLLM architecture fixes the bottlenecks of sequential token generation and can deliver 1,000 tokens/sec on standard NVIDIA GPUs. Early users like @augmentcode have seen impressive results, such as an 82% reduction in latency and 90% cost savings, while maintaining high quality.

Baseten

@baseten

Jun 11

x.com/i/article/206508590334…

22,763

Baseten

@baseten

Jun 11

x.com/i/article/206508590334…

31,032

Baseten

@baseten

Jun 10

The longer the context, the more memory your LLM needs. We introduce research techniques to compress that memory 200x on the fly without changing the base model.

Charlie O'Neill

@oneill_c

Jun 10

1/ You can shrink a language model's KV cache by 200×, in a single forward pass, and it still answers correctly. At 256k context that's 36 GiB of cache down to ~360 MiB, with no change to the base model. Here's how we did it 👇

3,486

Baseten

@baseten

Jun 9

Baseten is live on the Respan Gateway. Congratulations to the @RespanAI team on their Gateway launch as they bring observability, evals, and routing to agents. Try Baseten Model APIs now on Respan.

1,090

Baseten

@baseten

Jun 9

respan.ai/ai-gateway

AI Gateway for Production LLM Routing | Respan

OpenAI-compatible gateway with failover, response caching, per-key limits, and production tracing on one platform.

respan.ai

265

Baseten retweeted

Sarah Sachs

@sarahmsachs

Jun 8

Model selection isn't just a fancy term for "looking at benchmarks". If you're just auto-updating and going off twitter vibes, you're not really adding any value to your business or your customers. To do this well, it means you need to deeply understand your use cases, how much value your customers ascribe to a problem, how much margin you want to make on that product, and how much time you want to invest into growing that margin. Come hear me rant more on June 25 luma.com/65l844l9?utm_campai…

How to choose an AI model with Gamma and Notion · Luma

How to choose an AI Model | A panel hosted by Baseten | June 25 Picking the right model is one of the most critical decisions an AI product team makes right…

luma.com

Charlie O'Neill

@oneill_c

Jun 8

Working in the Training team at Baseten, I often see companies agonize over which model to use. So many people worry about how to keep up with benchmarks and new releases. But with post-training and specialization, and as we see a rising tide in the intelligence of many open-source models, what really matters is your learning signal. Do you have the right user metrics to say whether a model is doing poorly or well at your task, and to use that to learn and hillclimb the task? If you want to learn more, I’m moderating a panel on June 25th in SF at 6 PM with Gamma co-founder Jon Noronha (@thatsjonsense) and Notion AI lead Sarah Sachs (@sarahmsachs) on model selection in a multi-model landscape.

2,381

Baseten

@baseten

Jun 8

Join Charlie for a conversation with @thatsjonsense and @sarahmsachs on how @GammaApp and @NotionHQ think about model selection on June 25th.

Charlie O'Neill

@oneill_c

Jun 8

2,573

Baseten

@baseten

Jun 8

How to choose an AI model with Gamma and Notion · Luma

How to choose an AI Model | A panel hosted by Baseten | June 25 Picking the right model is one of the most critical decisions an AI product team makes right…

luma.com

1,311

Baseten

@baseten

Jun 5

GLM 5.1 now achieves 160 TPS and <2-second TTFT on Baseten. Ideal for agentic workloads that need high throughput and low latency.

6,586

Baseten

@baseten

Jun 5

Check it out here: baseten.co/library/glm-51/

GLM 5.1 | Model library

GLM-5.1 is Z.AI's next-generation model for agentic engineering, with significantly stronger coding capabilities than its predecessor.

baseten.co

768

Baseten

@baseten

Jun 4

Are you tired of waiting 17 minutes for an AI agent to finish a code change? As an agent’s context grows, standard transformer attention can turn long runs into a bottleneck. @NVIDIAAI Nemotron 3 Ultra addresses this with a hybrid architecture that replaces several attention-heavy layers with Mamba layers. This makes long-context inference far more efficient. In benchmarked settings, this means:
→ step 300 runs as fast as step 3
→ up to 5x higher throughput
→ up to 30% lower cost
Today, Nemotron 3 Ultra, Nemotron 3.5 ASR, and Nemotron 3.5 Content Safety are available on Baseten for production AI teams.

NVIDIA

@nvidia

Jun 4

Introducing NVIDIA Nemotron 3 Ultra. A frontier smart open model built for long-running agents that need to plan, reason, use tools and keep working across complex coding, research and enterprise workflows. Up to 5x faster inference and up to 30% lower cost for agentic tasks. Learn more: nvda.ws/4x9nGps

0:25

1,429

Baseten

@baseten

Jun 4

Read our full write-up: baseten.co/blog/nvidia-nemot…

Introducing NVIDIA Nemotron 3 Ultra: The Nemotron 3.x family is here!

Are you tired of waiting 17 minutes for a code change?

baseten.co

423

Baseten retweeted

Tuhin Srivastava

@tuhinone

Jun 2

Today we're announcing MAI-Thinking-1 with Microsoft and it will be available on Baseten soon. Microsoft built something genuinely different here: a commercial-grade thinking model trained on clean data with no distillation from third-party models and designed to be fine-tuned by the enterprises using it. Microsoft AI guarantees 100% eyes-off on post-training data and Baseten will handle the fine-tuning and deployment at scale. The future isn't one model. It's many models, each owned by the businesses that shaped them and MAI-Thinking-1 is a big step in that direction. baseten.co/blog/mai-thinking…

330

37,933

Baseten retweeted

Dannie Herzberg

@DannieHerz

Jun 4

I’m thrilled to welcome Gabe Stern to Baseten to lead Legal. Gabe is the whole package: deeply experienced, sharp, highly trusted, and commercially minded. We first got to work together at Slack, where he was an exceptional partner and played a critical role through Slack's hyper-growth & IPO. I’m personally very happy to be reunited with Gabe, and even happier that Baseten gets to benefit from his judgment, partnership, and instincts. Welcome, Gabe!

Baseten

@baseten

Jun 4

We are excited to welcome Gabe Stern as General Counsel. Welcome, Gabe!

3,588