Ramp Labs

Tweets

Pinned Tweet

Ramp Labs

@RampLabs

Jun 12

Today we’re releasing Ramp SWE-Bench: a private, production-grounded coding benchmark created from real engineering problems we've faced at Ramp.

0:27

910

161,831

Ramp Labs

@RampLabs

Jun 12

Public benchmarks saturate quickly and inevitably leak into training data, and none quite resembles the work our engineers do every day. Building our own benchmark has allowed us to evaluate models within our own financial software ecosystem. We compared models side by side and unearthed their behavioral differences. Head-to-head breakdowns are available here: labs.ramp.com/swebench

9,246

Ramp Labs

@RampLabs

Jun 12

When measuring effectiveness versus cost, the frontier presents as a tradeoff rather than a single winner. Read our methodology and explore the results below: labs.ramp.com/swebench

105

54,665
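
One way to read "a tradeoff rather than a single winner" is as a Pareto frontier over cost and effectiveness: a model stays on the frontier if no other model is both cheaper and at least as effective. The sketch below is only an illustration of that idea, not the benchmark's methodology. The `ModelResult` type, its field names, and the sample numbers are hypothetical.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    name: str
    cost_usd: float    # hypothetical: average cost per task
    solve_rate: float  # hypothetical: fraction of tasks solved


def pareto_frontier(results: list[ModelResult]) -> list[ModelResult]:
    """Keep the results that no cheaper-or-equal result matches or beats."""
    frontier: list[ModelResult] = []
    best_rate = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, best solve rate first.
    for r in sorted(results, key=lambda r: (r.cost_usd, -r.solve_rate)):
        if r.solve_rate > best_rate:
            frontier.append(r)
            best_rate = r.solve_rate
    return frontier


# Made-up numbers, for illustration only.
results = [
    ModelResult("a", 1.0, 0.4),
    ModelResult("b", 2.0, 0.3),  # dominated by "a": costs more, solves less
    ModelResult("c", 3.0, 0.6),
    ModelResult("d", 3.0, 0.5),  # dominated by "c": same cost, solves less
]
frontier_names = [r.name for r in pareto_frontier(results)]
```

Every model left on the frontier is a defensible pick at its price point, which is why the comparison yields a curve instead of one winner.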

Ramp Labs

@RampLabs

May 27

We deployed 10,000 background agents to security-scan our codebase. The system is simple, scales with compute, and runs on publicly available models. From the scan, we fixed several high-severity vulnerabilities.

0:28

459

70,671

Ramp Labs

@RampLabs

May 27

The scan pipeline is model-agnostic, and does not require a frontier model to drive it. We evaluated several models against our confirmed vulnerabilities, and found that cheaper open-weight models still surface high-severity issues.

4,589

Ramp Labs

@RampLabs

May 27

We built this to earn trust from Ramp customers, who rely on us for their cards, expenses, and payments. If you have a background coding agent, you can build a similar scan for your customers. Full article: x.com/RampLabs/status/205967…

Ramp Labs

@RampLabs

May 27

x.com/i/article/205966652072…

5,178

Ramp Labs

@RampLabs

May 27

x.com/i/article/205966652072…

211

476,474

Ramp Labs

@RampLabs

May 7

We partnered with @PrimeIntellect to build Fast Ask, a small RL-trained subagent that helps our Sheets agent find answers in spreadsheets. It scores 4% over Opus on exact match accuracy at Haiku latency.

0:17

741

328,497

Ramp Labs

@RampLabs

May 7

This was a good fit for RL because spreadsheet retrieval is repeated often, latency sensitive, and has clean feedback. The model either returns the right cent amount, date, invoice ID, yes/no, or row reference, or it does not. That let us optimize the retrieval policy directly with deterministic rewards.

9,196
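
A deterministic exact-match reward of the kind described above could look roughly like the following. This is a minimal sketch, not the reward used in training: the normalization rules (currency symbols, thousands separators, two date formats, case folding) and the function names `normalize` and `exact_match_reward` are assumptions made for illustration.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation


def normalize(answer: str) -> str:
    """Map equivalent spellings of an answer to one canonical string (assumed rules)."""
    text = answer.strip().lower()

    # Amounts: drop "$" and thousands separators, then fix to whole cents.
    money = re.fullmatch(r"\$?\s*(-?[\d,]+(?:\.\d+)?)", text)
    if money:
        try:
            amount = Decimal(money.group(1).replace(",", ""))
            return str(amount.quantize(Decimal("0.01")))
        except InvalidOperation:
            pass

    # Dates: accept two common formats and emit ISO 8601.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue

    # Everything else (IDs, yes/no, row references) compares case-insensitively.
    return text


def exact_match_reward(predicted: str, expected: str) -> float:
    """Return 1.0 when the normalized answers are identical, else 0.0."""
    return 1.0 if normalize(predicted) == normalize(expected) else 0.0
```

Because the reward is a pure function of the two strings, the same rollout always receives the same score, so no judge model is needed in the loop.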

Ramp Labs

@RampLabs

May 7

We built a synthetic RL environment with 14 finance task types, gave the model 3 tools and 15 turns, and let it learn how to navigate workbooks on its own. Information retrieval was a huge bottleneck for our spreadsheet agent, and Fast Ask helped solve it. Full writeup: x.com/RampLabs/status/205244…

Ramp Labs

@RampLabs

May 7

x.com/i/article/205242296501…

6,561

Ramp Labs

@RampLabs

May 7

x.com/i/article/205242296501…

603

384,237

Ramp Labs

@RampLabs

Apr 21

At Ramp, we've seen AI token spend skyrocket 13x among our customers since last January. We ran experiments where coding agents managed their own token budgets. They ignored them completely, so we employed a separate controller model to approve spend on their behalf.

0:35

184

36,767

Ramp Labs

@RampLabs

Apr 21

Controllers consistently followed unverified advice over the coding agent’s work right in front of them. Even with a warning that the advice might be wrong, accuracy was well below a coin flip for most models. Only one condition produced accurate decisions across the board: grounding the controller with hard numbers.

3,802
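
"Grounding the controller with hard numbers" could mean handing it measured quantities instead of free-text advice. The sketch below is a guess at what that might look like, not the setup from the experiments: the `SpendRequest` fields, the summary format, and the hard budget check are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpendRequest:
    # All fields are hypothetical measurements taken outside the coding agent.
    tokens_spent: int
    tokens_requested: int
    token_budget: int
    tests_passing: int
    tests_total: int


def grounding_summary(req: SpendRequest) -> str:
    """Render the measured numbers as a short context line for the controller."""
    used = req.tokens_spent / req.token_budget
    return (
        f"spent {req.tokens_spent}/{req.token_budget} tokens ({used:.0%}); "
        f"requesting {req.tokens_requested}; "
        f"tests passing {req.tests_passing}/{req.tests_total}"
    )


def within_budget(req: SpendRequest) -> bool:
    """Hard check enforced outside any model: never approve past the budget."""
    return req.tokens_spent + req.tokens_requested <= req.token_budget


req = SpendRequest(
    tokens_spent=6000,
    tokens_requested=3000,
    token_budget=10000,
    tests_passing=7,
    tests_total=10,
)
summary = grounding_summary(req)
```

Here the controller would see `summary` alongside the request, while `within_budget` acts as a backstop that does not depend on any model's judgment.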

Ramp Labs

@RampLabs

Apr 21

AI token spend is climbing fast as companies put agents into real workflows. Don’t let agents decide how much they should spend. Track, forecast, and control AI spend by team, model, and project → ramp.com/ai-cost-monitoring

AI Token Spend Management | Track Token Usage & Spend by Team | Ramp

Track AI spend across Anthropic and OpenAI in one dashboard. See usage by team, model, and project. Forecast costs before they spike. Free for Ramp customers.

ramp.com

2,677

Ramp Labs

@RampLabs

Apr 21

x.com/i/article/204425659178…

161

139,126

Ramp Labs

@RampLabs

Apr 10

Introducing Latent Briefing, a way for agents to quickly share their relevant memory directly. Result: 31% fewer tokens used, same accuracy. Multi-agent systems are powerful, but can be wildly inefficient. They pass context as tokens, so costs explode and signal gets lost. We built an algorithm that allows agents to communicate KV cache to KV cache.

0:31

1,771

669,751

Ramp Labs

@RampLabs

Apr 10

We ran RLM on LongBench v2 across various document lengths and difficulty levels, observing a 30% median token reduction with a consistent 3% accuracy boost. We also found that the optimal compaction level is dynamic: Longer documents benefit from lighter compaction, while harder tasks require more aggressive filtering.

15,150

Ramp Labs

@RampLabs

Apr 10

Conceptually, this is a bit like taking notes. Sometimes you’re trying to build a body of knowledge over time, and the details matter because they accumulate into something larger. In those cases, you want to preserve context rather than compress it too early. With harder problems, you’re often sketching ideas, exploring directions, and following threads that may or may not lead anywhere. Most of what gets written down in that process isn’t meant to last. Latent Briefing = saving time and money 😎 Full writeup: x.com/RampLabs/status/204266…

Ramp Labs

@RampLabs

Apr 10

x.com/i/article/204263155026…

12,962