RL & self-evolving agents at ServiceNow AI Research

Joined July 2007
Pinned Tweet
Really cool to see PipelineRL's in-flight weight updates being picked up! We're spreading it across our research teams to train models to reason and to make reasoning more efficient.
Rafael Pardinas retweeted
Excited to share my recent work @ServiceNowRSRCH ! We introduce a new privacy-centric deep research dataset and show models frequently leak enterprise information. However, training with dense _situational_ rewards efficiently learns to jointly optimize performance and privacy
MosaicLeaks is now on arXiv. The Mosaic Effect captures a simple idea: small fragments can look harmless alone, but become revealing in aggregate. Deep research agents can leak enterprise information in exactly this way. 1/9
The core idea: Enterprise agent privacy failures will not only come from copying private text. They can also come from the external actions agents take while trying to be useful. Privacy shouldn't come at the cost of utility; we can optimise for both. 8/9
this is too good
長椅子振動主への反射システム
This is becoming really powerful. More to come for high latency agentic pipelines
Better reasoning does not have to mean longer reasoning. Apriel OpenReasoner: fully reproducible multi-domain RL post-training using public datasets. 30-50% shorter traces, no quality trade-off. @ServiceNowRSRCH @ehsk0 @dvazquezcv @alexandredrouin
London tech you say?
I spent the last few weeks crowdsourcing the ultimate guide to London’s startup ecosystem. Here's why.

Finding your people is a lifelong mission - the people that push you, open doors for you, celebrate your wins, advise you sincerely and say yes to your crazy ideas. It’s one of the reasons people love San Francisco. Everyone is rooting for you and believes in you. There is a sense of wild ambition.

But is this something only unique to SF? What is/was London missing? I think it really came down to a few things:
- Optimism
- A mindset of waiting for permission
- Lack of a catalyst

Those in the startup world would have felt a shift over the past couple of months that has instilled a renewed sense of optimism for Britain, a mentality of not waiting for anyone’s permission and the catalyst of the AI boom empowering a new generation of builders.

And surprisingly, this isn’t new for Britain. We made the jet engine, steam trains, discovered the structure of DNA, discovered gravity and so much more. There was no concept of permission.

The UK that exists today has:
- Anthropic, OpenAI and DeepMind all opening offices in Kings Cross
- Startups raising absurd rounds building generational companies (just 2 days ago Fractile raised a $220m Series B)
- Unmatched talent being pulled in from Oxford, Cambridge, Imperial, UCL, Warwick, Kings and even European universities like ETH

So how can someone get involved and how can we level the playing field for those outside the startup ecosystem? The guide friends and I created below is our small role in helping democratise some of the obscure information on the inner workings of London’s startup scene. Read it, add to it, check it regularly and most importantly, do something with it.

I hope this guide helps people for years to come. Can’t wait to see what we do on top of all the infrastructure built by those before us. We’re truly standing on the shoulders of giants. 🔥

Link in comments.
Rafael Pardinas retweeted
PipelineRL finally supports vLLM V1!
Our first vLLM V0→V1 run on PipelineRL looked broken. @ehsk0 and I almost reached for an objective-side correction. That would have been the wrong fix. The real problem: four mismatches in the rollout backend. 🧵
With those fixed, V1 converged to the V0 trajectory. No objective change. Backend correctness before objective corrections — otherwise your objective fix silently compensates for a broken inference path, and the curves stop telling you anything.
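One way to check the backend before touching the objective is to compare the per-token log-probabilities reported by the rollout engine with those the trainer recomputes for the same tokens. The sketch below is illustrative only: the function names, the input format, and the tolerance are assumptions, not PipelineRL or vLLM APIs, and it says nothing about which four mismatches the thread refers to.

```python
import math
from typing import Dict, Sequence


def logprob_mismatch(
    rollout_logprobs: Sequence[float],
    trainer_logprobs: Sequence[float],
) -> Dict[str, float]:
    """Summarise the gap between rollout-side and trainer-side token log-probs.

    Both sequences are assumed to cover the same sampled tokens in the same order.
    """
    if len(rollout_logprobs) != len(trainer_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not rollout_logprobs:
        raise ValueError("log-prob sequences must not be empty")

    # Absolute per-token gap in log space.
    diffs = [abs(r - t) for r, t in zip(rollout_logprobs, trainer_logprobs)]
    # Per-token ratio pi_trainer / pi_rollout; close to 1.0 when the paths agree.
    ratios = [math.exp(t - r) for r, t in zip(rollout_logprobs, trainer_logprobs)]

    return {
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
        "mean_ratio": sum(ratios) / len(ratios),
    }


def backend_looks_consistent(stats: Dict[str, float], tol: float = 1e-2) -> bool:
    """Hypothetical gate: tol is an arbitrary illustrative threshold."""
    return stats["max_abs_diff"] <= tol


if __name__ == "__main__":
    stats = logprob_mismatch([-1.0, -2.0, -0.5], [-1.0, -2.3, -0.5])
    print(stats, backend_looks_consistent(stats))
```

If a gap like this shows up right after a backend swap, it points at the inference path rather than the objective, which is the order of debugging the thread argues for.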
It’s been over a year since we released this work. Since then, PipelineRL has gone places. huggingface.co/blog/ServiceN…