Rodolfo Esparza Jr

Tweets

Rodolfo Esparza Jr

@ElInsuranceGuy

23h

The next big LLM training breakthrough may not be a better reward model. It may be faster RL rollouts. The new Bebop paper reports up to 1.8x end-to-end acceleration for async RL training by fixing a speculative-decoding bottleneck. #LLM #ReinforcementLearning

Rodolfo Esparza Jr

@ElInsuranceGuy

23h

Worth watching closely: the method was tested on Qwen3.5/3.6/3.7, and the paper includes a real caveat. If RL pushes the model far outside the entropy regime seen in SFT, the benefit can fade and online co-training may still matter.

Rodolfo Esparza Jr

@ElInsuranceGuy

23h

My take: some of the biggest AI gains will come from boring-sounding systems work that changes the economics of model improvement. What other hidden RL bottlenecks do you think show up at scale? Paper: arxiv.org/abs/2606.12370 #SpeculativeDecoding

Breaking Entropy Bounds: Accelerating RL Training via MTP with...

Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction...

arxiv.org

Rodolfo Esparza Jr

@ElInsuranceGuy

Jun 12

The next big AI security problem isn’t model output. It’s delegated action. A new paper argues that enterprise risk has moved from data boundaries into tool-calling workflows, and it proposes a runtime architecture built for production AI agents.

Rodolfo Esparza Jr

@ElInsuranceGuy

Jun 12

The implementation details matter too. The reference policy-engine core reportedly showed attenuation correctness and evidence reconstructability on every trial, with adjudication in single-digit microseconds. That makes this operational, not just philosophical.

Rodolfo Esparza Jr

@ElInsuranceGuy

Jun 12

Caveat: this governs action, not model behavior. It won’t fix hallucinations. But if agents are touching money, records, or prod systems, governance may matter more than one more model upgrade. Better models or better governance? #AIAgents #AISecurity #EnterpriseAI

Rodolfo Esparza Jr

@ElInsuranceGuy

Jun 11

The most dangerous kind of LLM sycophancy in finance might not be a user saying “you’re wrong.” It’s the model quietly absorbing a client’s prior beliefs and letting that bias the answer. That’s the big warning in this paper. #AIAgents #FinAI #LLM

Rodolfo Esparza Jr

@ElInsuranceGuy

Jun 11

My takeaway: if you’re building AI for finance, memory and personalization are not harmless product features. They can become hidden bias channels that push models toward agreement over correctness.

Rodolfo Esparza Jr

@ElInsuranceGuy

Jun 11

Expect a lot more discussion about “agent memory” this year. There should be just as much discussion about memory hygiene. If an LLM knows what a user tends to believe, when should it ignore that? Source: arxiv.org/abs/2604.24668

The Price of Agreement: Measuring LLM Sycophancy in Agentic...

Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general...

arxiv.org