Joined March 2017
Juan Manuel Ciro retweeted
Excited to share that we trained rerankers at the cost/performance frontier and are open sourcing them! Contextual AI Reranker v2 🚀 Best performing, most efficient reranker 🤗 Open weights (1B, 2B, 6B) 🫔 Instruction-following (including recency-awareness) 🌐 Multilingual 1/4
Juan Manuel Ciro retweeted
23 Jul 2025
Context engineering has become the critical bottleneck for enterprise AI. Your AI agent works perfectly in demos but breaks down with real-world data complexity. Why? I see 6 fundamental challenges that every AI engineer faces: from the "needle in a haystack" problem where models lose critical information buried in long contexts, to the token cost explosion that makes production deployments prohibitively expensive. These are more than just technical hurdles, they're the difference between AI experiments and transformative business impact. Read my full thoughts below.
Juan Manuel Ciro retweeted
16 Jul 2025
Introducing: Contextual AI MCP Server (now hosted) After great feedback on our local MCP server, we have added a hosted MCP server inside the platform! This means every RAG agent is easily accessible via MCP. Added updated info on the hosted MCP server here: github.com/ContextualAI/cont…
Juan Manuel Ciro retweeted
3 Apr 2025
Tired of guessing if your LLM responses are “good enough?” With @ContextualAI’s LMUnit and CircleCI, you can run natural language unit tests directly in your CI/CD pipeline, turning subjective evals into automated, testable checkpoints. Read on: contextual.ai/blog/lmunit-ci…
The future of CI
🔥 Introducing the most reliable way to evaluate LLMs and agents in production! It's time to stop “vibe testing” your AI systems. Our latest developer's guide shows you how to rigorously test AI systems so that they hold up in production, using Contextual AI's LMUnit evaluation model and @CircleCI’s CI/CD pipeline. You’ll learn how to: • Write natural language unit tests that anyone on your team can understand • Leverage LMUnit – Contextual AI's state-of-the-art, specialized evaluation language model that outperforms frontier models with greater interpretability at lower cost • Implement @CircleCI's CI/CD pipeline to catch regressions before they reach users See our complete developer’s guide here: contextual.ai/blog/lmunit-ci… Stop relying on "vibes" and start building AI you can trust! #AITesting #LLMOps #DevOps #Agents #LLM #Evaluation
Juan Manuel Ciro retweeted
$20k/month for an AI agent is an interesting pricing decision. We're in new territory, where potential buyers are almost explicitly being forced to choose between a human and an AI to do a particular job. Not sure if this is the right strategy, but certainly interesting times ahead. I think we can all agree on the specialization part though. Glad to see a giant in the industry embrace something we've been saying from the start.
Juan Manuel Ciro retweeted
Introducing LMUnit: Natural language unit testing for LLM evaluation How do you really know if your language model is behaving the way you expect? When evaluation is this critical, your best methodology shouldn't just be vibes. With SOTA results on FLASK & BigGenBench and top-10 on RewardBench, LMUnit brings the rigor and familiarity of traditional software engineering unit testing to LLM evaluation. Read on to learn how we built it and try it for free using our API 👇 🔗 🧵 (1/5)
Juan Manuel Ciro retweeted
A real honour and career dream that PRISM has won a @NeurIPSConf best paper award! 🌈 One year ago I was sat in a 13,000 person audience at NeurIPS '23 having just finished data collection. Safe to say I've gone from feeling #stressed to very #blessed 😁
Announcing the NeurIPS 2024 Best Paper Awards: blog.neurips.cc/2024/12/10/a…
Juan Manuel Ciro retweeted
Wahoo PRISM will officially be taking a trip to @NeurIPSConf this year as an oral presentation 🤩 (and my first ever 10/10 in a conference review process 🤯)
Today we're launching PRISM, a new resource to diversify the voices contributing to alignment. We asked 1500 people around the world for their stated preferences over LLM behaviours, then we observed their contextual preferences in 8000 convos with 21 LLMs arxiv.org/abs/2404.16019
Juan Manuel Ciro retweeted
Today, we’re excited to announce RAG 2.0, our end-to-end system for developing production-grade AI. Using RAG 2.0, we’ve created Contextual Language Models (CLMs), which achieve state-of-the-art performance on a variety of industry benchmarks. CLMs outperform strong RAG baselines built using GPT-4 and top open-source models like Mixtral, according to our research and customers. Read more in our blog post: rag2.ai
Juan Manuel Ciro retweeted
19 Mar 2024
RAG 2.0 is turning LLMs from an awesome toy into a tool that one can safely rely on, so businesses can actually start using AI in their workflows. We at Contextual AI have done awesome, groundbreaking work to make this possible. Please see the breakdown of how and why it works here:
Today, we’re excited to announce RAG 2.0, our end-to-end system for developing production-grade AI. Using RAG 2.0, we’ve created Contextual Language Models (CLMs), which achieve state-of-the-art performance on a variety of industry benchmarks. CLMs outperform strong RAG baselines built using GPT-4 and top open-source models like Mixtral, according to our research and customers. Read more in our blog post: rag2.ai
Juan Manuel Ciro retweeted
Contextual AI leverages @googlecloud GKE Autopilot for our retrieval augmented language model technology, optimized for enterprise workflows. Discover how #GKE streamlines operations, enhances performance, and reduces costs for AI applications: cloud.google.com/blog/produc…
Juan Manuel Ciro retweeted
Gemini 1.5 looks awesome. But why is a 10M context window the end of RAG?