George Halal

Tweets

Juan Manuel Ciro retweeted

George Halal @halal_george

27 Aug 2025

Excited to share that we trained rerankers at the cost/performance frontier and are open sourcing them! Contextual AI Reranker v2
🚀 Best performing, most efficient reranker
🤗 Open weights (1B, 2B, 6B)
🫡 Instruction-following (including recency-awareness)
🌐 Multilingual
1/4

168

31,244

Juan Manuel Ciro retweeted

Douwe Kiela

@douwekiela

23 Jul 2025

Context engineering has become the critical bottleneck for enterprise AI. Your AI agent works perfectly in demos but breaks down with real-world data complexity. Why? I see 6 fundamental challenges that every AI engineer faces: from the "needle in a haystack" problem where models lose critical information buried in long contexts, to the token cost explosion that makes production deployments prohibitively expensive. These are more than just technical hurdles; they're the difference between AI experiments and transformative business impact. Read my full thoughts below.

1,795

Juan Manuel Ciro retweeted

Rajiv Shah

@rajistics

16 Jul 2025

Introducing: Contextual AI MCP Server (now hosted). After great feedback on our local MCP server, we have added a hosted MCP server inside the platform! This means every RAG agent is easily accessible via MCP. Added updated info on the hosted MCP server here: github.com/ContextualAI/cont…

2,383

Juan Manuel Ciro retweeted

CircleCI @CircleCI

3 Apr 2025

Tired of guessing if your LLM responses are “good enough?” With @ContextualAI’s LMUnit and CircleCI, you can run natural language unit tests directly in your CI/CD pipeline—turning subjective evals into automated, testable checkpoints. Read on: contextual.ai/blog/lmunit-ci…

Ensuring Agent and LLM Quality with CircleCI and LMUnit: A Developer’s Guide | Contextual AI

Build smarter—context-aware AI—that understands your data, workflows, and edge cases.

contextual.ai

1,616

Juan Manuel Ciro @ciropython

3 Apr 2025

The future of CI

Contextual AI

@ContextualAI

3 Apr 2025

🔥 Introducing the most reliable way to evaluate LLMs and agents in production! It's time to stop “vibe testing” your AI systems. Our latest developer's guide shows you how to rigorously test AI systems so that they hold up in production, using Contextual AI's LMUnit evaluation model and @CircleCI’s CI/CD pipeline.

You’ll learn how to:
• Write natural language unit tests that anyone on your team can understand
• Leverage LMUnit – Contextual AI's state-of-the-art, specialized evaluation language model that outperforms frontier models with greater interpretability at lower cost
• Implement @CircleCI's CI/CD pipeline to catch regressions before they reach users

See our complete developer’s guide here: contextual.ai/blog/lmunit-ci…

Stop relying on "vibes" and start building AI you can trust! #AITesting #LLMOps #DevOps #Agents #LLM #Evaluation

Juan Manuel Ciro retweeted

Douwe Kiela

@douwekiela

7 Mar 2025

$20k/month for an AI agent is an interesting pricing decision. We're in new territory, where potential buyers are almost explicitly being forced to choose between a human and an AI to do a particular job. Not sure if this is the right strategy, but certainly interesting times ahead. I think we can all agree on the specialization part though. Glad to see a giant in the industry embrace something we've been saying from the start.

1,361

Juan Manuel Ciro retweeted

Contextual AI

@ContextualAI

18 Dec 2024

Introducing LMUnit: Natural language unit testing for LLM evaluation

How do you really know if your language model is behaving the way you expect? When evaluation is this critical, your best methodology shouldn't just be vibes. With SOTA results on FLASK & BigGenBench and top-10 on RewardBench, LMUnit brings the rigor and familiarity of traditional software engineering unit testing to LLM evaluation. Read on to learn how we built it and try it for free using our API 👇 🔗 🧵 (1/5)

163

56,979

Juan Manuel Ciro retweeted

Hannah Rose Kirk @hannahrosekirk

11 Dec 2024

A real honour and career dream that PRISM has won a @NeurIPSConf best paper award! 🌈 One year ago I was sat in a 13,000 person audience of NeurIPS '23 having just finished data collection. Safe to say I've gone from feeling #stressed to very #blessed 😁

NeurIPS Conference

@NeurIPSConf

11 Dec 2024

Announcing the NeurIPS 2024 Best Paper Awards: blog.neurips.cc/2024/12/10/a…

418

79,805

Juan Manuel Ciro retweeted

Hannah Rose Kirk @hannahrosekirk

26 Sep 2024

Wahoo PRISM will officially be taking a trip to @NeurIPSConf this year as an oral presentation 🤩 (and my first ever 10/10 in a conference review process 🤯)

Hannah Rose Kirk @hannahrosekirk

25 Apr 2024

Today we're launching PRISM, a new resource to diversify the voices contributing to alignment. We asked 1500 people around the world for their stated preferences over LLM behaviours, then we observed their contextual preferences in 8000 convos with 21 LLMs arxiv.org/abs/2404.16019

160

29,417

Juan Manuel Ciro retweeted

Hannah Rose Kirk @hannahrosekirk

25 Apr 2024

431

120,338

Juan Manuel Ciro retweeted

Contextual AI

@ContextualAI

19 Mar 2024

Today, we’re excited to announce RAG 2.0, our end-to-end system for developing production-grade AI. Using RAG 2.0, we’ve created Contextual Language Models (CLMs), which achieve state-of-the-art performance on a variety of industry benchmarks. CLMs outperform strong RAG baselines built using GPT-4 and top open-source models like Mixtral, according to our research and customers. Read more in our blog post: rag2.ai

A bar chart comparing the accuracy of multiple RAG-based models on a series of benchmarks like NQ and TriviaQA. RAG 2.0 (Contextual Language Model) exceeds SOTA across all benchmarks shown compared to RAG baselines built using GPT-4 and Mixtral.

118

917

195,546

Juan Manuel Ciro retweeted

Stas Bekman

@StasBekman

19 Mar 2024

RAG 2.0 is turning LLMs from an awesome toy into a tool that one can safely rely on, so businesses can actually start using AI in their workflows. We at Contextual AI have done awesome, groundbreaking work to make it work. Please see the breakdown of how and why it works here:

Contextual AI

@ContextualAI

19 Mar 2024

9,478

Juan Manuel Ciro retweeted

Contextual AI

@ContextualAI

6 Mar 2024

Contextual AI leverages @googlecloud GKE Autopilot for our retrieval augmented language model technology, optimized for enterprise workflows. Discover how #GKE streamlines operations, enhances performance, and reduces costs for AI applications: cloud.google.com/blog/produc…

New features to run AI more efficiently on fully managed GKE | Google Cloud Blog

New compute classes, reservations, and improved price/performance enhance GKE Autopilot for running AI training and serving workloads.

cloud.google.com

2,030

Juan Manuel Ciro retweeted

Gautam Mittal @realgmittal

17 Feb 2024

Gemini 1.5 looks awesome. But why is a 10M context window the end of RAG?

921