Humanloop is the LLM evals platform for enterprises. Trusted by Gusto, Vanta and Duolingo to ship reliable AI products.

Joined April 2020
333 Photos and videos
Pinned Tweet
13 Aug 2025
We're thrilled to announce that the Humanloop team is joining @AnthropicAI! Our mission has always been to enable the rapid and safe adoption of AI. Now, as AI progress accelerates, we think Anthropic is the ideal home to continue this work.
25
21
460
243,014
13 Aug 2025
10
17,849
Humanloop retweeted
Apparently @humanloop is the best? Has anyone actually tried them?
alright. let's settle this.
1
3
4,134
Humanloop retweeted
3 Jul 2025
AI-native products are evolving fast. At our 𝐔𝐧𝐥𝐨𝐜𝐤 𝐭𝐡𝐞 𝐅𝐮𝐭𝐮𝐫𝐞: 𝐀𝐈 𝐁𝐫𝐞𝐚𝐤𝐟𝐚𝐬𝐭 with @humanloop & @googlecloud, we explored what it takes to build, evaluate, and differentiate LLM apps. Thank you to everyone who joined. #AI #LLMs #Startups
1
6
1,886
11 Apr 2025
MCP is rapidly becoming the universal adapter for AI. Since its release in November, developers and teams have raced to adopt the standard, giving agents the tools they need to interface with the real world, from APIs to internal systems. Our latest explainer breaks down MCP: what it is, how it works, and how to get started. 🔗 Read here: humanloop.com/blog/mcp
11
2
13
2,087
Humanloop retweeted
13 Mar 2025
Awesome open source package to strongly type your @humanloop prompts in your code. Also great explainer of the challenges of collaborating on prompt engineering with subject matter experts.
11 Mar 2025
Tackling prompt engineering challenges led us to develop an open-source, type-safe prompt management system at Boardy. Huge shoutout to Greg for enhancing our AI’s natural interactions! By separating prompt management from code, we’ve streamlined our workflow and reduced errors. Excited to share this tool with the community! #AI #PromptEngineering #OpenSource
1
1
8
2,499
14 Mar 2025
Wow. Everything is tokens.
2
3
1,176
11 Mar 2025
🗓️ Wednesday March 12th at 10:00 PT
Our CEO @RazRazcle will be speaking at the MLOps Community’s 'AI in Production 2025' about Eval-Driven AI Development.
What to expect:
• Learn how top AI teams use evaluation-driven development to guide model improvements and avoid common pitfalls.
• Discover how to leverage code-based, LLM-as-judge, and human evaluators to optimize LLM performance.
• Gain insights from Brianna Connelly, VP of Data Science at @filevine, on how their AI team uses evals on Humanloop to refine AI applications and RAG systems.
Register to take part virtually (link below)
2
4
1,205
20 Feb 2025
📍 PMs in AI Meetup, London 🇬🇧
Yesterday we held a Meetup in the UCL Centre for Artificial Intelligence for product managers working on AI agents and applications. Huge thanks to all who turned up (it was a full house!) and to our speakers:
• @samstphenson (Founder, @meetgranola), who advised making your one AI feature extremely effective before trying to add any more.
• @Albertorizzoli (Co-founder, @V7labs), who said to listen to user problems, not their proposed solutions (this is more true than ever with AI).
• @RazRazcle (Co-founder, @humanloop), who advised bringing domain experts into the prompt engineering and evaluation process as early as possible to drive differentiated and effective AI performance.
The London AI community is next level 🚀 What should be the theme of our next meetup? 👀
3
1
25
4,182
Humanloop retweeted
18 Feb 2025
cool event with @RazRazcle from @humanloop, @samstphenson from @meetgranola and Alberto from @V7Labs Thinking about AI from a PM mindset. nice to see more events take a different spin on the broad AI industry in London.
2
10
1,359
17 Feb 2025
Very excited to share that Humanloop is named as one of the Emerging Leaders in the Emerging Market Quadrant for Generative AI Engineering in the 2025 Gartner® Innovation Guide for Generative AI Technologies. Building reliable AI products doesn't have to be a guessing game - you need an eval-driven workflow and the right collaboration between engineers, product and domain experts. That’s how teams ship AI that works. Companies like Gusto, Vanta & Duolingo use Humanloop to build great AI products. We love helping companies navigate this new paradigm - reach out to ask questions or book a demo with us: humanloop.com Read more in the February 2025 edition of the Gartner Innovation Guide for Generative AI Technologies.
1
667
13 Feb 2025
🚀 We’re thrilled to be part of ProductCon on Feb 19, 2025, in London! ProductCon brings together the brightest minds in tech to share best practices for building world-class products. 📍 Stop by our booth to see how PMs use Humanloop to build differentiated AI products that perform reliably at scale. 🔗 Grab your ticket here: lnkd.in/e_dEnQqX
1
698
12 Feb 2025
How do you take your AI product from good to great to game-changing? Next Tuesday (Feb 18th) we’re hosting a Product Managers in AI Meetup in Bloomsbury, London 🇬🇧
Join us for a panel on "How to Build AI Products That Delight Users" with guest speakers:
• @samstphenson, Founder, @meetgranola
• @Albertorizzoli, Co-founder, @V7Labs
• @RazRazcle, Co-founder, @humanloop
Food and drinks will be provided. Limited availability. Register here: lu.ma/gwojmdql
1
4
862
11 Feb 2025
Today we’re introducing Templates - a library of Prompts, Evaluators, and Datasets, designed to accelerate time to value when developing and evaluating AI applications. One of the biggest challenges in testing AI applications and agents is accessing the right datasets and evaluators. So we’ve collaborated with @huggingface to make this easier. With Templates, the best and most popular golden datasets on Hugging Face are instantly accessible in Humanloop, alongside our fully customizable pre-set evaluators, to help you streamline LLM evaluations. No more starting from scratch - easily test your prompts and agents for jailbreak vulnerabilities, PII leaks, text-to-SQL accuracy, domain-specific reasoning, and lots more — powered by @huggingface Datasets and @humanloop Evals. Templates are live now! (Link below to learn more).
1
1
8
1,100
5 Feb 2025
When do you know it's time to try fine-tuning instead of prompt engineering? Our CEO @RazRazcle is on Data Radicals with @satyx this week to discuss:
🔹 How fine-tuning tends to be an optimization step, which comes once you've pushed the limits of prompt engineering
🔹 Why collaboration with domain experts in the AI product development cycle is key to driving successful outcomes
🔹 How software engineering is changing in the age of AI
And lots more! Watch the full episode here: alation.com/podcast/episodes…
1
2
773
5 Feb 2025
🇬🇧 London - come to our first AI Product Management Meetup on Tuesday Feb 18th! Meet with fellow AI product leaders and enjoy food and drinks on us in Bloomsbury. We’re hosting a panel featuring guests who are building world-class AI products, sharing their learnings, followed by a social event for all attending. RSVP (space limited): lu.ma/gwojmdql

1
2
520
1 Feb 2025
Release notes 01/31/2025
New models:
• o3-mini is now available in Humanloop! It has a 200,000-token context length, with up to 100,000 output tokens, and it can show superior performance to o1 while being 9x cheaper and 4x faster.
• DeepSeek V3 and R1 (non-distilled) via DeepSeek API. We also added DeepSeek-R1-Distill-Llama-70B via Groq.
Evals:
• Quickly compare performance of various prompts/models using aggregate eval stats in the UI (see below).
• Better filtering across evals for errors as well as specific judgements and models.
Read more (link below)
1
1
698