Parea AI (YC S23) provides tools for evaluating, testing and monitoring LLM applications.

Joined April 2023
PareaAI retweeted
9 Sep 2024
.@PareaAI also looks like a good LLM monitoring tool and is open source
PareaAI retweeted
How do you detect unreliable behavior in your LLM app? We recently talked to the team at @sixfoldai, and they shared a simple yet powerful way to assess the reliability of their LLM app using @PareaAI. Read more about how they test their risk assessment AI solution for insurance underwriters in the article linked in the thread.
PareaAI retweeted
Saturdays are for doc upgrades
PareaAI retweeted
🚀 New deep dive notebook on @PareaAI experiments and LLM evals 📝🔬. I cover some of the key functionalities illustrating the power and flexibility of our API. 🔽 Link in comments 🔽
PareaAI retweeted
Replying to @cohere
@cohere is actually pretty awesome. More folks should be exploring their models. @PareaAI now has auto-instrumentation for the Cohere Python SDK 🚀
PareaAI retweeted
There are so many “black box” evals that force users to instantiate eval classes. I never fully understood this. At @PareaAI we see evals as just functions: you can copy the source code and modify it as you see fit. It's all OSS and based on the latest research. Check these out👇🏾
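A minimal sketch of the “evals are just functions” idea, in Python since the SDK snippets in this timeline are Python. The names `answer_f1` and `strict_answer_f1` and the token-level F1 metric are illustrative assumptions, not one of Parea's published evals, and Parea's own eval functions may take different arguments. The point is only that an eval can be an ordinary function you copy and adapt:

```python
from collections import Counter


def answer_f1(output: str, target: str) -> float:
    """Hypothetical eval written as a plain function: SQuAD-style token-level F1
    between the model output and a reference answer."""
    out_tokens = output.lower().split()
    ref_tokens = target.lower().split()
    if not out_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(out_tokens == ref_tokens)
    # Multiset intersection: how many tokens the two answers share.
    common = Counter(out_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def strict_answer_f1(output: str, target: str, threshold: float = 0.8) -> bool:
    """A copied-and-modified variant: the same eval turned into a pass/fail check."""
    return answer_f1(output, target) >= threshold


if __name__ == "__main__":
    print(answer_f1("Paris is the capital", "the capital is Paris"))
    print(strict_answer_f1("Paris", "Lyon"))
```

Because the eval is a plain function, tightening it into a pass/fail check is a small wrapper rather than a new subclass.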
PareaAI retweeted
📝 Updated integration docs ⭐️ Check out @PareaAI's updated docs to automatically trace apps powered by @LangChain, instructor by @jxnlco, @LiteLLM, DSPy by @lateinteraction, SGLang by @lmsysorg, and @triggerdotdev. Docs: docs.parea.ai/integrations/o…
PareaAI retweeted
Day 1 support for Llama 3.1 via @FireworksAI_HQ in @PareaAI's playground! 🧨🦙
PareaAI retweeted
23 Jul 2024
And to help you understand what's going on, we integrate with observability platforms like @ArizePhoenix, @langchain's LangSmith, @langfuse, @PareaAI, and @lunary_hq so you can explore the experiments that zenbase/core automates. Cookbooks here: github.com/zenbase-ai/core/t…
PareaAI retweeted
Def agree this could be great. Probably best if you can train the router yourself. @anyscalecompute's RouteLLM tracing support with @PareaAI
RouteLLM is one of the most impactful algorithmic innovations in AI that I've ever seen. I don't think people realize how important it truly will become. Here's a full tutorial for how to use it:
PareaAI retweeted
With the latest @GroqInc models for tool calling, we figured it was time to make Groq available across @PareaAI's playground and SDKs. Be on the lookout for an updated tool-calling benchmark: OpenAI vs. Claude vs. Groq!
PareaAI retweeted
📝 Updated self-deployment docs ⭐️ Deploy @PareaAI on-prem via @Docker in 4 steps:
1. Clone the repo
2. Specify organization slug
3. Pull Docker images & run them
4. Point SDK backend URL to self-deployed backend URL
🔗 -> 🧵
PareaAI retweeted
There have been so many new models lately, most recently @MistralAI's codestral-mamba. I figured it'd be great to highlight how to use @PareaAI for regression testing. Check out the notebook below, where I test codestral-latest vs. mamba on LeetCode questions. 👇
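The notebook isn't reproduced here, but a regression test between two models roughly reduces to: score both on the same dataset with the same eval, then flag the examples where the candidate does worse than the baseline. Below is a framework-agnostic sketch, assuming each model is just a function from prompt to answer (stubbed instead of calling Mistral's API) and using a hypothetical exact-match eval. None of these names come from Parea's SDK:

```python
from statistics import mean
from typing import Callable, Dict, List

Model = Callable[[str], str]        # prompt -> answer (stubbed below)
Eval = Callable[[str, str], float]  # (output, target) -> score


def exact_match(output: str, target: str) -> float:
    """Hypothetical eval: 1.0 if the answer matches the reference exactly."""
    return float(output.strip() == target.strip())


def run_experiment(model: Model, dataset: List[Dict[str, str]], score: Eval) -> List[float]:
    """Score one model on every example in the dataset."""
    return [score(model(ex["question"]), ex["target"]) for ex in dataset]


def regression_report(baseline: Model, candidate: Model,
                      dataset: List[Dict[str, str]], score: Eval) -> dict:
    """Compare two models on the same data and list examples where the candidate got worse."""
    base = run_experiment(baseline, dataset, score)
    cand = run_experiment(candidate, dataset, score)
    return {
        "baseline_mean": mean(base),
        "candidate_mean": mean(cand),
        "regressions": [ex["question"] for ex, b, c in zip(dataset, base, cand) if c < b],
    }


if __name__ == "__main__":
    dataset = [
        {"question": "two-sum", "target": "A"},
        {"question": "reverse-linked-list", "target": "B"},
    ]
    # Stub models standing in for, e.g., codestral-latest (baseline) and codestral-mamba (candidate).
    baseline = {"two-sum": "A", "reverse-linked-list": "B"}.get
    candidate = {"two-sum": "A", "reverse-linked-list": "C"}.get
    print(regression_report(baseline, candidate, dataset, exact_match))
```

Reporting per-example regressions alongside the means matters because two models can have similar averages while failing on different questions.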
PareaAI retweeted
At this point I could probably have an llm monitor the top foundation model providers and then produce a PR for me that adds any new models to @PareaAI the moment they launch.
PareaAI retweeted
If you use structured outputs with Instructor, track validation errors instantly with @PareaAI. Concretely, the integration automatically:
- groups all LLM calls caused by retries under a single trace
- tracks every field that failed validation, with its error message
- visualizes the validation error count over time
Instrument calls made via the Instructor client by adding two lines:
p = Parea(api_key="PAREA_API_KEY")
p.wrap_openai_client(client, "instructor")
Read the full blog post on the Instructor docs in the 🧵
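To illustrate what grouping retries under one trace and tracking failed fields means, here is a stdlib-only sketch. It is not how Parea or Instructor implement this: the `Trace` class, the `validate_user` validator, and the stubbed `llm` callable are all hypothetical, and real Instructor retries re-prompt the model with the validation errors rather than passing an attempt index:

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Trace:
    """One logical request; every retry attempt is recorded under the same trace_id."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # One entry per attempt: {field_name: error_message}; {} means the attempt passed.
    attempts: List[Dict[str, str]] = field(default_factory=list)

    def error_count(self) -> int:
        """Total number of field-level validation errors across all attempts."""
        return sum(len(errors) for errors in self.attempts)


def validate_user(data: dict) -> Dict[str, str]:
    """Hypothetical validator: maps each failing field to its error message."""
    errors: Dict[str, str] = {}
    if not isinstance(data.get("name"), str) or not data["name"]:
        errors["name"] = "must be a non-empty string"
    if not isinstance(data.get("age"), int) or data["age"] < 0:
        errors["age"] = "must be a non-negative integer"
    return errors


def call_with_retries(llm: Callable[[int], dict],
                      max_retries: int = 3) -> Tuple[Optional[dict], Trace]:
    """Call the (stubbed) model until its output validates, logging each attempt."""
    trace = Trace()
    for attempt in range(max_retries):
        data = llm(attempt)
        errors = validate_user(data)
        trace.attempts.append(errors)
        if not errors:
            return data, trace
    return None, trace


if __name__ == "__main__":
    # Stub model: the first response fails on both fields, the second passes.
    responses = [{"name": "", "age": -1}, {"name": "Ada", "age": 36}]
    result, trace = call_with_retries(lambda i: responses[i])
    print(trace.trace_id, result, trace.attempts, trace.error_count())
```

Summing the per-attempt errors, as `error_count` does, gives the per-request validation error count that a dashboard could plot over time.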
PareaAI retweeted
Moving from demos to production-ready LLM apps can be challenging. In this post, I outline a practical workflow to help teams make this transition, focusing on:
- Hypothesis testing
- Dataset creation
- Effective evals
- Experimentation
Full post here: zurl.co/27Ad
PareaAI retweeted
This method is powered by DSPy from @lateinteraction and inspired by the work of @sh_reya: arxiv.org/pdf/2404.12272 arxiv.org/pdf/2401.03038. Also, thanks to @eugeneyan for sharing JudgeBench: arxiv.org/abs/2406.18403