✨ Introducing Custom Evaluations — Test Model Responses and Build Real Feedback Loops
Today, we're introducing `opper.evaluate()` — flexible scaffolding for evaluating model responses, built right into our SDKs.
No matter how clearly we describe a task, models are still probabilistic, so you can't just trust the output. You have to test it.
✅ Plug in custom evaluators: plain code, existing eval frameworks, or LLM-as-a-judge.
✅ Automatically upload and track eval results on the platform — filter, observe, fix.
✅ Act on evaluation results directly inside your code — close the loop, not just measure it.
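To make "custom evaluator" and "close the loop" concrete, here is a minimal, SDK-free sketch of the pattern. It deliberately does not call `opper.evaluate()`, because its exact signature isn't shown here. `Metric`, `length_evaluator`, `generate_with_feedback` and `fake_model` are all hypothetical names for illustration. A code-based evaluator scores an output. The loop then acts on that score by retrying with the failing metrics' comments as feedback.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical metric record for illustration; the real SDK's types may differ.
@dataclass
class Metric:
    dimension: str
    score: float  # 1.0 = fully passing
    comment: str = ""


def length_evaluator(output: str, max_words: int = 50) -> list[Metric]:
    """Hypothetical code-based evaluator: checks a word budget."""
    words = len(output.split())
    score = 1.0 if words <= max_words else max_words / words
    return [Metric("length", score, f"{words} words (limit {max_words})")]


def generate_with_feedback(
    generate: Callable[[str], str],
    evaluate: Callable[[str], list[Metric]],
    prompt: str,
    threshold: float = 1.0,
    max_attempts: int = 3,
) -> tuple[str, list[Metric]]:
    """Act on evaluation results: retry with feedback until all metrics pass."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback = ""
    for _ in range(max_attempts):
        output = generate(prompt + feedback)
        metrics = evaluate(output)
        failing = [m for m in metrics if m.score < threshold]
        if not failing:
            return output, metrics
        # Feed the failing metrics' comments back into the next attempt.
        feedback = "\n\nFix these issues: " + "; ".join(m.comment for m in failing)
    # Out of attempts: return the last output and its metrics for inspection.
    return output, metrics


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call: too long until it receives feedback."""
    return "short answer" if "Fix these issues" in prompt else "word " * 80


out, metrics = generate_with_feedback(fake_model, length_evaluator, "Summarize the report.")
print(out, metrics[0].score)  # prints: short answer 1.0
```

In a real setup, the same evaluator could just as well wrap an eval framework or an LLM-as-a-judge call. The metrics would also be uploaded so they show up on the platform.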
Pricing: $0.50 per 1,000 metrics
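For a rough budget, here is a back-of-the-envelope estimate. It assumes billing is linear and pro-rated per metric, and that each scored dimension counts as one metric. Both are our assumptions, not stated billing rules. The workload numbers are also purely hypothetical.

```python
from decimal import Decimal

PRICE_PER_1000_METRICS = Decimal("0.50")  # USD, from the pricing line above


def estimated_cost(num_metrics: int) -> Decimal:
    """Estimate the charge, assuming linear, pro-rated billing per metric."""
    return PRICE_PER_1000_METRICS * num_metrics / 1000


# Hypothetical workload: 3 metrics per evaluated response, 100,000 responses.
print(estimated_cost(3 * 100_000))  # prints: 150.00
```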