Introducing @Laureum_ai: quality scoring for MCP servers and AI agents, by @assisterr
We score 6 dimensions: accuracy, safety, reliability, process quality, latency, and schema quality.
Multi-judge LLM consensus plus adversarial probes.
We've scored 28 public MCP servers to date.
Average: 68.3/100. 6 in Expert tier (≥85).
The weakness nobody else measures: process quality, averaging 55.5/100.
Here's why we built it 👇
Three gaps in agent eval today:
→ Marketplaces curate by hand. Earlier this month, a major MCP catalog operator manually pruned 17 abandoned, vanity, and impersonation entries from its own catalog.
→ Eval frameworks (LangSmith, Braintrust, Galileo) score tool-call correctness well. Process quality (error handling, input validation, response structure) falls through the gaps between them, and none of them surfaces it as a named composite score.
→ Post-Drift, the Solana ecosystem just launched STRIDE for smart-contract security. Agent infra still ships without pre-deploy quality gates.
Laureum is the missing layer.
Free right now, no signup:
1/ Quick Scan: paste any MCP server URL and get a 6-axis score in 30 seconds →
laureum.ai/evaluate
2/ Public leaderboard: see how the most-used servers rank →
laureum.ai/leaderboard

If you're building, run yours. Reply with your score and we'll feature the top 5 this week.