Introducing @Laureum_ai: quality scoring for MCP servers and AI agents, by @assisterr
We score 6 dimensions: accuracy, safety, reliability, process quality, latency, and schema quality.
Multi-judge LLM consensus + adversarial probes.
We've scored 28 public MCP servers to date.
Average: 68.3/100. 6 in Expert tier (≥85).
The weakness nobody else measures: process quality, averaging 55.5/100.
Here's why we built it.
Three gaps in agent eval today:
→ Marketplaces curate by hand. A major MCP catalog operator pruned 17 abandoned / vanity / impersonation entries from their own catalog earlier this month, manually.
→ Eval frameworks (LangSmith, Braintrust, Galileo) score tool-call correctness well. Process quality (error handling, input validation, response structure) sits between them, and nobody surfaces it as a named composite.
→ Post-Drift, the Solana ecosystem just launched STRIDE for smart-contract security. Agent infra still ships without pre-deploy quality gates.
Laureum is the missing layer.
Free right now, no signup:
1/ Quick Scan: paste any MCP server URL, get a 30-second 6-axis score →
laureum.ai/evaluate
2/ Public leaderboard: see how the most-used servers rank →
laureum.ai/leaderboard
If you're building, run yours. Reply with your score and we'll feature the top 5 this week.