SeraphimSerapis/tool-eval-bench
Tool-calling quality benchmark for LLM serving stacks. 80 deterministic scenarios testing multi-turn orchestration, safety boundaries, and structured output. Supports vLLM, LiteLLM, and llama.cpp....
github.com