⚙️ Agents are the “killer” LLM app, but building and evaluating agents is hard.
Tool use is a huge part of agents, but there aren't enough open-source tool-use benchmarks out there.
Today, we are excited to release four new test environments for benchmarking LLMs’ ability to effectively use tools.
📖 blog.langchain.dev/benchmark…
🧵 Below are some of our preliminary results