Open Agent Leaderboard

Systematic evaluation of AI agents across diverse environments — without domain-specific tuning.

Leaderboard

Cost-Performance Frontier

The Pareto frontier of agent efficiency: the agents that no other agent beats on both accuracy and spend.
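
As a rough illustration of how such a frontier can be derived from accuracy and spend figures, here is a minimal Python sketch. It is not the leaderboard's actual implementation: the `AgentResult` type, the `pareto_frontier` function, the agent names, and all numbers are hypothetical and chosen only for the example. It assumes lower cost and higher accuracy are both better.

```python
from typing import NamedTuple


class AgentResult(NamedTuple):
    name: str        # hypothetical agent label
    cost: float      # spend per run, in any consistent unit
    accuracy: float  # fraction of tasks solved, 0.0 to 1.0


def pareto_frontier(results: list[AgentResult]) -> list[AgentResult]:
    """Return the results that no other result beats on both cost and accuracy."""
    # Sort by ascending cost; on equal cost, put the more accurate result first.
    ordered = sorted(results, key=lambda r: (r.cost, -r.accuracy))
    frontier = []
    best_accuracy = float("-inf")
    for r in ordered:
        # Keep a result only if it is strictly more accurate than every
        # result that costs the same or less.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Made-up numbers for illustration only.
sample = [
    AgentResult("agent-a", cost=1.0, accuracy=0.40),
    AgentResult("agent-b", cost=2.0, accuracy=0.35),
    AgentResult("agent-c", cost=3.0, accuracy=0.60),
    AgentResult("agent-d", cost=5.0, accuracy=0.55),
]
frontier_names = [r.name for r in pareto_frontier(sample)]
print(frontier_names)  # → ['agent-a', 'agent-c']
```

In this made-up sample, agent-b and agent-d drop off the frontier because a cheaper agent is already more accurate than each of them. Sorting by cost first keeps the pass linear after the sort, which is one common way to compute a two-objective frontier.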