See how models compare against each other on real coding tasks, based on votes from Devin Desktop users in Arena Mode.
Scores are calculated using Elo ratings from Arena Mode usage.
User preference is derived from side-by-side comparisons where users select their preferred response. The chosen response replaces the other and becomes the basis for the next turn.
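To illustrate how side-by-side votes can turn into ratings, here is a minimal sketch of a standard Elo update. The exact variant Devin Arena uses is not stated here, so the starting rating (1500), the K-factor (32), and the function names `expected_score`, `update_elo`, and `rate_battles` are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_winner: float, rating_loser: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated (winner, loser) ratings after one comparison.

    k is an assumed K-factor; the real leaderboard may use a different value.
    """
    delta = k * (1.0 - expected_score(rating_winner, rating_loser))
    return rating_winner + delta, rating_loser - delta


def rate_battles(
    battles: Iterable[Tuple[str, str]], initial: float = 1500.0, k: float = 32.0
) -> Dict[str, float]:
    """Process (preferred_model, other_model) votes in order.

    initial is an assumed starting rating for models not seen before.
    """
    ratings: Dict[str, float] = {}
    for winner, loser in battles:
        rating_w = ratings.setdefault(winner, initial)
        rating_l = ratings.setdefault(loser, initial)
        ratings[winner], ratings[loser] = update_elo(rating_w, rating_l, k)
    return ratings


# Hypothetical model names; one vote where "model-a" was preferred.
print(rate_battles([("model-a", "model-b")]))
# {'model-a': 1516.0, 'model-b': 1484.0}
```

In this sketch, a win over an equally rated model moves each rating by half the K-factor, while a win over a much stronger model moves it by more. Total rating points are conserved across each comparison.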
Battle groups include models representative of daily Devin usage and are updated as new models become available. Some model tiers may not yet appear on the leaderboard.
Unlike other leaderboards, Devin Arena does not hold back responses, so models are not penalized for faster generation speed.