[Quote] LMArena on X

08 Apr 2025

[Quote] LMArena on X

Meta should have made it clearer that “Llama-4-Maverick-03-26-Experimental” was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.

We now have acknowledgement from LMArena of what we already knew: AI labs are cheating to get their models as high as possible in the LMArena leaderboard / benchmarks.

This is inevitable, all of them want to win the AI race at any cost. If you don't want to be fooled by ever-slightly-increasing benchmarks, you should set up your own benchmarks that measure their performance on your own use cases.