"Normalizing each benchmark's scores before averaging, so that random performance maps to 0 and perfect performance maps to 100, adjusts each benchmark's relative weight in the final score according to how far a model's performance exceeds random chance."
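A minimal Python sketch of this normalization follows. It assumes raw scores are percentages, so perfect performance is 100, and that each benchmark's random-chance baseline is known, for example 25 for a four-option multiple-choice task. The function names `normalize` and `normalized_average` are hypothetical, and clipping below-chance scores to 0 is an assumption that the text above does not state.

```python
def normalize(raw_score: float, random_baseline: float,
              max_score: float = 100.0, clip_below_chance: bool = True) -> float:
    """Rescale raw_score so random_baseline maps to 0 and max_score maps to 100.

    Hypothetical helper. Assumes raw scores are on a 0-100 percentage scale.
    """
    if max_score <= random_baseline:
        raise ValueError("max_score must exceed random_baseline")
    # Linear rescaling: the distance above chance, divided by the available headroom.
    normalized = (raw_score - random_baseline) * 100.0 / (max_score - random_baseline)
    # Assumption (not stated in the source): below-chance scores are clipped to 0.
    if clip_below_chance:
        normalized = max(0.0, normalized)
    return normalized


def normalized_average(results: list[tuple[float, float, float]]) -> float:
    """Average normalized scores; each item is (raw_score, random_baseline, max_score)."""
    scores = [normalize(raw, baseline, top) for raw, baseline, top in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative numbers only: a 4-option multiple-choice benchmark (chance = 25)
    # and an open-ended benchmark (chance = 0).
    results = [(70.0, 25.0, 100.0), (30.0, 0.0, 100.0)]
    raw_average = sum(raw for raw, _, _ in results) / len(results)
    print(raw_average)                  # 50.0
    print(normalized_average(results))  # 45.0
```

With these illustrative numbers the plain average of the raw scores is 50, but the normalized average is 45: the multiple-choice result counts for less because 25 of its 70 points could be reached by guessing alone.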