AIR-Bench is the first generative evaluation benchmark for audio-language models, covering 19 audio tasks under a standardized evaluation framework.
The benchmark comprises over 19,000 single-choice questions and more than 2,000 open-ended audio questions, spanning diverse audio types including speech, music, and natural sounds.
A novel audio mixing strategy is proposed to make simulated audio more realistic, so that evaluations better reflect real-world listening conditions.
The authors plan to launch and maintain a leaderboard so the community can consistently track and compare the performance of open-source audio-language models.
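The paper's exact mixing procedure is not reproduced here, but a common way to simulate realistic audio is to overlay a foreground clip with background sound at a controlled signal-to-noise ratio. The sketch below illustrates that general idea with NumPy; the function name `mix_at_snr` and its details are assumptions for illustration, not AIR-Bench's actual implementation.

```python
import numpy as np

def mix_at_snr(foreground, background, snr_db, eps=1e-10):
    """Mix a foreground clip with background audio at a target SNR in dB.

    Both inputs are 1-D float arrays of audio samples; the shorter length
    is used. Illustrative only -- not AIR-Bench's actual mixing code.
    """
    n = min(len(foreground), len(background))
    fg, bg = foreground[:n], background[:n]
    p_fg = np.mean(fg ** 2)
    p_bg = np.mean(bg ** 2)
    # Scale the background so p_fg / (scale^2 * p_bg) equals 10^(snr_db/10).
    scale = np.sqrt(p_fg / (p_bg * 10 ** (snr_db / 10) + eps))
    mixture = fg + scale * bg
    # Renormalize if the sum exceeds [-1, 1], to avoid clipping on export.
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        mixture = mixture / peak
    return mixture
```

Sweeping `snr_db` (e.g. from 20 dB down to 0 dB) yields progressively harder test clips, which is one way such a benchmark can probe robustness to background interference.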