In the foundation benchmark, models such as Qwen-Audio-Chat and Qwen-Audio Turbo outperform the others in areas such as speech and sound generation.
The comparative analysis shows that models such as BLSP and SALMONN excel at single-choice instruction tasks, even though extracting the exact choice is difficult because different LALMs produce answers in diverse output formats.
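To illustrate why exact choice extraction is hard, here is a minimal Python sketch of a rule-based extractor for free-form model outputs. It is not the benchmark's actual procedure: the function name `extract_choice`, the regular expressions, the four-letter option range `A` to `D`, and the fallback to matching option text are all assumptions made for illustration.

```python
import re

# Hypothetical patterns, tried from most to least explicit.
# The letter class stays case-sensitive so "the answer is a dog" is not read as "A".
_CHOICE_PATTERNS = [
    # "The answer is (B)", "Answer: C", "option B"
    re.compile(r"\b(?i:answer|option|choice)\s*(?i:is)?\s*:?\s*\(?([A-D])\)?(?![A-Za-z])"),
    # Output that starts with a parenthesised letter: "(B) a car horn"
    re.compile(r"^\s*\(([A-D])\)"),
    # Output that starts with a bare letter plus punctuation, or is only a letter: "B.", "D"
    re.compile(r"^\s*([A-D])(?:[.:)]|\s*$)"),
]


def extract_choice(output, options):
    """Return the chosen option letter, or None if no unambiguous choice is found.

    `options` maps letters to option text, e.g. {"A": "a dog barking", ...}.
    """
    for pattern in _CHOICE_PATTERNS:
        match = pattern.search(output)
        if match:
            return match.group(1)

    # Fallback (assumption): accept the answer only if exactly one option's
    # text appears verbatim in the output.
    lowered = output.lower()
    hits = [letter for letter, text in options.items() if text.lower() in lowered]
    return hits[0] if len(hits) == 1 else None
```

Outputs that restate an option in different words, or that mention several options, return `None` under these rules, which is the kind of case where exact extraction breaks down.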