El Reg digs its claws into Alibaba's QwQ
Briefly

Alibaba's Qwen team has developed QwQ, a large language model that uses reinforcement learning to sharpen its reasoning. Despite having only 32 billion parameters, QwQ reportedly rivals much larger models such as DeepSeek R1 on benchmarks covering mathematics, coding, and function calling. A notable aspect of QwQ's training is the use of an accuracy verifier and a code execution server, which ensure the model is rewarded only for valid outputs. The Register's hands-on testing found QwQ genuinely competitive, making it a noteworthy contender among reasoning-focused language models.
With its latest release, QwQ, Alibaba's Qwen team set out to discover how far reinforcement learning and verification can improve large language models.
Despite its modest 32 billion parameters, QwQ outperforms larger models such as DeepSeek R1 on math, coding, and function-calling benchmarks, Alibaba claims.
QwQ uses reinforcement learning to strengthen its reasoning: the model is rewarded during training for producing correct answers, which improves the accuracy of its responses.
An accuracy verifier and a code execution server check the model's outputs during training, so rewards are granted only for solutions verified to be correct, setting a high bar for what counts as success; a minimal sketch of this verifier-gated reward follows below.
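The article does not include Qwen's actual training code, so the following Python sketch is purely illustrative of the verifier-gated reward idea it describes: the binary reward shaping, the helper names (`math_reward`, `code_reward`), and the use of a local subprocess standing in for a real code execution server are all assumptions, not QwQ's implementation.

```python
# Illustrative sketch of a verifier-gated reward: the model earns a reward
# only when an external check confirms its output. Names are hypothetical.
import subprocess
import tempfile


def math_reward(candidate_answer: str, reference_answer: str) -> float:
    """Reward 1.0 only if the model's final answer matches the verified reference."""
    normalize = lambda s: s.strip().rstrip(".").lower()
    return 1.0 if normalize(candidate_answer) == normalize(reference_answer) else 0.0


def code_reward(candidate_code: str, test_snippet: str, timeout_s: float = 5.0) -> float:
    """Reward 1.0 only if the generated code passes its tests when executed.

    A real code execution server would sandbox this far more carefully;
    a subprocess with a timeout stands in for it here.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_snippet)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # code that hangs earns nothing


if __name__ == "__main__":
    print(math_reward("42", " 42. "))  # 1.0: answers match after normalization
    print(code_reward("def add(a, b):\n    return a + b",
                      "assert add(2, 3) == 5"))  # 1.0: the test passes
```

The point of the all-or-nothing reward is that the model cannot be paid for plausible-looking but wrong answers: only outputs that survive the verifier or actually execute correctly reinforce the policy.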
Read at The Register