
"In a paper published in the science journal Nature, the DeepSeek AI team say they have established that their LLMs can be incentivized to learn to reason without getting examples from humans. In this way, reinforcement learning, akin to learning through trial and error, can slash the human input required to boost their model's performance. They argue that the approach improves performance on math and coding problems beyond that of LLMs trained on a corpus of human text and examples."
"Chinese AI company DeepSeek has shown it can improve the reasoning of its LLM DeepSeek-R1 through trial-and-error based reinforcement learning, and even have the model explain its reasoning on math and coding problems, though those explanations can sometimes be unintelligible. The release of DeepSeek-R1 in January 2025 triggered a $589 billion wipeout of Nvidia's market value, as investors feared it represented an easier and cheaper route to natural-language question-answering systems such as ChatGPT, from Silicon Valley darling OpenAI."
DeepSeek used trial-and-error reinforcement learning to improve reasoning in its LLM DeepSeek-R1 and to generate step-by-step explanations for math and coding problems, although some explanations can be unintelligible. The January 2025 release of DeepSeek-R1 triggered a $589 billion drop in Nvidia's market value as investors feared a cheaper route to natural-language question answering systems. LLMs can be incentivized to learn to reason without human example labels, reducing human input while improving performance on math and coding tasks beyond models trained only on human text and examples. Reinforcement learning resembles a child learning a video game by trial and error, contrasting with prompting and supervised approaches.
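The trial-and-error idea can be illustrated with a toy sketch (this is not DeepSeek's training code; `ACTIONS`, `CORRECT`, and `learn` are illustrative assumptions): an agent samples an answer, an automatic checker grants a reward only when the answer is correct, and the agent's preferences shift toward rewarded answers — no human-written example labels involved.

```python
import random

# Toy trial-and-error learning sketch. Assumption: a verifiable task where
# correctness can be checked automatically (as with math or coding problems),
# so reward replaces human-labeled examples.
ACTIONS = ["answer_a", "answer_b", "answer_c"]
CORRECT = "answer_b"  # known to the checker, never shown to the agent

def choose(prefs):
    """Sample an action in proportion to current preference weights."""
    actions, weights = zip(*prefs.items())
    return random.choices(actions, weights=weights, k=1)[0]

def learn(trials=2000, lr=0.5, seed=0):
    """Reinforce whichever actions happen to earn reward."""
    random.seed(seed)
    prefs = {a: 1.0 for a in ACTIONS}  # start with no preference
    for _ in range(trials):
        a = choose(prefs)
        reward = 1.0 if a == CORRECT else 0.0  # automatic check, no human label
        prefs[a] += lr * reward                # strengthen rewarded behavior
    return prefs
```

After enough trials the weight on the correct answer dominates, purely from the reward signal — a loose analogue of the child-and-video-game comparison above, though real RL training of an LLM optimizes model parameters over generated reasoning traces rather than a lookup table of preferences.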
Read at The Register