How Gradient-Free Training Could Decentralize AI | HackerNoon
Briefly

The core idea behind the BitNet b1.58 architecture is that highly efficient large language models can be built using only three weight values (-1, 0, 1), eliminating the need for multiplications.
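The multiplication-free property is easy to see in code. Below is a minimal Python sketch (not the BitNet implementation; the function name and data layout are illustrative) of a matrix-vector product restricted to weights in {-1, 0, 1}: every term reduces to an addition, a subtraction, or a skip.

```python
# Minimal sketch: a matrix-vector product with ternary weights (-1, 0, 1)
# needs no multiplications, only additions and subtractions.

def ternary_matvec(weights, x):
    """Compute W @ x where every entry of W is -1, 0, or 1.

    weights: list of rows, each a list of ints in {-1, 0, 1}
    x:       list of floats (input activations)
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:        # +1 weight: add the activation
                acc += xi
            elif w == -1:     # -1 weight: subtract the activation
                acc -= xi
            # w == 0: skip entirely (sparsity for free)
        out.append(acc)
    return out


if __name__ == "__main__":
    W = [[1, 0, -1],
         [-1, 1, 1]]
    x = [0.5, 2.0, -1.0]
    print(ternary_matvec(W, x))  # [1.5, 0.5]
```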
Despite these advantages, the quantized architecture cannot be trained with ordinary gradient descent, raising the question of whether gradient-free methods could make LLM training more efficient.
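To make "gradient-free" concrete, here is a toy sketch of one such approach, a simple (1+1) hill climber over ternary weights. It is purely illustrative of the idea raised in the article, not a method the article proposes: flip a few weights at random, keep the change if the loss improves, otherwise revert.

```python
import random

def mutate(weights, n_flips=2):
    """Return a copy of a flat ternary weight list with a few entries resampled."""
    new = list(weights)
    for _ in range(n_flips):
        i = random.randrange(len(new))
        new[i] = random.choice((-1, 0, 1))
    return new


def train_gradient_free(weights, loss_fn, steps=1000):
    """Hill-climb on a loss function without computing any gradients."""
    best_loss = loss_fn(weights)
    for _ in range(steps):
        candidate = mutate(weights)
        cand_loss = loss_fn(candidate)
        if cand_loss <= best_loss:   # accept only non-worsening mutations
            weights, best_loss = candidate, cand_loss
    return weights, best_loss


if __name__ == "__main__":
    # Hypothetical toy objective: match a target ternary vector.
    target = [1, -1, 0, 1]
    loss = lambda w: sum((a - b) ** 2 for a, b in zip(w, target))
    print(train_gradient_free([0, 0, 0, 0], loss, steps=200))
```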
The financial barrier for training large language models remains high, with costs starting at several million dollars, as seen with DeepSeek v3's $6 million price tag.
Exolabs has highlighted a clear trend toward smaller models derived from larger ones that can run on low-performance devices, illustrating how LLM efficiency continues to evolve.
Read at Hackernoon