Thinking Machines Lab wants to make AI models more consistent | TechCrunch
Briefly

"There's been great interest in what Mira Murati's Thinking Machines Lab is building with its $2 billion in seed funding and the all-star team of former OpenAI researchers who have joined the lab. In a blog post published on Wednesday, Murati's research lab gave the world its first look into one of its projects: creating AI models with reproducible responses."
"The research blog post, titled "Defeating Nondeterminism in LLM Inference," tries to unpack the root cause of what introduces randomness in AI model responses. For example, ask ChatGPT the same question a few times over, and you're likely to get a wide range of answers. This has largely been accepted in the AI community as a fact - today's AI models are considered to be non-deterministic systems- but Thinking Machines Lab sees this as a solvable problem."
"The post, authored by Thinking Machines Lab researcher Horace He, argues that the root cause of AI models' randomness is the way GPU kernels - the small programs that run inside of Nvidia's computer chips - are stitched together in inference processing (everything that happens after you press enter in ChatGPT). He suggests that by carefully controlling this layer of orchestration, it's possible to make AI models more deterministic."
Thinking Machines Lab is pursuing reproducible LLM outputs by eliminating nondeterminism in inference. The team identifies GPU kernel orchestration (the sequencing and composition of the small programs that run on Nvidia chips) as a principal source of variability in responses, and argues that precise control of that orchestration can make inference deterministic, producing consistent outputs for identical prompts. Deterministic inference could improve enterprise reliability, scientific reproducibility, and reinforcement learning training, where it would reduce noise in reward signals. The effort focuses on engineering the inference layer, targeting kernel-level behavior and orchestration, rather than changing model architectures.
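For context, deep learning frameworks already expose partial determinism controls at the kernel level. The sketch below shows PyTorch's knobs; these pin kernel selection and seeding on a single machine, a narrower guarantee than the inference-layer orchestration control the post describes.

```python
# Framework-level determinism knobs in PyTorch, shown for context. These pin
# kernel selection and RNG state on one machine; they do not address the
# serving-level orchestration effects discussed in the blog post.
import torch

torch.manual_seed(0)                      # fix RNG state
torch.use_deterministic_algorithms(True)  # error out on nondeterministic kernels
torch.backends.cudnn.benchmark = False    # disable autotuned kernel selection

x = torch.randn(4, 8)
w = torch.randn(8, 3)
print(x @ w)  # repeatable across runs with the settings above
```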
Read at TechCrunch