OpenAI unveils first model running on Cerebras silicon
"On Thursday, OpenAI unveiled GPT-5.3-Codex-Spark, its first model that will run on Cerebras Systems' dinner-place-sized AI accelerators, which feature some of the world's fastest on-chip memory. The lightweight model is designed to provide a more interactive experience to users of OpenAI's Codex code assistant by leveraging Cerebras' SRAM-packed CS3 accelerators to generate responses at more than 1,000 tokens per second."
"Cerebras' waferscale architecture is notable for using a kind of ultra-fast, on-chip memory called SRAM, which is roughly 1,000x faster than the HBM4 found on Nvidia's upcoming Rubin GPUs announced at CES earlier this year. This, along with optimizations to the inference and application pipelines, allows OpenAI's latest model to churn out answers in the blink of an eye. As Spark is a proprietary model, we don't have all the details on things like parameter count,"
OpenAI introduced GPT-5.3-Codex-Spark, a lightweight, text-only code assistant optimized for Cerebras' SRAM-packed CS3 accelerators. The pairing generates responses at more than 1,000 tokens per second and supports a 128,000-token context window. OpenAI has signed a $10 billion contract to deploy up to 750 megawatts of Cerebras silicon for its newest GPT models. Cerebras' on-chip SRAM is roughly 1,000× faster than the HBM4 on Nvidia's Rubin GPUs, and optimizations to the inference and application pipelines further cut latency. Spark defaults to minimal, targeted edits and avoids running debug tests unless explicitly asked. The parameter count remains undisclosed.
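
The headline throughput figure is easy to sanity-check from the client side. The sketch below is a rough, unofficial measurement using the OpenAI Python SDK, not OpenAI's own tooling: the API identifier gpt-5.3-codex-spark is an assumption (the article only gives the marketing name GPT-5.3-Codex-Spark), and streamed chunks are used as a coarse stand-in for tokens.

    # Rough client-side throughput check -- a sketch, not OpenAI's tooling.
    # Assumptions: the OpenAI Python SDK, OPENAI_API_KEY in the environment,
    # and a hypothetical API identifier for the model the article calls
    # GPT-5.3-Codex-Spark. Streamed chunks only approximate token counts.
    import time

    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-5.3-codex-spark"  # assumed identifier, not confirmed

    start = time.monotonic()
    chunks = 0

    stream = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Reverse a linked list in Python."}],
        stream=True,
    )
    for chunk in stream:
        # Each chunk carries an incremental text delta; count non-empty ones.
        if chunk.choices and chunk.choices[0].delta.content:
            chunks += 1

    elapsed = time.monotonic() - start
    print(f"~{chunks / elapsed:.0f} chunks/s over {elapsed:.2f} s")

For scale: at 1,000 tokens per second, a 500-token answer lands in about half a second, and generating enough text to fill the entire 128,000-token context window would take roughly two minutes.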
Read at The Register