OpenAI Codex-Spark Achieves Ultra-Fast Coding Speeds on Cerebras Hardware
Briefly

"In a major shift in its hardware strategy, OpenAI launched GPT-5.3-Codex-Spark, its first production AI model deployed on Cerebras wafer-scale chips rather than traditional Nvidia GPUs. The new model offers delivers improved throughput and low-latency, enabling a real-time, interactive coding experience."
"Codex-Spark runs at roughly 1,000 tokens per second, about 15× faster than earlier versions, making live coding assistance and rapid iteration much more responsive. OpenAI says the new model was designed specifically for working with Codex in real-time-making targeted edits, reshaping logic, or refining interfaces and seeing results immediately."
"Under the hood, we streamlined how responses stream from client to server and back, rewrote key pieces of our inference stack, and reworked how sessions are initialized so that the first visible token appears sooner and Codex stays responsive as you iterate."
OpenAI launched GPT-5.3-Codex-Spark, a production AI model running on Cerebras wafer-scale chips instead of traditional Nvidia GPUs. The model generates roughly 1,000 tokens per second, about 15 times faster than previous versions. Optimized specifically for real-time coding workflows, Codex-Spark enables live coding assistance with immediate results for targeted edits and logic reshaping. Despite prioritizing speed and low latency, the model remains capable of long-running processes spanning hours, days, or weeks. On the SWE-Bench Pro and Terminal-Bench 2.0 benchmarks it scores between GPT-5.1-Codex-mini and GPT-5.3-Codex, while completing tasks in significantly less time. OpenAI streamlined response streaming, rewrote inference stack components, and optimized session initialization to minimize latency across the full request-response pipeline.
Read at InfoQ