OpenAI swaps Nvidia for Cerebras with GPT-5.3-Codex-Spark
Briefly

"OpenAI releases GPT-5.3-Codex-Spark, a smaller AI encoding model that generates over 1,000 tokens per second on Cerebras hardware. It is OpenAI's first GPT model that does not run on Nvidia. The model is optimized for ultra-fast inferencing on Cerebras' Wafer Scale Engine 3, with OpenAI adding a latency-first serving tier to the existing infrastructure. The speed comes in handy for interactive work where developers need immediate feedback."
"In January, OpenAI announced a multi-year partnership with Cerebras, whereby the company purchases large-scale computing power to support its AI services. That deal reportedly includes up to 750 megawatts of computing power over three years. Codex-Spark is the first concrete result of this collaboration. Speed versus intelligence OpenAI's latest frontier models can work autonomously for hours, days, or weeks on long-running tasks. Codex-Spark complements this with a model for real-time adjustments. Developers can interrupt or make adjustments while working, and the model responds immediately."
"By focusing on speed, Codex-Spark keeps the working method light. It makes minimal, targeted adjustments. At launch, the model has a 128k context window and is text-only. During the preview, separate rate limits apply and may fluctuate during periods of high demand. GPT-5.3-Codex-Spark performs strongly on benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, completing tasks in a fraction of the time compared to GPT-5.3-Codex."
OpenAI released GPT-5.3-Codex-Spark, a smaller coding model optimized for ultra-fast inference on Cerebras' Wafer Scale Engine 3, achieving over 1,000 tokens per second. OpenAI added a latency-first serving tier for it; during the preview, the model offers a 128k context window, is text-only, and runs under variable rate limits. Codex-Spark enables interactive developer workflows by making minimal, targeted edits and supporting real-time interruptions and adjustments. It performs strongly on benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0 while completing tasks in a fraction of the time of GPT-5.3-Codex, and OpenAI implemented pipeline-wide latency improvements that benefit all its models.
Read at Techzine Global