#low-latency-inference

Artificial intelligence
from Techzine Global
1 week ago

OpenAI swaps Nvidia for Cerebras with GPT-5.3-Codex-Spark

GPT-5.3-Codex-Spark is a Cerebras-optimized, low-latency coding model that generates over 1,000 tokens/sec, enabling real-time, lightweight code edits for developers.
Artificial intelligence
from TechCrunch
1 week ago

A new version of OpenAI's Codex is powered by a new dedicated chip | TechCrunch

OpenAI released GPT-5.3-Codex-Spark, a lightweight, low-latency Codex model running on Cerebras WSE-3 hardware to accelerate inference and enable real-time collaboration and rapid prototyping.
Artificial intelligence
from TechCrunch
1 month ago

OpenAI signs deal, reportedly worth $10 billion, for compute from Cerebras | TechCrunch

OpenAI signed a multi-year agreement with Cerebras for 750 megawatts of compute through 2028, aimed at accelerating low-latency inference and speeding up customer-facing responses.