Google gives enterprises new controls to manage AI inference costs and reliability
Briefly

"Flex Inference is priced at 50% of the standard Gemini API rate, but offers reduced reliability and higher latency. It is suited for background CRM use cases where immediate responses are not critical."
"Priority Inference ensures higher availability for interactive jobs that require immediate responses, allowing developers to manage workloads more effectively without maintaining separate architectures."
"The introduction of Flex and Priority tiers addresses the growing complexity of enterprise AI applications, moving beyond simple chatbots to more intricate workflows that demand different handling of tasks."
Google's Gemini API now features Flex Inference and Priority Inference tiers, allowing enterprise developers to balance AI inference cost against reliability based on workload urgency. Flex Inference is cheaper but offers lower reliability, while Priority Inference guarantees higher availability for time-sensitive tasks. The tiers simplify application architecture by letting both background and interactive jobs run through a single synchronous interface rather than separate pipelines. Google also released Gemma 4, an advanced open model for local deployment, expanding developers' options for running models on their own infrastructure.
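As a rough illustration of the single-interface idea described above, the sketch below routes both background and interactive jobs through one synchronous call and tags each request with a tier. The `InferenceTier` enum and the `run_inference` helper are hypothetical, invented for this sketch; how a tier is actually expressed on a Gemini API request is not shown here. Only the `generate_content` call follows the google-genai Python SDK.

```python
# Conceptual sketch only: the tier enum and routing helper are hypothetical,
# illustrating how one synchronous interface could serve both background
# (Flex) and interactive (Priority) workloads.
from enum import Enum

from google import genai  # google-genai SDK


class InferenceTier(Enum):
    FLEX = "flex"          # ~50% of the standard rate, lower reliability, higher latency
    PRIORITY = "priority"  # higher availability for interactive, time-sensitive jobs


client = genai.Client()  # reads the API key from the environment


def run_inference(prompt: str, tier: InferenceTier) -> str:
    # Both tiers go through the same synchronous call; in practice the tier
    # would be set via whatever request option Google exposes, which this
    # sketch does not attempt to reproduce.
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents=prompt,
    )
    return response.text


# Background CRM enrichment tolerates delay -> Flex.
summary = run_inference("Summarize this CRM note: ...", InferenceTier.FLEX)

# Interactive support chat needs an immediate answer -> Priority.
reply = run_inference("Answer the customer's question: ...", InferenceTier.PRIORITY)
```

The point of the sketch is the design choice the article highlights: the caller picks a tier per request instead of maintaining separate batch and real-time architectures.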
Read at InfoWorld