
"The release introduces native video and image processing across the lineup, audio input on the smaller models, context windows up to 256K tokens, and benchmark results that place the 31B dense variant in a bracket typically occupied by models three to five times its size."
"Google reports that the 31B variant scores 84.3% on GPQA Diamond and 80.0% on LiveCodeBench v6, reflecting substantial gains in science reasoning and code generation."
"The 26B MoE model activates only 3.8 billion parameters during inference to deliver fast tokens-per-second, while the 31B dense variant targets workloads where consistent per-token cost matters more than peak parameter count."
"All four variants natively process video and images at variable resolutions, and the E2B and E4B edge models add native audio input for speech recognition and understanding."
Gemma 4 is a family of open-weight models in four variants: the E2B and E4B edge models and the larger 26B (mixture-of-experts) and 31B (dense) models, offering native video and image processing, audio input on the edge models, and context windows up to 256K tokens. The 31B model performs strongly on benchmarks, scoring 84.3% on GPQA Diamond and 80.0% on LiveCodeBench v6, reflecting gains in science reasoning and code generation. The models support function-calling and structured JSON output, enabling developers to build autonomous agents. The architecture spans dense and sparse designs, with the edge models optimized for mobile devices and the larger models able to process extensive data in a single prompt.
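The structured-JSON-output capability mentioned above is typically exercised by attaching a JSON Schema to the request so the model's reply is constrained to that shape. The sketch below only assembles such a request payload; the field names (`response_format`, `json_schema`) follow a common OpenAI-style convention and the model identifier is hypothetical — neither is confirmed for any particular Gemma 4 serving API, so check your provider's documentation.

```python
import json

def build_structured_request(prompt: str, schema: dict) -> dict:
    """Assemble a chat-style request that constrains the reply to `schema`.

    The payload layout here is an assumption modeled on a widely used
    convention, not a documented Gemma 4 API.
    """
    return {
        "model": "gemma-4-31b",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "extraction", "schema": schema},
        },
    }

# Example schema: force the model to return a title string and an integer year.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "year"],
}

payload = build_structured_request("Extract the paper title and year.", schema)
print(json.dumps(payload, indent=2))
```

The same pattern underpins function-calling for agents: the schema describes a tool's argument structure, and the constrained output is parsed directly into a call.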
Read at InfoQ