
"Google has released TranslateGemma, a new suite of open translation models built on the Gemma 3 architecture. The release includes three model sizes 4B, 12B, and 27B parameters, and targets machine translation across 55 languages. The models are designed to run in a range of environments, from mobile and edge devices to consumer hardware and cloud accelerators, and are available as open models for developers and researchers."
"TranslateGemma is the result of a training process focused on efficiency and transfer of knowledge from larger proprietary systems. Google used a two-stage approach that combines supervised fine-tuning with reinforcement learning. In the supervised phase, the base Gemma 3 models were trained on parallel datasets composed of both human-produced translations and synthetic translations generated by Gemini models. This mix was intended to increase coverage across language families, including low-resource languages, while maintaining consistency in translation quality."
"In the reinforcement learning stage, the models were optimized using an ensemble of automatic reward signals. These included quality estimation and machine translation metrics such as MetricX-QE and AutoMQM, which aim to capture adequacy and fluency beyond simple reference matching. According to Google, this approach led to notable gains in parameter efficiency. On the WMT24++ benchmark, the 12B TranslateGemma model reportedly achieved lower error rates than the larger 27B Gemma 3 baseline, while the 4B model approached the performance of the 12B baseline."
TranslateGemma comprises open translation models based on Gemma 3, offered in 4B, 12B, and 27B parameter sizes and covering 55 languages. Training combined supervised fine-tuning on parallel datasets of human and synthetic Gemini-generated translations with a reinforcement learning stage using ensemble automatic reward signals such as MetricX-QE and AutoMQM. The training emphasized efficiency and knowledge transfer from larger proprietary systems to improve parameter efficiency. Evaluation on WMT24++ indicated the 12B model had lower error rates than the 27B Gemma 3 baseline, while the 4B model approached 12B performance. The models were also trained on nearly 500 additional language pairs for broader research use.