#mixture-of-experts

from InfoQ
1 week ago

New IBM Granite 4 Models to Reduce AI Costs with Inference-Efficient Hybrid Mamba-2 Architecture

IBM attributes these improved characteristics relative to larger models to its hybrid architecture, which combines a small number of standard transformer-style attention layers with a majority of Mamba layers, specifically Mamba-2. With 9 Mamba blocks for every transformer block, Granite gets linear scaling with context length in the Mamba layers (versus the quadratic scaling of transformer attention), while the attention layers preserve the local contextual dependencies that matter for in-context learning and few-shot prompting. A simplified sketch of this layer interleaving appears below.
Artificial intelligence
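The 9:1 interleave is easiest to see as a layer stack. The sketch below is a hypothetical, heavily simplified PyTorch rendering of that pattern only; `Mamba2Block` and `AttentionBlock` are stand-in stubs, and the dimensions and group count are made up for illustration, not taken from IBM's Granite 4 code.

```python
# Hypothetical sketch of a 9:1 Mamba-2 / attention interleave, loosely
# following the ratio described for Granite 4. Block internals are stubbed
# out; only the layer-stacking pattern is illustrated.
import torch
import torch.nn as nn


class Mamba2Block(nn.Module):
    """Placeholder for a Mamba-2 state-space block (linear in sequence length)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mixer = nn.Linear(d_model, d_model)  # stand-in for the SSM scan

    def forward(self, x):
        return x + self.mixer(x)  # residual connection


class AttentionBlock(nn.Module):
    """Placeholder for a standard self-attention block (quadratic in sequence length)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out


class HybridStack(nn.Module):
    """Repeats [9 x Mamba-2 block, 1 x attention block] for n_groups groups."""
    def __init__(self, d_model: int = 512, n_groups: int = 4, mamba_per_attn: int = 9):
        super().__init__()
        layers = []
        for _ in range(n_groups):
            layers += [Mamba2Block(d_model) for _ in range(mamba_per_attn)]
            layers.append(AttentionBlock(d_model))
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 128, 512)   # (batch, sequence, d_model)
    print(HybridStack()(x).shape)  # torch.Size([2, 128, 512])
```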
from InfoQ
1 week ago

Kimi's K2 Open-Source Language Model Supports Dynamic Resource Availability and a New Optimizer

Kimi K2 is a Mixture-of-Experts LLM (32B activated parameters, 1.04T total) trained on 15.5T tokens using the MuonClip optimizer to improve training stability; the rough arithmetic below shows how small a fraction of the model runs per token.
Artificial intelligence
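Those two parameter counts capture the core economics of a Mixture-of-Experts model: only the routed experts run for any given token. A back-of-the-envelope calculation, assuming the figures quoted above:

```python
# Rough view of what "32B activated, 1.04T total" means for an MoE model:
# only the routed experts' parameters are exercised per token.
total_params = 1.04e12   # all experts plus shared weights
active_params = 32e9     # parameters actually used for each token

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%}")  # roughly 3% of the model
```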
from InfoQ
2 months ago

xAI Releases Grok Code Fast 1, a New Model for Agentic Coding

grok-code-fast-1 is an agentic coding model optimized for tool usage, high throughput, long context, and seamless integration with developer workflows.
Artificial intelligence
from Medium
2 months ago

Microsoft AI Unveils MAI-Voice-1 and MAI-1-Preview to Power the Next Generation of AI

Microsoft released MAI-Voice-1, a high-speed expressive speech generation model, and MAI-1-preview, an internally built foundation model aimed at improved instruction following and text responses.
from InfoWorld
2 months ago

Microsoft signals shift from OpenAI with launch of first in-house AI models for Copilot

According to Microsoft, MAI-1-preview is an in-house mixture-of-experts model that was pre-trained and post-trained on 15,000 Nvidia H100 GPUs, a more modest infrastructure than the clusters of roughly 100,000 H100s reportedly used by some rivals for model development. However, with an eye to ramping up performance, Microsoft AI is now running MAI-1-preview on Nvidia's more powerful GB200 cluster, the company said.
Artificial intelligence
from Hackernoon
9 months ago

SUTRA: Decoupling Concept & Language for Multilingual LLM Excellence

SUTRA is a multilingual LLM that excels in understanding and generating text efficiently across 50+ languages.
from Thegreenplace
7 months ago

Sparsely-gated Mixture Of Experts (MoE)

The feed-forward layer in transformer models is crucial for reasoning over token relationships, and it often holds most of the model's weights because of its larger inner dimensionality.
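A minimal sketch of what a sparsely-gated replacement for that feed-forward layer can look like, assuming top-2 routing over eight experts and illustrative dimensions; it is not the implementation from the article, and the softmax-over-top-k gating is just one common choice.

```python
# Sparsely-gated MoE feed-forward layer: a router picks the top-k experts per
# token, so only a small fraction of the (large) feed-forward weights is used
# for each token. Dimensions and gating details are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        logits = self.router(x)                          # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep only top-k experts
        weights = F.softmax(weights, dim=-1)             # renormalise their gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # dispatch tokens to experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out


if __name__ == "__main__":
    tokens = torch.randn(16, 512)
    print(MoEFeedForward()(tokens).shape)  # torch.Size([16, 512])
```

The loop-based dispatch keeps the routing logic easy to read; production systems typically batch tokens per expert and add a load-balancing loss, which is omitted here for brevity.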