#mixture-of-experts

Artificial intelligence
from TechCrunch
4 days ago

Indian AI lab Sarvam's new models are a major bet on the viability of open-source AI | TechCrunch

Sarvam launched 30B and 105B mixture-of-experts LLMs with long context windows and speech and vision models, trained from scratch for Indian languages and real-time use.
#open-source
Artificial intelligence
from HackerNoon
1 week ago

This "Flash" AI Model Is Fast and Dangerous at Math-Here's What It Can Do | HackerNoon

GLM-4.7-Flash is a 30-billion-parameter mixture-of-experts model offering strong performance for lightweight deployment.
Artificial intelligence
from Techzine Global
1 month ago

DeepSeek to release V4 AI model with powerful coding capabilities in February

DeepSeek's upcoming V4 is reported to deliver superior coding performance and long-context processing, leveraging an MoE architecture and sparse attention for greater efficiency and lower training cost.
Artificial intelligence
from The Register
2 months ago

Nvidia pledges more openness as it slurps up Slurm

Nvidia acquired SchedMD, committed to keeping Slurm open-source and vendor-neutral, and launched Nemotron 3 MoE models spanning 30B to 500B parameters.
Artificial intelligence
from InfoQ
3 months ago

New IBM Granite 4 Models to Reduce AI Costs with Inference-Efficient Hybrid Mamba-2 Architecture

IBM attributes the improved efficiency relative to larger models to Granite 4's hybrid architecture, which combines a small number of standard transformer-style attention layers with a majority of Mamba layers, specifically Mamba-2. With 9 Mamba blocks per 1 transformer block, Granite gets linear scaling with context length for the Mamba parts (versus quadratic scaling in transformer attention), plus the local contextual dependencies captured by attention (important for in-context learning and few-shot prompting).
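
As a rough illustration of that 9:1 interleave, here is a minimal PyTorch-style sketch of a hybrid stack. It is not IBM's Granite 4 code: MambaBlockStub, AttentionBlock, build_hybrid_stack, and all dimensions and layer counts are placeholders, and the stub stands in for a real Mamba-2 SSM layer.

```python
# Sketch of a hybrid Mamba/attention stack at a 9:1 ratio (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MambaBlockStub(nn.Module):
    """Placeholder for a Mamba-2 block: linear-time in sequence length."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        h, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        return x + self.out_proj(F.silu(gate) * h)


class AttentionBlock(nn.Module):
    """Standard self-attention block: quadratic in sequence length."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


def build_hybrid_stack(d_model: int, n_groups: int, mamba_per_attn: int = 9):
    """Repeat [9 x Mamba, 1 x attention] groups, as the article describes."""
    layers = []
    for _ in range(n_groups):
        layers += [MambaBlockStub(d_model) for _ in range(mamba_per_attn)]
        layers.append(AttentionBlock(d_model))
    return nn.Sequential(*layers)


model = build_hybrid_stack(d_model=512, n_groups=4)
x = torch.randn(2, 128, 512)                    # (batch, seq_len, d_model)
print(model(x).shape)                           # torch.Size([2, 128, 512])
```

The design point is that only one block in ten pays the quadratic attention cost, while the other nine scale linearly with sequence length.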
Artificial intelligence
from InfoQ
3 months ago

Kimi's K2 Open-Source Language Model Supports Dynamic Resource Availability and New Optimizer

Kimi K2 is a Mixture-of-Experts LLM with 32B activated parameters out of 1.04T total, trained on 15.5T tokens using the MuonClip optimizer to improve training stability.
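
The gap between 32B activated and 1.04T total parameters comes from sparse expert routing: each token is dispatched to only a few experts per MoE layer. Below is a generic top-k routing sketch, not Kimi K2's implementation; the TopKMoE class, expert count, and k are illustrative choices.

```python
# Toy top-k mixture-of-experts routing (illustrative only).
import torch
import torch.nn as nn


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 16, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)          # token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: (n_tokens, d_model)
        scores = self.router(x)                              # (n_tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # pick k experts per token
        weights = topk_scores.softmax(dim=-1)                # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]                          # expert id chosen in this slot
            w = weights[:, slot].unsqueeze(-1)
            for e_id in idx.unique().tolist():               # run each needed expert once
                mask = idx == e_id
                out[mask] += w[mask] * self.experts[e_id](x[mask])
        return out


moe = TopKMoE(d_model=64, d_ff=256)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)                                     # torch.Size([10, 64])
```

With 2 of 16 experts active, each token touches only an eighth of the expert parameters in this layer, which is the same principle that keeps per-token compute far below total parameter count in models like K2.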
Artificial intelligence
from InfoQ
5 months ago

xAI Releases Grok Code Fast 1, a New Model for Agentic Coding

grok-code-fast-1 is an agentic coding model optimized for tool usage, high throughput, long context, and seamless integration with developer workflows.
Artificial intelligence
from Medium
5 months ago

Microsoft AI Unveils MAI-Voice-1 and MAI-1-Preview to Power the Next Generation of AI

Microsoft released MAI-Voice-1, a high-speed expressive speech model, and MAI-1-preview, an internally built foundation model aimed at improved instruction following and text responses.
Artificial intelligence
from InfoWorld
5 months ago

Microsoft signals shift from OpenAI with launch of first in-house AI models for Copilot

According to Microsoft, MAI-1-preview is an in-house mixture-of-experts model that was pre-trained and post-trained on 15,000 Nvidia H100 GPUs, a more modest infrastructure than the 100,000-H100 cluster sizes reportedly used for model development by some rivals. However, with an eye to ramping up performance, Microsoft AI is now running MAI-1-preview on Nvidia's more powerful GB200 cluster, the company said.
Scala
from HackerNoon
1 year ago

SUTRA: Decoupling Concept & Language for Multilingual LLM Excellence | HackerNoon

SUTRA is a multilingual LLM that excels in understanding and generating text efficiently across 50+ languages.