#mixture-of-experts

Artificial intelligence
from TechCrunch
4 days ago

Indian AI lab Sarvam's new models are a major bet on the viability of open-source AI | TechCrunch

Sarvam launched 30B and 105B mixture-of-experts LLMs with long context windows and speech and vision models, trained from scratch for Indian languages and real-time use.
#open-source
Artificial intelligence
from HackerNoon
1 week ago

This "Flash" AI Model Is Fast and Dangerous at Math-Here's What It Can Do | HackerNoon

GLM-4.7-Flash is a 30-billion-parameter mixture-of-experts model offering strong performance for lightweight deployment.
Artificial intelligence
from Techzine Global
1 month ago

DeepSeek to release V4 AI model with powerful coding capabilities in February

DeepSeek's upcoming V4 is reported to deliver superior coding performance and long-context processing, leveraging an MoE architecture and sparse attention for greater efficiency and lower training cost.
Artificial intelligence
from The Register
2 months ago

Nvidia pledges more openness as it slurps up Slurm

Nvidia acquired SchedMD, committed to keeping Slurm open-source and vendor-neutral, and launched Nemotron 3 MoE models spanning 30B to 500B parameters.
Artificial intelligence
from InfoQ
3 months ago

New IBM Granite 4 Models to Reduce AI Costs with Inference-Efficient Hybrid Mamba-2 Architecture

IBM attributes the improved efficiency relative to larger models to Granite 4's hybrid architecture, which combines a small number of standard transformer-style attention layers with a majority of Mamba layers, specifically Mamba-2. With 9 Mamba blocks per 1 transformer block, Granite gets linear scaling with context length for the Mamba parts (versus quadratic scaling in transformer attention), plus the local contextual dependencies captured by attention (important for in-context learning and few-shot prompting).
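
As a rough illustration of that 9:1 interleave, here is a minimal PyTorch-style sketch of a hybrid stack. It is not IBM's Granite 4 code: MambaBlockStub, AttentionBlock, build_hybrid_stack, and all dimensions and layer counts are placeholders, and the stub stands in for a real Mamba-2 SSM layer.

```python
# Sketch of a hybrid Mamba/attention stack at a 9:1 ratio (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MambaBlockStub(nn.Module):
    """Placeholder for a Mamba-2 block: linear-time in sequence length."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        h, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        return x + self.out_proj(F.silu(gate) * h)


class AttentionBlock(nn.Module):
    """Standard self-attention block: quadratic in sequence length."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


def build_hybrid_stack(d_model: int, n_groups: int, mamba_per_attn: int = 9):
    """Repeat [9 x Mamba, 1 x attention] groups, as the article describes."""
    layers = []
    for _ in range(n_groups):
        layers += [MambaBlockStub(d_model) for _ in range(mamba_per_attn)]
        layers.append(AttentionBlock(d_model))
    return nn.Sequential(*layers)


model = build_hybrid_stack(d_model=512, n_groups=4)
x = torch.randn(2, 128, 512)                    # (batch, seq_len, d_model)
print(model(x).shape)                           # torch.Size([2, 128, 512])
```

The design point is that only one block in ten pays the quadratic attention cost, while the other nine scale linearly with sequence length.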
Artificial intelligence
from InfoQ
3 months ago

Kimi's K2 Open-Source Language Model Supports Dynamic Resource Availability and New Optimizer

Kimi K2 is a Mixture-of-Experts LLM with 32B activated parameters out of 1.04T total, trained on 15.5T tokens using the MuonClip optimizer to improve training stability.
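
The gap between 32B activated and 1.04T total parameters comes from sparse expert routing: each token is dispatched to only a few experts per MoE layer. Below is a generic top-k routing sketch, not Kimi K2's implementation; the TopKMoE class, expert count, and k are illustrative choices.

```python
# Toy top-k mixture-of-experts routing (illustrative only).
import torch
import torch.nn as nn


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 16, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)          # token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: (n_tokens, d_model)
        scores = self.router(x)                              # (n_tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # pick k experts per token
        weights = topk_scores.softmax(dim=-1)                # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]                          # expert id chosen in this slot
            w = weights[:, slot].unsqueeze(-1)
            for e_id in idx.unique().tolist():               # run each needed expert once
                mask = idx == e_id
                out[mask] += w[mask] * self.experts[e_id](x[mask])
        return out


moe = TopKMoE(d_model=64, d_ff=256)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)                                     # torch.Size([10, 64])
```

With 2 of 16 experts active, each token touches only an eighth of the expert parameters in this layer, which is the same principle that keeps per-token compute far below total parameter count in models like K2.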
Artificial intelligence
from InfoQ
5 months ago

xAI Releases Grok Code Fast 1, a New Model for Agentic Coding

grok-code-fast-1 is an agentic coding model optimized for tool usage, high throughput, long context, and seamless integration with developer workflows.
Artificial intelligence
from Medium
5 months ago

Microsoft AI Unveils MAI-Voice-1 and MAI-1-Preview to Power the Next Generation of AI

Microsoft released MAI-Voice-1, a high-speed expressive speech model, and MAI-1-preview, an internally built foundation model aimed at improved instruction following and text responses.
Artificial intelligence
from InfoWorld
5 months ago

Microsoft signals shift from OpenAI with launch of first in-house AI models for Copilot

According to Microsoft, MAI-1-preview is an in-house mixture-of-experts model that was pre-trained and post-trained on 15,000 Nvidia H100 GPUs, a more modest infrastructure than the 100,000-H100 cluster sizes reportedly used for model development by some rivals. However, with an eye to ramping up performance, Microsoft AI is now running MAI-1-preview on Nvidia's more powerful GB200 cluster, the company said.
Scala
from HackerNoon
1 year ago

SUTRA: Decoupling Concept & Language for Multilingual LLM Excellence | HackerNoon

SUTRA is a multilingual LLM that excels in understanding and generating text efficiently across 50+ languages.