#model-optimization tag

from Computerworld

Google targets AI inference bottlenecks with TurboQuant

TurboQuant improves AI model efficiency by compressing key-value caches, reducing memory usage and runtime without accuracy loss.

Build DeepSeek-V3: Multi-Head Latent Attention (MLA) Architecture - PyImageSearch

Multi-Head Latent Attention (MLA) reduces the computational and memory costs of traditional attention mechanisms by introducing a latent representation space while preserving contextual understanding.

Apple

3 months ago

All the brilliance of AI on minimalist platforms

On-device AI handles most processing locally, reducing dependence on massive data infrastructures while enabling efficient smartphone intelligence.

from Business Insider

5 months ago

Anthropic CEO Dario Amodei drags OpenAI and Google: 'We don't have to do any code reds'

Anthropic focuses on enterprise AI, prioritizing coding, scientific, and business capabilities, and avoids the consumer-focused 'code red' urgency facing OpenAI and Google.

from InfoQ

5 months ago

SAM 3 Introduces a More Capable Segmentation Architecture for Modern Vision Workflows

SAM 3 improves segmentation accuracy, boundary quality, contextual coherence, robustness, and inference speed for reliable deployment across GPUs, mobile hardware, and web runtimes.

3 years ago

Using LLVM To Supercharge AI Model Execution On Edge Devices | HackerNoon

LLVM simplifies optimizing AI workloads for edge devices, transforming deployment pipelines into efficient processes.

Keep the Channel, Change the Filter: A Smarter Way to Fine-Tune AI Models | HackerNoon

Efficient fine-tuning of large pre-trained models can be achieved by adjusting only filter atoms while preserving overall model capabilities.

1 year ago

Chinese AI Model Promises Gemini 2.5 Pro-level Performance at One-fourth of the Cost | HackerNoon

MiniMax's M1 model stands out with its open-weight reasoning capabilities, scoring high on multiple benchmarks, including an impressive 86.0% accuracy on AIME 2024.

Artificial intelligence

10 months ago

Can Smaller AI Outperform the Giants? | HackerNoon

The advancement of vision-language models (VLMs) relies on foundational design choices, yet many of these choices go unjustified, which hinders progress by obscuring what actually drives performance improvements.

Artificial intelligence

Growth hacking

from InfoQ

11 months ago

Scaling Large Language Model Serving Infrastructure at Meta

LLM serving is evolving into a foundational technology similar to an operating system.

1 year ago

All the brilliance of AI on minimalist platforms

Fast-forward to 2024: our reliance on massive data infrastructure is shrinking as AI systems run on palm-sized devices. Apple and Qualcomm chips integrate AI for tasks such as language translation and photo processing.

Digital life

from Techzine Global

11 months ago

Microsoft makes Azure AI Foundry available with improved model tools

Azure AI Foundry is now generally available, enhancing tools for AI model selection, customization, and deployment for developers.