#memory-efficiency

from InfoQ
6 hours ago

New IBM Granite 4 Models to Reduce AI Costs with Inference-Efficient Hybrid Mamba-2 Architecture

IBM attributes these efficiency gains over larger models to its hybrid architecture, which combines a small number of standard transformer-style attention layers with a majority of Mamba layers, specifically Mamba-2. With 9 Mamba blocks per 1 transformer block, Granite gets linear scaling with context length for the Mamba layers (versus quadratic scaling in transformer attention), while the attention layers preserve the local contextual dependencies that matter for in-context learning and few-shot prompting. A toy sketch of this interleaving follows the entry below.
Artificial intelligence
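
A minimal sketch of how such a 9:1 interleaving affects compute as context grows, assuming a toy cost model (quadratic in context length for attention layers, linear for Mamba-2 layers) and made-up block counts; this is illustrative, not IBM's implementation:

```python
# Toy comparison: per-forward-pass layer cost of a 9:1 Mamba-2/attention hybrid
# vs. an all-attention stack of the same depth. Block counts and the cost model
# are assumptions for illustration only.

def layer_schedule(num_groups: int, mamba_per_attention: int = 9) -> list[str]:
    """Build the interleaving pattern: 9 Mamba-2 blocks, then 1 attention block."""
    schedule = []
    for _ in range(num_groups):
        schedule.extend(["mamba2"] * mamba_per_attention)
        schedule.append("attention")
    return schedule

def stack_cost(schedule: list[str], context_len: int) -> int:
    """Toy cost model: attention layers scale ~O(n^2) with context length n,
    Mamba-2 (state-space) layers scale ~O(n). Constant factors are ignored."""
    return sum(
        context_len ** 2 if layer == "attention" else context_len
        for layer in schedule
    )

if __name__ == "__main__":
    hybrid = layer_schedule(num_groups=4)    # 36 Mamba-2 blocks + 4 attention blocks
    pure_tx = ["attention"] * len(hybrid)    # all-attention stack of the same depth
    for n in (1_000, 8_000, 64_000):
        ratio = stack_cost(pure_tx, n) / stack_cost(hybrid, n)
        print(f"context={n:>6}: all-attention cost ~ {ratio:.1f}x the hybrid's cost")
```

Under this toy model the hybrid stack stays roughly 10x cheaper than an all-attention stack of the same depth, because only one layer in ten pays the quadratic attention cost.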
from SitePoint
4 months ago

Understanding Memory Page Sizes on Arm64 - SitePoint

Larger memory page sizes can affect both performance and memory efficiency: when allocations do not fill whole pages, the unused remainder of each page is wasted. A short sketch of this trade-off follows the entry below.
Software development
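
A minimal sketch, assuming a hypothetical 10 KB allocation, of how the unused tail of a partially filled page grows with page size (4 KiB, 16 KiB, and 64 KiB, page sizes commonly seen on Arm64 systems); the numbers are illustrative, not taken from the article:

```python
# Query the page size the running kernel reports, then estimate per-allocation
# waste ("internal fragmentation") when an allocation does not fill whole pages.
import mmap

def pages_needed(nbytes: int, page_size: int) -> int:
    """Round an allocation up to whole pages, as the kernel must."""
    return -(-nbytes // page_size)  # ceiling division

if __name__ == "__main__":
    print(f"page size reported by this system: {mmap.PAGESIZE} bytes")
    allocation = 10_000  # hypothetical 10 KB buffer
    for page_size in (4096, 16384, 65536):  # 4 KiB, 16 KiB, 64 KiB pages
        reserved = pages_needed(allocation, page_size) * page_size
        waste = reserved - allocation
        print(f"{page_size // 1024:>2} KiB pages: "
              f"{pages_needed(allocation, page_size)} page(s) reserved, "
              f"{waste} bytes unused ({waste / reserved:.0%})")
```

With 4 KiB pages the 10 KB buffer wastes about 19% of the memory reserved for it; with 64 KiB pages the same buffer wastes about 85%, which is the partially-utilized-page effect the article describes.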