
"Perplexity AI has released an open-source software tool that solves two expensive problems for enterprises running AI systems: being locked into a single cloud provider and the need to buy the latest hardware to run massive models. The tool, called TransferEngine, enables large language models to communicate across different cloud providers' hardware at full speed. Companies can now run trillion-parameter models like DeepSeek V3 and Kimi K2 on older H100 and H200 GPU systems instead of waiting for expensive next-generation hardware,"
"That lock-in stems from a fundamental technical incompatibility, according to the research. Cloud providers use different networking protocols for high-speed GPU communication. Nvidia's ConnectX chips use one standard, while AWS's Elastic Fabric Adapter (AWS EFA) uses an entirely different proprietary protocol. Previous solutions worked on one system or the other, but not both, the paper noted. This forced companies to commit to a single provider's ecosystem, or accept dramatically slower performance."
"The problem is particularly acute with newer Mixture-of-Experts models, Perplexity found. DeepSeek V3 packs 671 billion parameters. Kimi K2 hits a full trillion. These models are too large to fit on single eight-GPU systems, according to the research. The obvious answer would be Nvidia's new GB200 systems, essentially one giant 72-GPU server. But those cost millions, face extreme supply shortages, and aren't available everywhere, the researchers noted."
TransferEngine bridges incompatible high-speed GPU networking protocols used by AWS and Nvidia, enabling cross-provider GPU-to-GPU communication at full speed. The tool allows large Mixture-of-Experts models, including DeepSeek V3 (671B) and Kimi K2 (1T), to run across older H100 and H200 GPU clusters by removing the need to adopt a single provider's networking standard. Previous solutions supported only one protocol, creating vendor lock-in or performance trade-offs. TransferEngine is open-source and offers an alternative to buying scarce, expensive GB200 multi-GPU servers while preserving high-performance distributed inference.
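To make the idea concrete, here is a minimal conceptual sketch of what a transport-agnostic layer like this looks like: serving code talks to one interface for one-sided GPU-to-GPU writes, and a backend is chosen per machine depending on whether the NIC is Nvidia ConnectX or AWS EFA. All names here (RdmaTransport, ConnectXTransport, EfaTransport, make_transport) are invented for illustration and are not TransferEngine's actual API.

```python
# Hypothetical sketch of a transport-agnostic point-to-point layer.
# Not TransferEngine's real interface; it only illustrates the bridging concept.

from abc import ABC, abstractmethod


class RdmaTransport(ABC):
    """One interface for one-sided GPU-to-GPU writes, whatever NIC sits underneath."""

    @abstractmethod
    def write(self, local_buf: bytes, peer: str, remote_addr: int) -> None:
        """Push bytes directly into a peer's registered memory (one-sided write)."""


class ConnectXTransport(RdmaTransport):
    def write(self, local_buf: bytes, peer: str, remote_addr: int) -> None:
        # A real backend would post an RDMA-write work request via standard verbs.
        print(f"[ConnectX] write {len(local_buf)} B to {peer}@{remote_addr:#x}")


class EfaTransport(RdmaTransport):
    def write(self, local_buf: bytes, peer: str, remote_addr: int) -> None:
        # A real backend would go through AWS EFA's proprietary interface instead.
        print(f"[EFA] write {len(local_buf)} B to {peer}@{remote_addr:#x}")


def make_transport(nic_kind: str) -> RdmaTransport:
    """Pick the backend at startup so model-serving code never sees the difference."""
    return EfaTransport() if nic_kind == "efa" else ConnectXTransport()


# Expert-parallel dispatch code can then stay identical across clouds:
transport = make_transport("efa")
transport.write(b"\x00" * 1024, peer="node7-gpu3", remote_addr=0x7F000000)
```

The design point is that vendor lock-in lives entirely below the interface: the same serving code runs on a ConnectX cluster or an EFA cluster, which is the property the summary above attributes to TransferEngine.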
Read at InfoWorld