
"a generational leap in efficiency and performance for AI inference workloads by delivering greater than 10x higher effective memory bandwidth and much lower power consumption."
"direct liquid cooling for thermal efficiency, PCIe for scale up, Ethernet for scale out, confidential computing for secure AI workloads, and a rack-level power consumption of 160 kW."
"offer rack-scale performance and superior memory capacity for fast generative AI inference at high performance per dollar per watt"
Qualcomm introduced the AI200 and AI250, chip-based accelerator cards targeted at AI inference workloads. The AI200 supports 768 GB of LPDDR memory per card. The AI250 uses an innovative near-memory computing architecture that promises a generational leap in inference efficiency and performance, delivering more than 10x higher effective memory bandwidth at much lower power consumption. Pre-configured rack systems will use direct liquid cooling, PCIe for scale-up, Ethernet for scale-out, confidential computing for secure workloads, and a rack-level power envelope of 160 kW. The design builds on Qualcomm's Hexagon NPU technology and targets fast generative AI inference at high performance per dollar per watt.
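For a rough sense of what those figures imply at rack scale, here is a minimal back-of-envelope sketch. Only the 768 GB per card and the 160 kW rack envelope come from the announcement; the per-rack card count is a hypothetical assumption for illustration, and the power figure is an average budget for everything in the rack (hosts, networking, cooling overhead included), so actual per-card draw would be lower.

```python
# Back-of-envelope rack math from the figures quoted above.
# CARD_MEMORY_GB and RACK_POWER_KW come from the article;
# CARDS_PER_RACK is a hypothetical assumption, not a disclosed spec.

CARD_MEMORY_GB = 768   # LPDDR per AI200 card (from the article)
RACK_POWER_KW = 160    # rack-level power envelope (from the article)
CARDS_PER_RACK = 64    # hypothetical card count for illustration

total_memory_tb = CARDS_PER_RACK * CARD_MEMORY_GB / 1024
avg_power_per_card_w = RACK_POWER_KW * 1000 / CARDS_PER_RACK

print(f"Aggregate rack memory: {total_memory_tb:.1f} TB")
print(f"Average rack power per card: {avg_power_per_card_w:.0f} W")
```

Under that assumed card count, the rack would aggregate 48 TB of LPDDR, which is the kind of capacity the "superior memory capacity" claim points at for serving large models from a single rack.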
Read at The Register