AMD introduces Instinct MI350P for drop-in enterprise AI hardware
Briefly

"AMD is introducing the Instinct MI350P PCIe card. It is designed to make it easier for organizations to run AI workloads locally in existing data centers without requiring major changes to power supply, cooling, or rack infrastructure. According to AMD, the new hardware targets organizations that need additional AI computing power but do not want to immediately invest in specialized GPU platforms, which often require modifications to data centers."
"The MI350P PCIe card is designed for standard air-cooled servers and existing racks. According to AMD, systems can be equipped with up to eight cards for inference workloads and retrieval-augmented generation (RAG) applications using small, medium, and large AI models. AMD claims performance of up to 2,299 TFLOPS, and peak values of up to 4,600 TFLOPS when using the MXFP4 precision format."
"Additionally, the company states that the card features 144 GB of HBM3E memory with a memory bandwidth of up to 4 TB/s. The hardware supports multiple AI precision formats, including FP8, MXFP8, MXFP4, INT8, and BF16. AMD also utilizes sparsity support to execute certain AI operations more efficiently and increase workload throughput."
"In the announcement, AMD emphasizes software and interoperability. The cards support, among other things, Kubernetes GPU Operator, AMD Inference Microservices, and AI frameworks such as PyTorch. According to AMD, this should enable organizations […]"
The Instinct MI350P PCIe card is designed to help organizations run AI workloads locally in existing data centers without major changes to power supply, cooling, or rack infrastructure. It targets organizations that need additional AI compute but do not want to invest immediately in specialized GPU platforms, which often require data-center modifications, and is positioned as an intermediate step between traditional server hardware and large-scale AI infrastructure for cost control, compliance, and data privacy. The card is intended for standard air-cooled servers and existing racks, with support for up to eight cards per system for inference and retrieval-augmented generation (RAG) workloads. It delivers up to 2,299 TFLOPS, rising to a peak of 4,600 TFLOPS with the MXFP4 precision format, includes 144 GB of HBM3E memory with up to 4 TB/s of bandwidth, supports multiple precision formats, and uses sparsity for more efficient execution. AMD also emphasizes software interoperability, with support for Kubernetes GPU Operator, AMD Inference Microservices, and PyTorch.
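As a back-of-the-envelope sketch of what the quoted per-card figures imply for a fully populated eight-card server (simple arithmetic on AMD's published numbers; the aggregate totals are illustrative calculations, not AMD claims):

```python
# Per-card figures as stated in AMD's announcement.
CARDS = 8                    # maximum cards per system, per AMD
TFLOPS_DENSE = 2299          # claimed peak TFLOPS per card
TFLOPS_MXFP4_PEAK = 4600     # claimed peak TFLOPS per card with MXFP4
HBM3E_GB = 144               # HBM3E capacity per card, in GB
BANDWIDTH_TBS = 4            # memory bandwidth per card, in TB/s

# Illustrative aggregates for a fully populated server.
total_memory_gb = CARDS * HBM3E_GB                      # 1152 GB
total_dense_pflops = CARDS * TFLOPS_DENSE / 1000        # ~18.4 PFLOPS
total_mxfp4_pflops = CARDS * TFLOPS_MXFP4_PEAK / 1000   # ~36.8 PFLOPS
mxfp4_speedup = TFLOPS_MXFP4_PEAK / TFLOPS_DENSE        # ~2x

print(f"Memory across {CARDS} cards: {total_memory_gb} GB")
print(f"Dense peak:   {total_dense_pflops:.1f} PFLOPS")
print(f"MXFP4 peak:   {total_mxfp4_pflops:.1f} PFLOPS ({mxfp4_speedup:.1f}x)")
```

Note that the 4,600 TFLOPS figure is almost exactly double the 2,299 TFLOPS baseline, consistent with the roughly 2x throughput typically attributed to halving precision and exploiting sparsity.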
Read at Techzine Global