Argonne flexes spare supercompute to build private AI inference service

Argonne National Laboratory has built an AI inference service using its spare supercomputing capacity. The service aims to support researchers across the US, including those at DoE labs and teams working on the Genesis Mission, in advancing discovery across multiple fields. It currently runs on two clusters: Sophia, with 192 Nvidia A100 GPUs, and Metis, with 32 SambaNova SN40L AI accelerators. A planned expansion will add the Nvidia GH200-based Tara and Nvidia B200-based Minerva systems. Researchers access large language models through a chatbot-like portal; the models include GPT-OSS, Gemma, and Llama, along with domain-specific and custom models such as AuroraGPT. The service supports secure analysis of large datasets and experimentation with integrating generative AI into research workflows.

"Argonne is home to some of the world's largest supercomputing clusters, including the No. 3-ranked Aurora supercomputer. But its compute capacity also includes several smaller, AI-optimized systems. As of writing, the lab's inference service is running atop two clusters: The first is the Sophia system, comprising 192 Nvidia A100 GPUs, most with 40 GB of memory. The second, dubbed Metis, is arguably the more interesting. That system features 32 of SambaNova's SN40L AI accelerators."

"The inference service provides researchers with access to a range of large language models (LLMs) through a chatbot-like portal. Models include OpenAI's GPT-OSS, Google's Gemma family, Meta's Llama herd, and a variety of domain-specific and custom models, like AuroraGPT. And at least for some of its services, Argonne appears to be using Open WebUI, a popular self-hosted chatbot service we've explored on numerous occasions."

"'By making AI inference available as a shared resource, we are enabling researchers to apply AI at scale to their data, their simulations and their experiments without having to build and maintain their own infrastructure,' ALCF director Michael Papka said in a statement. Critically, the service enables DoE researchers to experiment with chatbots in a secure manner that does"

#ai-inference #large-language-models #supercomputing #scientific-research #secure-ai-workflows

Read at The Register
