PyTorch team unveils framework for programming clusters
Briefly

"The PyTorch team at Meta, stewards of the PyTorch open source machine learning framework, has unveiled Monarch, a distributed programming framework intended to bring the simplicity of PyTorch to entire clusters. Monarch pairs a Python-based front end, supporting integration with existing code and libraries such as PyTorch, and a Rust-based back end, which facilitates performance, scalability, and robustness, the team said."
"Monarch organizes processes, actors, and hosts into a scalable multidimensional array, or mesh, that can be manipulated directly. Users can operate on entire meshes, or slices of them, with simple APIs, with Monarch handling distribution and vectorization automatically. Developers can write code as if nothing fails, according to the PyTorch team. But when something does fail, Monarch fails fast by stopping the whole program. Later on, users can add fine-grained fault handling where needed, catching and recovering from failures."
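The mesh and fail-fast ideas described above can be illustrated with a toy model in plain Python. This is only a sketch of the concept and does not use Monarch's actual API: `ToyMesh`, `create`, `slice`, `call`, `on_error`, and `rank` are hypothetical names invented for this example, and everything runs in a single process.

```python
from itertools import product


class ToyMesh:
    """Hypothetical stand-in for a mesh: a set of points with named coordinates."""

    def __init__(self, points):
        self.points = list(points)

    @classmethod
    def create(cls, **dims):
        # Build every coordinate combination, e.g. hosts=2, gpus=4 gives 8 points.
        names = list(dims)
        sizes = (range(dims[name]) for name in names)
        return cls(dict(zip(names, coord)) for coord in product(*sizes))

    def slice(self, **fixed):
        # Keep only the points whose coordinates match the fixed values.
        return ToyMesh(
            p for p in self.points if all(p[k] == v for k, v in fixed.items())
        )

    def call(self, fn, on_error=None):
        # Apply fn to every point. Without a handler, the first failure
        # propagates and stops the whole call (fail fast).
        results = []
        for point in self.points:
            try:
                results.append(fn(point))
            except Exception as exc:
                if on_error is None:
                    raise
                results.append(on_error(point, exc))
        return results


def rank(point):
    # Simulate one failing worker so both failure modes can be shown.
    if point == {"hosts": 1, "gpus": 2}:
        raise RuntimeError("simulated worker failure")
    return point["hosts"] * 4 + point["gpus"]


mesh = ToyMesh.create(hosts=2, gpus=4)
host0 = mesh.slice(hosts=0)
print(host0.call(rank))  # the slice avoids the failing point
print(mesh.call(rank, on_error=lambda point, exc: None))  # opt-in recovery
```

Calling `mesh.call(rank)` with no handler raises the `RuntimeError` immediately, which mirrors the default behavior the team describes; passing `on_error` corresponds to adding fine-grained handling later.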
Monarch is a distributed programming framework that brings PyTorch-like simplicity to entire clusters. It combines a Python-based front end for integration with existing code and libraries and a Rust-based back end for performance, scalability, and robustness. Monarch uses scalable actor messaging to enable programming distributed systems as if on a single machine, hiding distributed complexity. Processes, actors, and hosts are organized into a multidimensional mesh that can be manipulated directly, and users can operate on whole meshes or slices with simple APIs while Monarch handles distribution and vectorization. Monarch defaults to fail-fast behavior and allows optional fine-grained fault handling. Monarch is currently experimental and installable via meta-pytorch.org.
Read at InfoWorld