
"Cybersecurity researchers have uncovered a chain of critical remote code execution (RCE) vulnerabilities in major AI inference server frameworks, including those from Meta, Nvidia, Microsoft, and open-source projects such as vLLM and SGLang. According to Oligo Security, these vulnerabilities stand out for the way they propagated. Developers copied code containing insecure patterns across projects, effectively transplanting the same flaw into multiple ecosystems."
""If you've worked with Python, you know pickle isn't designed for security," Lumelsky said. " It can execute arbitrary code during deserialization, which is fine in a tightly controlled environment, but far from fine if exposed over the network." From Meta, the same insecure pattern appeared in other frameworks, including Nvidia's TensorRT-LLM, vLLM, SGLang, and even the Modular Max Server."
"In their investigation, Oligo's researchers found that the initial trigger was exposed in Meta's Llama Stack, where a function used ZeroMQ's "recv-pyobj()" to receive data and then pass it directly to Python's "pickle.loads()." This allowed arbitrary code execution over unauthenticated sockets."
A chain of critical remote code execution vulnerabilities affects major AI inference server frameworks from Meta (Llama Stack), Nvidia (TensorRT-LLM), and Microsoft, as well as the open-source projects vLLM and SGLang. The root cause is the unsafe combination of ZeroMQ's recv_pyobj() and Python's pickle.loads() over unauthenticated sockets, which enables arbitrary code execution. Because code files containing this insecure pattern were copied between projects, the flaw was transplanted across ecosystems, creating a systemic security gap in the inference ecosystem and exposing enterprise AI stacks to supply-chain and operational risks.
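One common mitigation, sketched here as an assumption rather than a fix the article prescribes, is to replace pickle-based transport with a data-only format such as JSON, which has no code execution path during parsing.

```python
import zmq

# Mitigation sketch (assumed, not from the article): exchange JSON
# instead of pickled objects, and bind locally rather than to 0.0.0.0.
ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5555")  # not exposed to the network

while True:
    msg = sock.recv_json()         # parses JSON; cannot execute code
    sock.send_json({"ok": True, "echo": msg})
```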
Read at InfoWorld