Meta Applies Mutation Testing with LLM to Improve Compliance Coverage
Briefly

"Meta has applied large language models to mutation testing to improve compliance coverage across its software systems. The approach integrates LLM-generated mutants and tests into Meta's Automated Compliance Hardening system (ACH), addressing scalability and accuracy limits of traditional mutation testing. The system is intended to keep products and services safe while meeting compliance obligations at scale, helping teams satisfy global regulatory requirements more efficiently."
"Mutation testing evaluates the effectiveness of test suites by introducing small, deliberate changes (mutants) into code and checking whether tests detect them. Traditional mutation testing has seen limited adoption due to excessive mutant counts, high computational costs, and the presence of equivalent mutants that add little value. Meta's approach uses LLMs to generate context-aware mutants and corresponding tests, reducing noise and focusing engineering effort on high-value code paths."
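To make the idea of a mutant concrete, here is a minimal Python sketch. It is not Meta's code: the function `can_view_profile`, its age threshold, and the two test suites are hypothetical names invented for illustration. The mutant changes `>=` to `>`, and a suite "kills" the mutant only if it fails when run against it.

```python
def can_view_profile(viewer_age: int, min_age: int = 18) -> bool:
    """Original (hypothetical) rule: viewers at or above min_age are allowed."""
    return viewer_age >= min_age


def can_view_profile_mutant(viewer_age: int, min_age: int = 18) -> bool:
    """Mutant: '>=' deliberately changed to '>' (a boundary fault)."""
    return viewer_age > min_age


def weak_suite(fn) -> bool:
    """Returns True if fn passes checks that avoid the boundary value."""
    return fn(30) is True and fn(10) is False


def strong_suite(fn) -> bool:
    """Same checks as weak_suite, plus the boundary case viewer_age == 18."""
    return weak_suite(fn) and fn(18) is True


def is_killed(suite, mutant) -> bool:
    """A mutant is killed when the suite fails on it."""
    return not suite(mutant)


print(is_killed(weak_suite, can_view_profile_mutant))    # False: mutant survives
print(is_killed(strong_suite, can_view_profile_mutant))  # True: mutant is killed
```

The surviving mutant in the first case points at a missing boundary test, which is the kind of gap mutation testing is meant to expose.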
"Meta's ACH system uses LLMs to generate realistic mutants and targeted tests, focusing on privacy, safety, and regulatory concerns. An LLM-based equivalence detector filters redundant mutants, while a test generator produces unit tests that engineers can review rather than write manually, significantly reducing operational overhead. Early deployment across Facebook, Instagram, WhatsApp, and Meta's wearables platforms produced tens of thousands of mutants and hundreds of actionable tests."
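The generate, filter, and test flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not ACH itself: the candidate mutants are hand-written where the described system would have an LLM propose them, and `looks_equivalent` is a simple sampling check standing in for the LLM-based equivalence detector (agreement on samples is evidence of equivalence, not proof). All names are hypothetical.

```python
from typing import Callable, Iterable, List

Fn = Callable[[int], bool]


def original(age: int) -> bool:
    return age >= 18


# Hypothetical candidate mutants; hand-written here for illustration.
def mutant_boundary(age: int) -> bool:
    return age > 18          # behaves differently only at age == 18


def mutant_equivalent(age: int) -> bool:
    return not (age < 18)    # syntactically different, same behaviour


def mutant_constant(age: int) -> bool:
    return True              # always allows access


def looks_equivalent(a: Fn, b: Fn, samples: Iterable[int]) -> bool:
    """Stand-in equivalence check: a and b agree on every sampled input."""
    return all(a(x) == b(x) for x in samples)


def existing_tests(fn: Fn) -> bool:
    """Hypothetical current test suite; returns True when all checks pass."""
    return fn(30) is True and fn(10) is False


def surviving_mutants(mutants: List[Fn]) -> List[str]:
    """Drop equivalent mutants, then report those the current tests miss."""
    samples = range(0, 100)
    survivors = []
    for m in mutants:
        if looks_equivalent(original, m, samples):
            continue  # redundant: no test could tell it apart from the original
        if existing_tests(m):
            survivors.append(m.__name__)  # tests pass on a faulty mutant
    return survivors


print(surviving_mutants([mutant_boundary, mutant_equivalent, mutant_constant]))
# ['mutant_boundary']
```

Each surviving mutant is a candidate for a new test that an engineer would review; in this sketch a check on `age == 18` would kill `mutant_boundary`.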
Meta applied large language models (LLMs) to mutation testing to improve compliance coverage across software systems. The approach integrates LLM-generated mutants and tests into the Automated Compliance Hardening (ACH) system to address scalability and accuracy limitations of traditional mutation testing. LLMs generate context-aware mutants and corresponding unit tests, while an LLM-based equivalence detector filters redundant mutants. The test generator produces reviewable unit tests, reducing developer effort and operational overhead. Early deployment across Facebook, Instagram, WhatsApp, and wearables produced tens of thousands of mutants and hundreds of actionable tests. Findings were presented at FSE 2025 and EuroSTAR 2025.
Read at InfoQ