
"The researchers found that LLM-generated context files degrade performance, actually reducing the task success rate by an average of 3% compared to providing no context file at all. They also consistently increased the number of steps the agent took, driving up inference costs by over 20%."
"The team tested four agents (Claude 3.5 Sonnet, Codex GPT-5.2 and GPT-5.1 mini, and Qwen Code) across three distinct scenarios: using no context file, an LLM-generated file, and a human-written file. The researchers assessed the real-world impact of repository-level instructions by tracking three proxy indicators: task success rates, the number of agent steps, and overall inference costs."
"The researchers recommend omitting LLM-generated context files entirely and limiting human-written instructions to non-inferable details, such as highly specific tooling or custom build commands."
ETH Zurich researchers conducted the first rigorous empirical study on whether AGENTS.md context files improve AI agent performance on coding tasks. Using AGENTbench, a dataset of 138 real-world Python tasks from niche repositories, they tested four AI agents across three scenarios: no context file, LLM-generated files, and human-written files. Results showed that LLM-generated context files degraded performance, reducing task success rates by an average of 3% and increasing inference costs by over 20% compared to providing no context file at all. The researchers recommend omitting LLM-generated context files entirely and limiting human-written instructions to non-inferable details, such as highly specific tooling or custom build commands.
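To make the recommendation concrete, here is a minimal sketch of what a human-written AGENTS.md restricted to non-inferable details might look like. The file name follows the AGENTS.md convention discussed in the article; the specific commands and script paths below are hypothetical examples, not taken from the study:

```markdown
# AGENTS.md

## Build (not inferable from the repo layout)
- Run `make bootstrap` once before the first `make build`; a plain `make` fails on a clean checkout.

## Tests
- Run the suite via `./scripts/test.sh`, not `pytest` directly: the script provisions required fixtures.

## Tooling
- Formatting is enforced by a pinned linter version; use `./scripts/lint.sh` rather than a globally installed binary.
```

The point, per the study's findings, is that everything an agent could infer by reading the repository (project structure, language, dependencies) is omitted; only commands and constraints the agent could not discover on its own are spelled out.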
Read at InfoQ