
Flaky tests used to be caused by race conditions, timeouts, and reruns, but flakiness has become a structural issue as enterprises move from deterministic applications to agentic AI. Traditional CI/CD relies on binary assertions like X equals Y, while AI agents produce Y-like answers that can vary across runs while remaining defensible. Re-running the same agent can yield different results, so scenario-based test suites may mark valid outcomes as failures. DevOps teams must build agents and recreate the entire pipeline. Agentic AI receives a target state, senses surroundings via telemetry and APIs, reasons about actions, executes them, observes outcomes, and repeats until the target state is achieved or humans intervene. In self-healing CI/CD, agentic tools use historical pipeline data such as build time, flakiness rates, and resource usage patterns to predict impending failures before builds break, using correlations rather than deterministic rules.
"Old CI/CD systems rely on binary assertions: Assert X == Y. But with AI agents, the output isn't Y; it's Y-like answers. Run the same agent again, and it will likely produce two defensible but varying results. So, the test suite built on a scenario that no longer exists, calls this a failure."
"Agentic AI is an automated system capable of receiving a target state, sensing its surroundings using telemetry and APIs, reasoning about the actions it should perform to meet the target state, executing those actions, observing the outcome, and repeating the process until either the target state is achieved or human intervention is required."
"Using historical data from the pipeline, like build time, flakiness percentages, and patterns of resource usage, agentic tools highlight potential risks even before a commit triggers a build. When a microservice has been found to exhibit increasing latency at the p99 level through three successive deploys, but testing coverage for that service has diminished, the agent identifies that as a likely path to failure."
"It's not a deterministic process; rather, an inference based on correlations observed within the stack. This enables teams to take a proactive approach to potential issues. This is an entirely different form of engineering effort, one that accumulat"
Read at DevOps.com
Unable to calculate read time
Collection
[
|
...
]