
"Folks, as you likely know, the availability of the site and related infrastructure has not been good recently. The briefing mentions a trend of incidents characterized by a high blast radius and Gen-AI assisted changes. Amazon cites novel GenAI usage for which best practices and safeguards are not yet fully established as a contributing factor."
"Junior and mid-level engineers must now seek approval from seniors before implementing AI-assisted changes. Treadwell also announced short-term initiatives to limit future outages. Amazon described the availability investigation as part of normal business and says it is continuously striving for improvement."
"Earlier this month, Amazon's website and app were down for nearly 6 hours after an error in software code. Customers were unable to complete transactions and had no access to account information or product prices."
Amazon has experienced several significant outages involving AI-assisted code changes, prompting leadership to impose stricter controls. A 6-hour outage earlier this month left customers unable to complete transactions or access account information. AWS also suffered a 13-hour outage in December involving the Kiro AI tool. Senior Vice President Dave Treadwell now requires junior and mid-level engineers to obtain senior approval before implementing AI-assisted changes. Amazon cites novel GenAI usage, for which best practices and safeguards are not yet fully established, as a contributing factor in the incidents. The company has also announced short-term initiatives to prevent future outages. Staff shortages following recent layoffs have compounded these operational challenges, leaving engineers to handle heavier incident loads.
#ai-coding-tools #outages-and-reliability #code-review-governance #genai-safeguards #operational-resilience
Read at Techzine Global