#system-reliability

DevOps
from InfoWorld
1 hour ago

Cloud-based LLMs risk enterprise stability

Enterprises must return to architectural resilience principles when adopting cloud-hosted LLMs to mitigate risks from increasingly common outages that cause widespread business disruption.
from The Register
4 days ago

Smart mirror shows dumb Windows in elevator

The Windows Boot Manager has blamed a recent hardware or software change, which, frankly, could be pretty much anything. The code 0xc0000428 is a clue that something might be awry with the digital signature of a file (perhaps ntoskrnl.exe) and, to be honest, we'd suggest nuking the whole thing from orbit.
Gadgets
Software development
from The Register
1 week ago

Firefox finds a slew of new bugs with Claude's help

Approximately 10-15 percent of Firefox browser crashes result from bit flips caused by faulty hardware rather than software errors, affecting hundreds of thousands of users monthly.
Software development
from InfoWorld
2 weeks ago

The reliability cost of default timeouts

Unbounded waiting in distributed systems causes slowness to manifest as outages before traditional failure detection triggers, draining capacity and degrading user experience.
from Fortune
2 months ago

Air traffic still runs on floppy discs in places, so the FAA just picked 2 companies for a $26 billion radar overhaul

The federal government has picked two companies to replace 612 radar systems nationwide that date back to the 1980s as part of a multibillion-dollar overhaul of the nation's air traffic control system. Transportation Secretary Sean Duffy and the Federal Aviation Administration said Monday that contractors RTX and Spanish firm Indra will replace the radar systems by the summer of 2028.
US politics
San Francisco
from sfist.com
6 months ago

BART Board Does Some Grilling, Managers Give Some Explanation About Last Week's Systemwide Meltdown

A fiber-optic cable move during upgrades triggered another BART systemwide meltdown, prompting management accountability and urgent mitigation across dozens more stations.
Software development
from InfoQ
7 months ago

Grafana 12.1 Brings Built-in Diagnostics and Enhanced Alerting

Grafana 12.1 introduces features for system reliability, alert management, and dashboard interactivity, including Grafana Advisor and trendline transformations.
Privacy technologies
from InfoQ
11 months ago

How Meta Uses Precision Time Protocol to Handle Leap Seconds

Time-sensitive systems must handle leap seconds precisely to prevent errors.
Meta's algorithmic method improves PTP synchronization by introducing a Window of Uncertainty for leap seconds.