#uptime

#cloud-computing
DevOps
from InfoWorld
1 day ago

When cloud giants neglect resilience

Cloud outages highlight reliability issues as providers prioritize cost-cutting over service stability, raising questions about acceptable levels of unreliability.
#microsoft-azure
Tech industry
from Theregister
1 day ago

Users complain of UK Azure capacity problems

Microsoft Azure is experiencing severe capacity issues in the UK, limiting new deployments and pushing users to alternative regions like Sweden.
Tech industry
from ComputerWeekly.com
1 week ago

Azure customers up in arms over 'full' UK South region | Computer Weekly

Microsoft Azure is facing capacity issues in the UK South region, affecting virtual machine availability and customer migrations.
#scale-computing
Software development
from Techzine Global
2 days ago

Scale sets edge platform's software ever more free from hardware constraints

Scale Computing is reducing hardware requirements for its software, allowing more flexibility for partners and customers in choosing hardware platforms.
Scala
from Techzine Global
3 days ago

New Scale Computing gets new Velocity Partner Program

Scale Computing revamps its partner program to address market changes and strengthen relationships with partners amid industry challenges.
#aws
DevOps
from InfoQ
1 day ago

AWS Announces General Availability of DevOps Agent for Automated Incident Investigation

AWS has launched DevOps Agent, an AI-powered assistant for troubleshooting and automating tasks in AWS environments.
from Amazon Web Services
5 days ago
DevOps

Troubleshooting environment with AI analysis in AWS Elastic Beanstalk | Amazon Web Services

AWS Elastic Beanstalk simplifies web application deployment and scaling, now enhanced with AI Analysis for troubleshooting environment health issues.
from InfoQ
1 week ago

Latency: The Race to Zero...Are We There Yet?

In the fintech industry we can link latency directly to profit and money. If I have lower latency than the competition, I can get to the better deals, I can make the better deals.
Venture
Software development
from Medium
6 days ago

Async Logging Is Not a Silver Bullet - What Actually Limits Performance

Async logging redistributes costs rather than reducing them, impacting performance in different ways depending on implementation.
#devops
DevOps
from DevOps.com
3 days ago

FinOps Isn't Slowing You Down - It's Fixing Your Pipeline - DevOps.com

Cost visibility should be integrated into DevOps workflows to manage cloud efficiency effectively.
from DevOps.com
2 months ago
Software development

Survey Surfaces Disconnect Between DevOps Metrics and Business KPIs - DevOps.com

DevOps teams monitor applications extensively but rarely translate performance improvements into business metrics or formal financial impact measurements.
DevOps
from Business Matters
6 days ago

The Role of Dedicated Servers in Scaling Modern Businesses

Infrastructure investment is crucial for SMEs to ensure reliability, performance, and user experience in a competitive digital landscape.
DevOps
from Techzine Global
5 days ago

Cloudflare introduces new features for building and deploying agents

Cloudflare is transforming AI development with Dynamic Workers, Sandboxes, and Artifacts for secure, scalable, and efficient code execution.
#observability
DevOps
from DevOps.com
1 week ago

Survey Surfaces Rising Tide of Investments in Observability - DevOps.com

A significant number of enterprise IT leaders plan to invest heavily in observability to enhance application performance and reliability.
DevOps
from Techzine Global
2 weeks ago

Observability warehouses, the next structural evolution for telemetry

Observability is essential for real-time insights in cloud systems, helping to reduce downtime and improve performance.
Software development
from InfoQ
2 months ago

From Alert Fatigue to Agent-Assisted Intelligent Observability

AI-driven, agentic observability reduces operational toil by integrating with existing monitoring, starting read-only, building trust, and automating low-risk repetitive tasks under clear guardrails.
Roam Research
from DevOps.com
1 month ago

The Observability Bill is Coming Due - and AI Wrote Most of It - DevOps.com

Observability data has become unmanageable and expensive, requiring intelligent filtering and management solutions rather than unlimited storage expansion.
DevOps
from New Relic
2 weeks ago

What is observability? How observability can help you achieve your business goals.

Conventional monitoring fails to address unknown unknowns, while observability provides insights into complex systems and enhances incident response.
DevOps
from Medium
6 days ago

Set it up once, test it properly, and let the system handle the rest.

Automating SSL certificate renewal prevents production outages and reduces stress during incidents.
DevOps
from Azure DevOps Blog
4 days ago

April Patches for Azure DevOps Server - Azure DevOps Blog

Customers should update to the latest version of Azure DevOps Server for security and reliability.
DevOps
from InfoQ
5 days ago

Beyond One-Click: Designing an Enterprise-Grade Observability Extension for Docker

Docker Extensions enhance developer productivity but may not meet enterprise needs for security, compliance, and integration.
Web development
from New Relic
1 month ago

A Blueprint for Full-Stack Service Level Management

Effective system monitoring requires measuring user perception across three layers: experience perception, edge infrastructure control, and service business logic, each with distinct SLIs and SLOs.
Web frameworks
from Medium
1 month ago

Why Most Spring Boot Apps Fail in Production (7 Critical Mistakes)

Spring Boot production failures stem from seven critical mistakes including improper dependency injection, configuration errors, and resource management issues that developers can systematically avoid.
#network-monitoring
DevOps
from New Relic
1 week ago

6 Network Monitoring Best Practices For Clarity in Distributed Systems

Effective network monitoring prioritizes understanding impact and taking action quickly over merely collecting metrics.
DevOps
from New Relic
1 week ago

How to Choose Network Monitoring Tools You Can Act On

Network monitoring requires context to effectively connect network behavior to applications and services for timely decision-making during incidents.
Artificial intelligence
from Theregister
1 month ago

Your datacenter's power architecture called. It's not happy

Accelerated computing demands exceed legacy datacenter power architectures, forcing migration from 48V to high-voltage DC systems to handle extreme power densities and current requirements.
#cloud-monitoring
from New Relic
1 week ago
DevOps

Cloud Monitoring Best Practices For Reliable, Unified Observability

Effective cloud monitoring focuses on unifying telemetry and providing context for engineers to make informed decisions.
DevOps
from New Relic
3 weeks ago

Cloud Monitoring Tools: 5 Best Platforms to Evaluate in 2026

Effective cloud monitoring focuses on real-time telemetry correlation to understand failures, not just data collection.
Information security
from Computerworld
1 month ago

Storage vendor offers a real guarantee - but check out those fine-print exceptions

Tech vendors frequently offer performance guarantees with substantial financial penalties, but hidden exceptions in EULAs often make claims difficult or impossible to collect.
DevOps
from New Relic
2 weeks ago

Exploring application performance monitoring (APM)

Application performance monitoring (APM) is essential for businesses to ensure optimal user experiences and maintain application performance in a complex digital landscape.
DevOps
from InfoQ
2 weeks ago

Replacing Database Sequences at Scale Without Breaking 100+ Services

Validating requirements can simplify complex problems, and embedding sequence generation reduces network calls, enhancing performance and reliability.
DevOps
from Medium
2 weeks ago

Fair Multitenancy - Beyond Simple Rate Limiting

Fair multitenancy ensures equitable infrastructure access for customers, balancing simplicity, performance, and safety in shared environments.
#kubernetes
from Medium
2 weeks ago
DevOps

Understanding Kubernetes Architecture is a MUST

Understanding Kubernetes architecture is essential for effective cloud-native deployment and troubleshooting.
DevOps
from InfoQ
2 weeks ago

Kubernetes Autoscaling Demands New Observability Focus Beyond Vendor Tooling

Kubernetes autoscalers like Karpenter require new observability practices focusing on provisioning behavior, scheduling latency, and cost efficiency.
Miscellaneous
from DevOps.com
1 month ago

I Learned Traffic Optimization Before I Learned Cloud Computing. It Turns Out the Lessons Were the Same. - DevOps.com

Cloud infrastructure requires understanding system behavior and costs to operate effectively at speed, similar to how skilled drivers anticipate conditions rather than simply driving fast.
#distributed-systems
from InfoQ
1 month ago
Software development

How a Small Enablement Team Supported Adopting a Single Environment for Distributed Testing

Information security
from The Hacker News
2 months ago

DevOps & SaaS Downtime: The High (and Hidden) Costs for Cloud-First Businesses

Relying solely on public cloud and DevOps SaaS platforms increases operational risk as outages, attacks, and Shared Responsibility gaps drive rising downtime and service degradation.
Miscellaneous
from Theregister
2 months ago

UK users say Oracle Cloud Infrastructure wobbled last week

Oracle Cloud Infrastructure experienced a London-region outage; users reported Fusion application disruptions while Oracle provided no public comment.
#azure-outage
from DevOps.com
1 month ago

What to do About AI's Forced Rethink of Reliability in Modern DevOps - DevOps.com

For years, reliability discussions have focused on uptime and whether a service met its internal SLO. However, as systems become more distributed, reliant on complex internet stacks, and integrated with AI, this binary perspective is no longer sufficient. Reliability now encompasses digital experience, speed, and business impact. For the second year in a row, The SRE Report highlights this shift.
Software development
from InfoWorld
2 months ago

The private cloud returns, for AI workloads

A North American manufacturer spent most of 2024 and early 2025 doing what many innovative enterprises did: aggressively standardizing on the public cloud by using data lakes, analytics, CI/CD, and even a good chunk of ERP integration. The board liked the narrative because it sounded like simplification, and simplification sounded like savings. Then generative AI arrived, not as a lab toy but as a mandate. "Put copilots everywhere," leadership said. "Start with maintenance, then procurement, then the call center, then engineering change orders."
Artificial intelligence
from New Relic
3 months ago

Traditional Network Monitoring is Failing

For any IT department, these four words are the beginning of a familiar, often frustrating, journey. In our modern world, where business success is built on distributed applications and hybrid cloud architectures, the network is the circulatory system. When it fails, everything grinds to a halt. Yet, despite its critical importance, it often remains a black box, a source of blame that is difficult to prove or disprove.
Information security
DevOps
from InfoWorld
3 weeks ago

Rethinking VM data protection in cloud-native environments

KubeVirt enables Kubernetes to manage both VMs and containers, requiring new strategies for VM lifecycle management and data protection.
Tech industry
from Theregister
2 months ago

IT team fixed faults faster than outsourcer could find them

An 8-CPU Sun server with removable CPU cards suffered frequent CPU-card failures and slow contracted support, forcing local IT to swap cards to restore service.
DevOps
from New Relic
3 weeks ago

How to Use APM Metrics to Optimize Application Performance

Infrastructure metrics are crucial indicators of application performance and user experience.
from Theregister
1 month ago

Server crashes traced to one very literal knee-jerk reaction

It was the time of Novell networks, RG58 cables, and bulky tower PCs. It was also a time before the telemarketer's IT department employed specialists. Carter and his two colleagues - boss Mike and part-time student Stefan - therefore handled tasks ranging from programming to support, and everything in between.
Software development
Artificial intelligence
from InfoWorld
2 months ago

Five MCP servers to rule the cloud

Major cloud providers now offer official MCP servers that let AI agents automate cloud operations using existing cloud credentials and natural language commands.
Information security
from The Hacker News
2 months ago

When Cloud Outages Ripple Across the Internet

Cloud infrastructure outages can disable identity authentication and authorization, creating hidden single points of failure that cause broad operational and security impacts.
DevOps
from New Relic
3 weeks ago

Comparing The Best AIOps Tools for Faster, More Reliable IT Ops

IBM watsonx Orchestrate enhances incident detection and automation for enterprises in hybrid and multi-cloud environments using AI and machine learning.
DevOps
from InfoWorld
3 weeks ago

Designing self-healing microservices with recovery-aware redrive frameworks

A recovery-aware redrive framework prevents retry storms while ensuring all failed requests are eventually processed in complex service systems.
Software development
from Dbmaestro
4 years ago

If You Don't Have Database Delivery Automation, Brace Yourself for These 10 Problems

Manual database processes break DevOps pipelines; only 12% deploy database changes daily, causing configuration drift, frequent errors, slower time-to-market, and reduced productivity.
from TechRepublic
2 months ago

What Are the Pros and Cons of Data Centers?

When ChatGPT launched in late 2022, I watched something remarkable happen. Within two months, it hit 100 million users, a growth rate that sent shockwaves through Silicon Valley. Today, it has over 800 million weekly active users. That launch sparked an explosion in AI development that has fundamentally changed how we build and operate the infrastructure powering our digital world.
Artificial intelligence
Information security
from Theregister
2 months ago

Techie's one ring brought darkness by shorting a server

A technician wearing a wedding ring shorted a server board, causing an outage, briefly concealed the failure, and service resumed after an unexpected reboot.
DevOps
from InfoQ
4 weeks ago

Configuration as a Control Plane: Designing for Safety and Reliability at Scale

Configuration in cloud-native systems is a dynamic control plane that directly influences system behavior and reliability at runtime.
from Dbmaestro
4 years ago

What is Database Delivery Automation and Why Do You Need It?

Manual database deployment means longer release times. Database specialists have to spend several working days prior to release writing and testing scripts which in itself leads to prolonged deployment cycles and less time for testing. As a result, applications are not released on time and customers are not receiving the latest updates and bug fixes. Manual work inevitably results in errors, which cause problems and bottlenecks.
Software development
Information security
from DevOps.com
2 months ago

Secure DevOps at Scale: Integrating SRE, DevSecOps and Compliance - DevOps.com

Integrate security into DevOps and SRE to automate compliance and resilience within cloud-native SaaS pipelines from the start.
DevOps
from London Business News | Londonlovesbusiness.com
1 month ago

Signs it's time to move to dedicated server hosting - London Business News | Londonlovesbusiness.com

Dedicated server hosting becomes necessary when traffic surges cause performance degradation, complex database operations require absolute resource isolation, and security demands exceed virtual environment capabilities.
Software development
from Theregister
2 months ago

GitHub appears to be struggling with one nine availability

GitHub experienced repeated outages and severe instability, including notification delays and Copilot failures, with uptime falling below 90% at one point in 2025.
Tech industry
from United States Edition
1 month ago

Spotlight report: Accelerating Data Center Modernization

Data center modernization is critical for AI deployment, requiring integrated infrastructure solutions across servers, storage, networking, and security.
Information security
from Business Matters
1 month ago

Detecting Configuration Drift: Continuous Controls vs. Point-in-Time Snapshots

Continuous controls monitoring (CCM) is required to detect and remediate configuration drift in rapidly changing cloud environments before risks persist unnoticed.
DevOps
from ComputerWeekly.com
1 month ago

Everpure's Evergreen One for AI brings Exa flash and GPU-based service-level agreements | Computer Weekly

Everpure launches Evergreen One for AI, a consumption model with GPU-count-based SLAs for FlashBlade//Exa storage to optimize AI workload performance.
from DevOps.com
1 month ago

Zero Downtime Multicloud Migrations for Observability Control Planes - DevOps.com

An observability control plane isn't just a dashboard. It's the operational authority system. It defines alert rules, routing, ownership, escalation policy, and notification endpoints. When that layer is wrong, the impact is immediate. The wrong team gets paged. The right team never hears about the incident. Your service level indicators look clean while production burns.
DevOps
DevOps
from New Relic
1 month ago

Guide to Alerts, Incident Management, and Observability

Alert fatigue from excessive telemetry requires a structured Alert Lifecycle Reference Architecture with three domains—Knowledge, Action, and Record—to align process architecture with technology architecture.
Software development
from InfoQ
1 month ago

Kubernetes Introduces Node Readiness Controller to Improve Pod Scheduling Reliability

Kubernetes introduces the Node Readiness Controller to improve scheduling accuracy by synchronizing the API server's node readiness view with actual kubelet health signals, reducing pod scheduling onto unavailable nodes.
DevOps
from InfoQ
1 month ago

From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture

Uber redesigned MySQL infrastructure using Group Replication to reduce failover time from minutes to seconds while maintaining strong consistency across thousands of clusters.
DevOps
from New Relic
1 month ago

eBPF Network Metrics for Kernel-Level Observability | New Relic

New Relic's eBPF-based agent unifies network performance, APM telemetry, infrastructure metrics, and logging into a single lightweight solution, eliminating network blind spots and reducing mean time to innocence during incidents.
DevOps
from InfoQ
1 month ago

Change as Metrics: Measuring System Reliability Through Change Delivery Signals

System changes cause 60-80% of production incidents, making change-related metrics essential first-class reliability signals aligned with DORA framework principles.
DevOps
from DevOps.com
1 month ago

On-Call Rotation Best Practices: Reducing Burnout and Improving Response - DevOps.com

On-call duty is critical for system protection but often mismanaged, causing engineer burnout and attrition when rotations are poorly designed, alerts are excessive, and automation is lacking.
from DevOps.com
1 month ago

Harness Readies Resilience Testing Platform to Make Applications More Robust - DevOps.com

The Harness Resilience Testing platform extends the scope of the tests provided to include application load and disaster recovery (DR) testing tools that will enable DevOps teams to further streamline workflows.
DevOps
DevOps
from New Relic
1 month ago

Workflow Automation: Turn Observability Into Action

Workflow Automation reduces mean time to recovery from hours to minutes by automatically detecting deployment anomalies and executing rollbacks with minimal human intervention.
from Dbmaestro
5 years ago

Database Delivery Automation in the Multi-Cloud World

The main advantage of going the Multi-Cloud way is that organizations can "put their eggs in different baskets" and be more versatile in their approach to how they do things. For example, they can mix it up and opt for a cloud-based Platform-as-a-Service (PaaS) solution when it comes to the database, while going the Software-as-a-Service (SaaS) route for their application endeavors.
DevOps
from InfoWorld
2 months ago

The 'Super Bowl' standard: Architecting distributed systems for massive concurrency

When I manage infrastructure for major events (whether it is the Olympics, a Premier League match or a season finale) I am dealing with a "thundering herd" problem that few systems ever face. Millions of users log in, browse and hit "play" within the same three-minute window. But this challenge isn't unique to media. It is the same nightmare that keeps e-commerce CTOs awake before Black Friday or financial systems architects up during a market crash. The fundamental problem is always the same: How do you survive when demand exceeds capacity by an order of magnitude?
DevOps
from New Relic
2 months ago

5 Best Application Performance Monitoring Tools to Consider in 2026

Support for distributed systems. Check how well the tool handles microservices, serverless, and Kubernetes. Can you follow a request across services, queues, and third-party APIs? Does it understand pods, nodes, clusters, and autoscaling events, or does it treat everything like a static host? Correlation across metrics, logs, and traces. In an incident, you shouldn't be copying IDs between tools. Look for the ability to pivot directly from a slow trace to relevant logs,