DevOps

fromTheregister
1 day ago

Atlassian's DR simulation showed it lived in dependency hell

Australian collaborationware company Atlassian has revealed it's spent four years trying to reduce dangerous internal dependencies, and while it has rebuilt its PaaS, it still has issues - but thinks they're now manageable. As explained in a Tuesday post by Senior Engineering Manager Andrew Ross, "Atlassian runs a large service-based platform with thousands of different services, most deployed by our custom orchestration system, 'Micros'."
DevOps
#observability
fromInfoQ
1 day ago
DevOps

groundcover Takes Aim at Datadog with Observability Migration Tool

#cloud-native
DevOps
fromDbmaestro
4 years ago

What is Database Delivery Automation and Why Do You Need It?

Automated database delivery reduces release time, minimizes errors, improves developer-DBA synchronization, and accelerates time-to-market while boosting developer productivity.
DevOps
fromDbmaestro
4 years ago

If You Don't Have Database Delivery Automation, Brace Yourself for These 10 Problems

Manual database processes cause configuration drift, break DevOps pipelines, and prevent most organizations from deploying database changes daily.
DevOps
fromDbmaestro
4 years ago

Database DevOps - Where Do I Start?

Databases must be integrated into DevOps pipelines to prevent bugs, production issues, delays, and compliance gaps while enabling automated, auditable deployments.
DevOps
fromZDNET
2 days ago

7 open-source apps I'd honestly pay for because they're that good

Some open-source applications provide such powerful, cross-platform functionality that users would willingly pay for them.
fromAmazon Web Services
4 days ago

Introducing AWS CloudFormation Stack Refactoring Console Experience: Reorganize Your Infrastructure Without Disruption | Amazon Web Services

AWS CloudFormation models and provisions cloud infrastructure as code, letting you manage entire lifecycle operations through declarative templates. Stack Refactoring console experience, announced today, extends the AWS CLI experience launched earlier. Now, you move resources between stacks, rename logical IDs, and decompose monolithic templates into focused components without touching the underlying infrastructure using the CloudFormation console. Your resources maintain stability and operational state throughout the reorganization.
DevOps
#aws-cloudformation
fromAmazon Web Services
1 month ago
DevOps

StackSets Deployment Strategies: Balancing Speed, Safety, and Scale to Optimize Deployments for Different Organizational Needs | Amazon Web Services

fromNew Relic
5 days ago

Unleashing the Power of Monitoring: Master Your WordPress with New Relic

WordPress powers countless websites across various domains, offering incredible versatility. This Content Management System (CMS) is the undisputed leader in the CMS market, powering an impressive 43.6% of all websites globally, according to these statistics. With over 810 million websites built on the platform and hundreds more launching daily (500+), its adoption continues to surge. This widespread use gives WordPress a massive 62% CMS market share, significantly outpacing its rivals.
DevOps
fromAmazon Web Services
5 days ago

Serverless strategies for streaming LLM responses | Amazon Web Services

Modern generative AI applications often need to stream large language model (LLM) outputs to users in real-time. Instead of waiting for a complete response, streaming delivers partial results as they become available, which significantly improves the user experience for chat interfaces and long-running AI tasks. This post compares three serverless approaches to handle Amazon Bedrock LLM streaming on Amazon Web Services (AWS), which helps you choose the best fit for your application.
DevOps
DevOps
fromTheregister
5 days ago

OCP learning how to get quantum computers into existing DCs

Datacenter operators must adapt infrastructure for co-locating quantum and classical HPC, addressing cryogenic cooling, environmental controls, magnetic shielding, weight, humidity, and scheduling.
DevOps
fromInfoQ
6 days ago

The Architecture of Developer Experience: Where Product, Platform, and Operations Meet

Platform architecture should minimize cognitive load, enable automation, and foster collaboration across product, platform, and operations to accelerate delivery in distributed systems.
fromTheregister
6 days ago

Systemd 259 release candidate flexes musl support

Note that systemd compiled with musl has various limitations: since NSS or equivalent functionality is not available, nss-systemd, nss-resolve, DynamicUser=, systemd-homed, systemd-userdbd, the foreign UID ID, unprivileged systemd-nspawn, systemd-nsresourced, and so on will not work. [...] Caveat emptor. What this means is that it's now possible to compile and run systemd on Linux distributions that are not based on the GNU version of the C standard library, glibc.
DevOps
#aws-outage
DevOps
fromArs Technica
6 days ago

Massive Cloudflare outage was triggered by file that suddenly doubled in size

A bad feature file exceeding a 200-feature runtime limit caused Cloudflare's bot management to fail, generating widespread 5xx errors and network instability.
fromAzure DevOps Blog
1 week ago

Azure DevOps and GitHub Repositories - Next Steps in the Path to Agentic AI - Azure DevOps Blog

With all these rapidly advancing new capabilities, now is the ideal moment to move your repositories to GitHub so your teams can fully harness Copilot's agentic power while still benefiting from your existing investments in Azure Boards and Pipelines. The two platforms continue to work better together, and choosing GitHub as the home for your code unlocks the richest end-to-end agentic experience.
DevOps
DevOps
fromSmashing Magazine
1 week ago

From Chaos To Clarity: Simplifying Server Management With AI And Automation - Smashing Magazine

AI-ready infrastructure and automation transform reactive server management into proactive operations that preserve performance, reduce firefighting, and improve user retention.
DevOps
fromInfoWorld
1 week ago

Aspire 13 bolsters Python, JavaScript support

The aspire do command breaks build, publish, and deploy workflows into discrete, parallelizable steps with dependency tracking, and an MCP server enables AI assistants to query running apps.
fromInfoQ
1 week ago

AWS Introduces Remote Build Cache in ECR to Accelerate Docker Image Builds

Amazon Web Services has announced enhancements to its CodeBuild service, allowing teams to use Amazon ECR as a remote Docker layer cache, significantly reducing image build times in CI/CD pipelines. By leveraging ECR repositories to persist and reuse build layers across runs, organisations can skip rebuilding unchanged parts of containers and accelerate delivery. The blog outlines how Docker BuildKit support enables commands such as --cache-from and --cache-to pointing to ECR-based cache images,
DevOps
#aws
DevOps
fromInfoQ
1 week ago

Cloud Security Challenges in the AI Era - How Running Containers and Inference Weaken Your System

Containers package code and its dependencies but do not provide strong isolation, requiring additional runtime and platform security measures for Kubernetes environments.
#kubernetes
fromTechzine Global
2 weeks ago
DevOps

Google introduces Agent Sandbox for Kubernetes

Google launches Agent Sandbox, an open-source Kubernetes primitive providing kernel-level isolation for scalable AI agents, with a Python SDK and GKE optimizations including Pod Snapshots.
fromMedium
1 month ago
DevOps

My Kubestronaut journey

Completed all CNCF Kubernetes certifications between Oct 2024 and Jan 2025, achieving high scores and earning Kubestronaut recognition.
DevOps
fromMedium
2 weeks ago

How to kill a process that is listening on a port in linux

Port conflicts happen when another process has bound the IP:port, preventing a service from listening; identify the process and terminate it.
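The conflict described above can be reproduced in a few lines. This is a minimal sketch, not taken from the linked article: the helper name `port_in_use` is hypothetical, and it only detects that an address is already bound; it does not identify or terminate the owning process.

```python
import errno
import socket


def port_in_use(host: str, port: int) -> bool:
    """Return True if binding host:port fails because it is already bound."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other failures (e.g. permissions) are not port conflicts
        return False


if __name__ == "__main__":
    # Hold a port ourselves so the conflict is reproducible.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as holder:
        holder.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        holder.listen()
        taken = holder.getsockname()[1]
        print(taken, port_in_use("127.0.0.1", taken))
```

Once a conflict is confirmed, finding the process that holds the port is a job for tools such as lsof or ss.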
DevOps
fromInfoQ
1 week ago

Crossplane Reaches Production Maturity by Graduating CNCF

Crossplane graduated from the Cloud Native Computing Foundation, becoming a production-hardened Kubernetes-native control plane for multi-cloud infrastructure and internal developer platforms.
DevOps
fromInfoQ
2 weeks ago

Google Cloud Introduces Chaos Engineering Framework and Recipes for Distributed Systems

Intentional chaos engineering in production, using steady-state hypotheses, realistic fault injection, automation, and blast-radius controls, is necessary to build resilient cloud applications.
DevOps
fromInfoQ
2 weeks ago

When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale

Reverse proxy layers are fragile; small operational details and context-dependent optimizations cause large outages, so profile, monitor, and prioritize human-operational simplicity.
#platform-engineering
fromInfoQ
2 weeks ago
DevOps

Building Resilient Platforms: Insights from Over Twenty Years in Mission-Critical Infrastructure

#minikube
DevOps
fromChris Warrick
2 weeks ago

Distro Hopping, Server Edition

Ubuntu LTS provides longer support and greater stability than Fedora's frequent releases, reducing upgrade churn and compatibility issues.
DevOps
fromTheregister
2 weeks ago

Rideshare giant dumps 200 cloudy Macs, saves $2.4 million

Grab replaced over 200 cloud Mac Minis with on-premises physical Macs, expecting $2.4 million in savings across three years by avoiding costly cloud macOS billing.
fromIT Pro
2 weeks ago

Inside a cloud outage

"The worst feeling in the world is to be in the middle of an incident and realize that it would be a great thing that you could do to resolve that incident, if only a tool had been built before, right? So it'd be great if you figure that out before you get into that incident, and then you have the tool ready to go."
DevOps
DevOps
fromInfoQ
2 weeks ago

Grafana and GitLab Introduce Serverless CI/CD Observability Integration

Serverless integration forwards GitLab CI/CD webhooks into Grafana Cloud Logs, enabling real-time correlation of deploy events with metrics.
DevOps
fromInfoWorld
2 weeks ago

Developers don't care about Kubernetes clusters

Provide developers ready environments instead of forcing them to learn Helm, Kustomize, or Kubernetes manifest internals, which wastes developer time and reduces productivity.
DevOps
fromAzure DevOps Blog
3 weeks ago

Azure Developer CLI: Azure Container Apps Dev-to-Prod Deployment with Layered Infrastructure - Azure DevOps Blog

Use azd publish and layered infrastructure in Azure Developer CLI v1.20.0 to build containers once and deploy them across multiple environments with separated concerns.
DevOps
fromNew Relic
3 weeks ago

Logs Intelligence: Shift the Burden, Stop the Crisis

Logs Intelligence shifts incident analysis from SREs to the platform, providing instant root-cause hypotheses and unified Logs+APM correlation to reduce MTTR.
fromInfoWorld
3 weeks ago

Google's new query builder to tackle SQL complexity in cloud workload monitoring

The query builder solves a large SQL bottleneck by transforming log analysis from a time-consuming task into a real-time, self-service capability that's fit for any DevOps or site reliability professional. This is an immense time-saver that can collapse typical investigation windows from hours to minutes,
DevOps
fromInfoQ
3 weeks ago

You Are Asking the Wrong Questions (About Reliability and SRE)

I'd like to beg you, dear Sir, as well as I can, to have patience with everything unresolved in your heart and to try to love the questions themselves as if they were locked rooms or books written in a very foreign language. Don't search for the answers, which could not be given to you now, because you would not be able to live them. The point is to live everything. Live the questions now. Perhaps then, someday far in the future, you will gradually, without even noticing it, live your way into the answer
DevOps
DevOps
fromTechzine Global
1 month ago

Perforce: Software development copilots are a shift-up

Copilot AI boosts individual developer productivity 50–70% but can widen quality, visibility, trust, and integration gaps across enterprises without improved processes and DevOps alignment.
#git-sync
fromMedium
1 month ago

Zero Trust with Cilium: Enforcing mTLS in Kubernetes

Kubernetes networking is highly flexible but this flexibility can introduce security risks because all pods can communicate with each other by default. Cilium addresses these challenges by providing a modern, high-performance solution for Kubernetes networking that combines security, observability and performance using eBPF. Cilium is an open-source networking and security solution designed for cloud-native environments. It provides high-performance pod-to-pod networking utilizing eBPF and allows identity-aware network policies at the API level, enforcing fine grained controls.
DevOps
DevOps
fromInfoQ
4 weeks ago

New Infrastructure-as-Code Tool "formae" Takes Aim at Terraform

formae is an open-source IaC platform that treats real cloud environments as versioned state, auto-discovers resources, and supports reconcile and patch modes.
DevOps
fromMedium
4 weeks ago

Replaying massive data in a non-production environment using Pekko Streams and Kubernetes Pekko...

Replaying real production traffic to non-production environments is essential for realistic testing and requires rate shaping, data sanitization, and robust mirroring infrastructure.
fromInfoQ
4 weeks ago

CNCF Highlights How vCluster Eases Kubernetes Multi-Tenancy Challenges

The Cloud Native Computing Foundation (CNCF) published a blog post discussing how vCluster, an open-source project by Loft Labs, addresses key multi-tenancy obstacles in Kubernetes clusters by enabling "virtual clusters" within a single host cluster. This approach enables multiple tenants to have isolated control planes while sharing underlying compute resources, thereby reducing overhead without compromising isolation. Traditional namespace-based isolation in Kubernetes often falls short when tenants need to deploy cluster-scoped resources like custom resource definitions (CRDs)
DevOps
DevOps
fromTechzine Global
4 weeks ago

Snowflake shows resilience during AWS outage

Cloud outages expose critical-service vulnerabilities; multi-cloud replication and automated failover maintain business continuity with minimal interruption.
DevOps
fromNew Relic
3 weeks ago

How to survive a horror movie (if that movie is your prod environment)

Missing or orphaned spans fragment distributed traces and obscure root causes, requiring fixes to collection limits, parent IDs, and instrumentation to restore trace continuity.
#linux
fromZDNET
1 month ago
DevOps

Try this new Linux security threat scanner to keep your system safe - you'll thank me

DevOps
fromTheregister
1 month ago

New boss changed code so it sent two billion unwanted emails

Removal of a rate-limited Log4j error-email plugin caused two billion SQL-error emails, overwhelming the bank's email system and hiding real error information.
fromInfoQ
1 month ago

Airbnb's Mussel V2: Next-Gen Key Value Storage to Unify Streaming and Bulk Ingestion

Airbnb's engineering team has rolled out Mussel v2, a complete rearchitecture of its internal key value engine designed to unify streaming and bulk ingestion while simplifying operations and scaling to larger workloads. The new system reportedly sustains over 100,000 streaming writes per second, supports tables exceeding 100 terabytes with p99 read latencies under 25 milliseconds, and ingests tens of terabytes in bulk workloads, allowing caller teams to focus on product innovation rather than managing data pipelines.
DevOps
DevOps
fromIT Pro
1 month ago

Clunky tech is costing developers 20 working days a year - these are the leading 'productivity drains' impacting teams

US developers lose nearly 20 work days annually to bugs, outages, tool failures, and unofficial IT support, costing about $8,000 in productivity per developer.
fromTheregister
1 month ago

A single DNS race condition brought AWS to its knees

The race condition occurred when one DNS Enactor experienced "unusually high delays" while the DNS Planner continued generating new plans. A second DNS Enactor began applying the newer plans and executed a clean-up process just as the first Enactor completed its delayed run. This clean-up deleted the older plan as stale, immediately removing all IP addresses for the regional endpoint and leaving the system in an inconsistent state that prevented further automated updates applied by any DNS Enactors.
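The ordering problem in that excerpt can be replayed as a toy model. This is only a sketch under stated assumptions: the plan IDs, IP addresses, and function names are hypothetical illustrations, not AWS's implementation, and the interleaving is hard-coded rather than produced by real concurrency.

```python
def resolve(active_plan, plans):
    """Return the endpoint's IPs, or [] if its active plan no longer exists."""
    return plans.get(active_plan, [])


def cleanup(plans, newest):
    """Delete every plan older than `newest` without checking whether it is in use."""
    for plan_id in [p for p in plans if p < newest]:
        del plans[plan_id]


def replay_incident():
    # Hypothetical data: plan id -> IP addresses for the regional endpoint.
    plans = {1: ["10.0.0.1"], 2: ["10.0.0.2"]}

    active = 2                # second Enactor applies the newer plan
    active = 1                # delayed first Enactor finishes and overwrites it
    cleanup(plans, newest=2)  # second Enactor's clean-up deletes plan 1 as stale
    return resolve(active, plans)


if __name__ == "__main__":
    print(replay_incident())  # the endpoint is left with no addresses
```

In this model the endpoint still points at plan 1 after the clean-up has deleted it, so it resolves to nothing, which mirrors the empty-record state the excerpt describes.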
DevOps
DevOps
fromMedium
1 month ago

Context Engineering for AI Code Reviews: Fix Critical Bugs with Outside-Diff Impact Slicing

Outside-Diff Impact Slicing detects bugs by analyzing caller/callee code one hop beyond a patch to reveal contract violations hidden by diffs.
fromTheregister
1 month ago

Microsoft threatens to bring Copilot to on-prem Exchange

"We are exploring the possibility of introducing Copilot for Exchange Server (on-premises)," Microsoft says, linking to a ten-question form that asks: "Would your organization be comfortable enabling Copilot for Exchange Server if it requires sending some Exchange Server data to the cloud?" Er, probably not. After all, many administrators run an on-premises version of Exchange precisely because they don't want any Exchange Server data being sent to Microsoft's cloud.
DevOps
fromTechzine Global
1 month ago

Spotify Portal aims to rock it with developers

The company has now extended its developer tools to be coalesced inside the new Spotify Portal. This is an Internal Developer Platform built by Spotify's Backstage team. In the age of platform engineering, when IDPs hold the promise of self-service computing for developers who want to elevate themselves above the distraction of working with operations to get their infrastructure provisioning handled in an abstracted and automated way, does this technology rock the house?
DevOps
DevOps
fromTechCrunch
1 month ago

Shuttle raises $6 million to fix vibe-coding's deployment problem | TechCrunch

Shuttle automates deployment of vibe-coded applications by packaging infrastructure, handling payment, and deploying to cloud providers to simplify maintenance and scaling.
DevOps
fromInfoWorld
1 month ago

How to improve technical documentation with generative AI

Generative AI enables developers to create and maintain up-to-date devops and ITSM documentation by aligning tools, audience, and purpose with the pace of code changes.
DevOps
fromNew Relic
1 month ago

AWS Outage And Why O11y is Non Negotiable

A DNS failure cascading through DynamoDB dependencies caused multi-service AWS outages, creating backlogs and significant business costs; full-stack observability improves resilience.
DevOps
fromMedium
1 month ago

My Kubestronaut journey

Passed all CNCF Kubernetes certifications (KCNA, CKA, KCSA, CKAD, CKS) with high scores, earning Kubestronaut recognition.
fromCodeuptoday - New Way To Go Ahead
2 months ago

Resolve 404 Error in Netlify - Codeuptoday

Reviewing deployment logs is crucial for identifying site issues. Pay close attention to the Publish Directory setting in Netlify's deploy settings, as this determines where your files are deployed. For Next.js sites, ensure it points to the .next directory, while simpler setups might use public. Correct file placement is essential for your website to appear when visitors enter your URL. Remember: Next.js typically uses .next as the publish directory; static sites often use public or dist; check your project's build output to confirm the correct directory.
DevOps
DevOps
fromBusiness Matters
1 month ago

The Rise of Automated Certificate Management: Simplifying Security at Scale

Manual certificate management fails at enterprise scale; automated, centralized certificate lifecycle management is required to prevent outages, breaches, and compliance failures.
DevOps
fromInfoQ
1 month ago

Mirantis' Kubernetes Management Platform k0rdent Reaches v1.2.0

k0rdent 1.2.0 provides a centralized, templated control plane managing Kubernetes clusters, services, and observability/FinOps across on-premises, cloud, and hybrid environments.
DevOps
fromClickUp
1 month ago

What Is ITSM Automation? Guide + Use Cases & Tools | ClickUp

Automating ITSM with AI, ML, and workflow tools speeds service delivery, reduces costs and manual errors, and frees IT teams for strategic work.
DevOps
fromInfoQ
1 month ago

Terraform Google Cloud Provider 7.0 Reaches General Availability

Terraform Google Cloud provider 7.0 adds ephemeral credentials, write-only attributes, and stricter schema validation to improve security and catch configuration errors earlier.
fromInfoQ
1 month ago

Talos Linux: Bringing Immutability and Security to Kubernetes Operations

Sidero Labs has been developing Talos Linux, an immutable operating system purpose-built exclusively for running Kubernetes, alongside Omni, a cluster lifecycle management platform. InfoQ met the Sidero team in Amsterdam during the TalosCon 2025 and had conversations about their approach to simplifying Kubernetes operations through minimalism and security-first design. The concept for Talos emerged from practical frustrations with traditional operating systems in enterprise environments.
DevOps
DevOps
fromInfoQ
1 month ago

If Architectures Could Talk, They'd Quote Your Boss

Software architecture mirrors organizational communication and incentives; failures stem from unclear ownership, misaligned incentives, and social friction rather than purely technical issues.
DevOps
fromClickUp
1 month ago

Top Opsgenie Alternatives to Migrate To Before April 2027

Atlassian stopped selling Opsgenie and will end support, prompting teams to evaluate alternatives like ClickUp for incident management, on-call scheduling, and escalations.
DevOps
fromAmazon Web Services
1 month ago

What's New in the AWS Deploy Tool for .NET | Amazon Web Services

AWS Deploy Tool for .NET 2.0 requires .NET 8 and Node.js 18+, adds Podman support alongside Docker, with no other breaking changes to deployment commands.
DevOps
fromfaun.pub
1 month ago

Deploying a Complete RAG Ecosystem with a Single Command: My Ultimate Docker Stack

A single Docker Compose stack provides a ready-to-run local RAG environment combining Ollama, Qdrant, MongoDB, Redis, Neo4j, Keycloak, Mongo Express, and n8n.
DevOps
fromAzure DevOps Blog
1 month ago

Azure DevOps local MCP Server is generally available - Azure DevOps Blog

Local MCP Server for Azure DevOps is now generally available, providing on-prem context injection for LLMs to access Azure DevOps data securely.
DevOps
fromInfoQ
1 month ago

AWS Introduces ECS Managed Instances for Containerized Applications

ECS Managed Instances automates EC2 provisioning, scaling, and maintenance for containerized applications while allowing instance-type control and cost-optimized default selections.
#finops
fromMedium
2 months ago
DevOps

Cloud FinOps Meets DevSecOps: Money-First, Secure Always

Integrate FinOps with DevSecOps to use cloud cost anomalies as security signals, reduce wasted spend, and proactively protect cloud infrastructure.
fromMedium
2 months ago
DevOps

Cloud FinOps Meets DevSecOps: Money-First, Secure Always

Unite FinOps and DevSecOps to detect breaches early, reduce wasted cloud spend, and proactively secure cloud infrastructure.
DevOps
fromInfoWorld
1 month ago

OpenAI Codex adds SDK, admin tools, Slack integration

New admin tools give ChatGPT admins visibility and control over Codex, with monitoring, environment controls, analytics, Slack integration, SDK availability, and usage tracking.
fromIT Pro
1 month ago

The future of networking: programmability and automation

In Part two, we examined secure by design principles, with a approach, secure access service edge (SASE), and quantum-safe planning becoming non-negotiable foundations for the next decade. Automation is another pivotal strand to the change that's taking place. Instead of relying on manual command-line interfaces (CLI), tomorrow's networks will be defined by code, workflows, and application programming interfaces (APIs). From infrastructure as code (IaC) and observability to evolving skillsets, automation is not just about efficiency - it is becoming the DNA of modern networking.
DevOps
DevOps
fromAmazon Web Services
1 month ago

Moeve: Controlling resource deployment at scale with AWS CloudFormation Guard Hooks | Amazon Web Services

Moeve enforces proactive cloud governance using AWS Control Tower, CloudFormation Hooks, SCPs, and mandatory Infrastructure as Code to ensure secure, compliant deployments.
DevOps
fromAmazon Web Services
1 month ago

Beyond Bootstrap: Bootstrapless CDK Deployments at GoDaddy | Amazon Web Services

GoDaddy implemented a bootstrapless AWS CDK deployment flow that enforces governance invisibly and enables single-command developer deployments.