#dfs

[ follow ]
DevOps
fromInfoQ
17 hours ago

Yelp Achieves Zero-Downtime Upgrade of Over 1,000 Cassandra Nodes

Yelp upgraded its Apache Cassandra infrastructure across 1,000 nodes without downtime, showcasing effective management of stateful systems at scale.
Data science
fromTechzine Global
12 hours ago

Pinecone On-Demand is thirsty for bursty workloads

Pinecone offers solutions for variable and sustained query workloads in AI, focusing on cost-effective and predictable performance.
#aws
fromTechCrunch
5 hours ago
Tech industry

In another wild turn for AI chips, Meta signs deal for millions of Amazon AI CPUs | TechCrunch

Tech industry
fromTechCrunch
5 hours ago

In another wild turn for AI chips, Meta signs deal for millions of Amazon AI CPUs | TechCrunch

Meta has signed a deal to use millions of AWS Graviton chips for its AI needs, shifting from competitors like Google Cloud.
DevOps
fromInfoQ
1 week ago

AWS Introduces S3 Files, Bringing File System Access to S3 Buckets

AWS S3 Files allows users to mount S3 buckets for standard file system access, optimizing data access for various applications.
DevOps
fromTheregister
2 weeks ago

AWS put a file system on S3; I stress-tested it

AWS S3 Files allows mounting S3 buckets as NFS shares, providing solid conflict resolution and cost-effective storage options.
Software development
fromInfoQ
2 days ago

Dropbox Collaborates with GitHub to Reduce Monorepo Size from 87GB to 20GB

Dropbox engineers reduced their backend monorepo size from 87GB to 20GB by optimizing Git's storage and delta compression model.
#snowflake
Django
fromMedium
3 weeks ago

Snowflake Supports Directory Imports

Easier package imports into Snowflake functions and procedures from stage directories and SnowGit directories streamline development and deployment.
Artificial intelligence
fromTheregister
1 month ago

Snowflake's ongoing pitch: bring AI to data, not vice versa

Snowflake is enhancing its platform for AI integration through strategic partnerships and acquisitions, focusing on customer ROI and data management efficiency.
Django
fromMedium
3 weeks ago

Snowflake Supports Directory Imports

Easier package imports into Snowflake functions and procedures from stage directories and SnowGit directories streamline development and deployment.
Artificial intelligence
fromTheregister
1 month ago

Snowflake's ongoing pitch: bring AI to data, not vice versa

Snowflake is enhancing its platform for AI integration through strategic partnerships and acquisitions, focusing on customer ROI and data management efficiency.
#ai-adoption
fromTechzine Global
2 months ago
Artificial intelligence

Starburst: Chewing through data access is key to AI adoption

AI adoption is bottlenecked by lack of access to contextual, current, and governed data; without that, AI cannot reliably increase productivity.
Scala
fromInfoQ
1 week ago

Lakehouse Tower of Babel: Handling Identifier Resolution Rules Across Database Engines

Open table formats standardize data semantics but lack SQL dialect interoperability, complicating identifier resolution across different engines.
DevOps
fromInfoQ
1 day ago

When a Cloud Region Fails: Rethinking High Availability in a Geopolitically Unstable World

Cloud regions are influenced by geopolitical events, necessitating multi-region strategies for resilience against disruptions.
fromComputerWeekly.com
2 days ago

Blackbox replaces two racks of HPE storage with 8U of Everpure | Computer Weekly

Blackbox Hosting has consolidated storage from two full racks down to just 8U of rack space following migration to Everpure FlashArray hardware, achieving a 10:1 data reduction ratio and an 85% reduction in power utilization.
DevOps
DevOps
fromComputerWeekly.com
4 days ago

Storage implications of a modern IT architecture | Computer Weekly

Organizations are increasingly using containers to modernize applications and manage both cloud-native and traditional workloads with Kubernetes.
Software development
fromInfoQ
2 weeks ago

TigerFS Mounts PostgreSQL Databases as a Filesystem for Developers and AI Agents

TigerFS is an experimental filesystem that integrates PostgreSQL, allowing file operations through a standard filesystem interface.
fromInfoWorld
3 weeks ago

How Apache Kafka flexed to support queues

Apache Kafka has cemented itself as the de facto platform for event streaming, often referred to as the 'universal data substrate' due to its extensive ecosystem that enables connectivity and processing capabilities.
Scala
Business intelligence
fromTheregister
3 weeks ago

Microsoft Fabric Database Hub dubbed 'partial' solution

Microsoft's Fabric Database Hub offers a centralized management solution for its database services but lacks support for non-Microsoft databases.
#agentic-ai
Information security
fromTechzine Global
1 month ago

Databricks launches Lakewatch: agentic SIEM on the Lakehouse

Lakewatch is an open SIEM platform that consolidates security, IT, and business data, enabling rapid threat detection and response using AI agents.
Information security
fromTechzine Global
1 month ago

Databricks launches Lakewatch: agentic SIEM on the Lakehouse

Lakewatch is an open SIEM platform that consolidates security, IT, and business data, enabling rapid threat detection and response using AI agents.
#apache-spark
Java
fromMedium
1 month ago

Spark Internals: Understanding Tungsten (Part 1)

Apache Spark revolutionized big data processing but faces challenges due to JVM memory management and garbage collection issues.
Java
fromMedium
1 month ago

Spark Internals: Understanding Tungsten (Part 2)

Catalyst Optimizer and Tungsten work together in Apache Spark to optimize data execution and manage raw binary data.
Java
fromMedium
1 month ago

Spark Internals: Understanding Tungsten (Part 1)

Apache Spark revolutionized big data processing but faces challenges due to JVM memory management and garbage collection issues.
Java
fromMedium
1 month ago

Spark Internals: Understanding Tungsten (Part 2)

Catalyst Optimizer and Tungsten work together in Apache Spark to optimize data execution and manage raw binary data.
#ai-infrastructure
fromInfoWorld
1 month ago
Business intelligence

Why Postgres has won as the de facto database: Today and for the agentic future

Business intelligence
fromInfoWorld
1 month ago

Why Postgres has won as the de facto database: Today and for the agentic future

Leading enterprises achieve 5x ROI by adopting open source databases like PostgreSQL to unify structured and unstructured data for agentic AI, with 81% of successful enterprises committed to open source strategies.
Data science
fromInfoQ
1 month ago

Data Mesh in Action: A Journey From Ideation to Implementation

Data mesh is essential for organizations to develop independent data analytics capabilities after separation from larger parent companies.
DevOps
fromInfoQ
2 weeks ago

Uber's Hive Federation Decentralizes 16K Datasets and 10+ PB for Zero-Downtime Analytics at Scale

Uber redesigned its Hive data warehouse to decentralize datasets, enhancing scalability, security, and operational autonomy for teams.
fromTechzine Global
2 weeks ago

AWS S3 buckets now support file systems

S3 Files is built on Amazon EFS and automatically translates file system operations into S3 requests, allowing applications to work with S3 data without code changes.
DevOps
DevOps
fromInfoWorld
2 weeks ago

AWS turns its S3 storage service into a file system for AI agents

S3 Files simplifies access to Amazon S3, enhancing its role as a primary data layer for AI and modern applications.
Data science
fromMedium
1 month ago

Migrating to the Lakehouse Without the Big Bang: An Incremental Approach

Query federation enables safe, incremental lakehouse migration by allowing simultaneous queries across legacy warehouses and new lakehouse systems without risky big bang cutover approaches.
Business intelligence
fromInfoWorld
1 month ago

Snowflake's new 'autonomous' AI layer aims to do the work, not just answer questions

Project SnowWork is Snowflake's autonomous AI layer that automates data analysis tasks like forecasting, churn analysis, and report generation without requiring data team intervention.
Data science
fromMedium
1 month ago

Building Consistent Data Foundations at Scale

Building consistent data foundations through intentional architecture, engineering, and governance is essential to prevent fragmentation, support AI adoption, ensure regulatory compliance, and enable reliable organizational decisions at scale.
DevOps
fromInfoQ
2 weeks ago

Replacing Database Sequences at Scale Without Breaking 100+ Services

Validating requirements can simplify complex problems, and embedding sequence generation reduces network calls, enhancing performance and reliability.
Information security
fromComputerworld
1 month ago

Storage vendor offers a real guarantee - but check out those fine-print exceptions

Tech vendors frequently offer performance guarantees with substantial financial penalties, but hidden exceptions in EULAs often make claims difficult or impossible to collect.
fromInfoQ
1 month ago

Hybrid Cloud Data at Uber: How Engineers Solved Extreme-Scale Replication Challenges

Uber's engineering team has transformed its data replication platform to move petabytes of data daily across hybrid cloud and on-premise data lakes, addressing scaling challenges caused by rapidly growing workloads. Built on Hadoop's open-source Distcp framework, the platform now handles over one petabyte of daily replication and hundreds of thousands of jobs with improved speed, reliability, and observability.
Miscellaneous
DevOps
fromTechzine Global
4 weeks ago

DataCore Introduces Swarm Appliance for Edge Data Protection

DataCore's Swarm Appliance offers a comprehensive data protection solution for edge and ROBO environments, combining immutability, encryption, and malware detection.
Software development
fromMedium
1 month ago

Unified Databricks Repository for Scala and Python Data Pipelines

Databricks repositories require structured setup with Gradle for multi-language support, dependency management, and version control to scale beyond manual notebook maintenance.
US politics
fromFortune
2 months ago

Inside the race to build data centers | Fortune

Mega-scale AI data centers are driving AI growth, transforming landscapes, straining energy and water resources, and creating major political and economic conflicts.
DevOps
fromInfoWorld
4 weeks ago

Rethinking VM data protection in cloud-native environments

KubeVirt enables Kubernetes to manage both VMs and containers, requiring new strategies for VM lifecycle management and data protection.
fromDbmaestro
5 years ago

5 Pillars of Database Compliance Automation |

There is a growing emphasis on database compliance today due to the stricter enforcement of compliance rules and regulations to safeguard user privacy. For example, GDPR fines can reach £17.5 million or 4% of annual global turnover (the higher of the two applies). Besides the direct monetary implications, companies also need to prioritize compliance to protect their brand reputation and achieve growth.
EU data protection
#databricks
Tech industry
fromInfoQ
2 months ago

Uber Moves from Static Limits to Priority-Aware Load Control for Distributed Storage

Priority-aware, colocated load management with CoDel and per-tenant Scorecard protects stateful multi-tenant databases by prioritizing critical traffic and adapting dynamically to prevent overloads.
DevOps
fromInfoQ
1 month ago

AWS Expands Aurora DSQL with Playground, New Tool Integrations, and Driver Connectors

Amazon Aurora DSQL introduces usability enhancements, including a browser-based playground and integrations with popular SQL tools for improved developer experience.
Software development
fromMedium
2 months ago

The Complete Database Scaling Playbook: From 1 to 10,000 Queries Per Second

Database scaling to 10,000 QPS requires staged architectural strategies timed to traffic thresholds to avoid outages or unnecessary cost.
Tech industry
fromTheregister
2 months ago

Oracle promises new approach to MySQL

Oracle commits to new engineering leadership, developer-focused features, greater transparency, and expanded community engagement to guide MySQL through 2026 and beyond.
Artificial intelligence
fromInfoWorld
1 month ago

Why AI requires rethinking the storage-compute divide

AI workloads require continuous processing of unstructured multimodal data, causing redundant data movement and transformation that wastes infrastructure costs and data scientist time.
fromMedium
3 months ago

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

"The job didn't fail. It just... never finished." That was the worst part. No errors.No stack traces.Just a Spark job running forever in production - blocking downstream pipelines, delaying reports, and waking up-on-call engineers at 2 AM. This is the story of how I diagnosed a real Spark performance issue in production and fixed it drastically, not by adding more machines - but by understanding Spark properly.
fromTechzine Global
2 months ago

4 steps to create a future-proof data infrastructure

A future-proof IT infrastructure is often positioned as a universal solution that can withstand any change. However, such a solution does not exist. Nevertheless, future-proofing is an important concept for IT leaders navigating continuous technological developments and security risks, all while ensuring that daily business operations continue. The challenge is finding a balance between reactive problem solving and proactive planning, because overlooking a change can cost your organization. So, how do you successfully prepare for the future without that one-size-fits-all solution?
Tech industry
#ai
DevOps
fromInfoWorld
1 month ago

Update your databases now to avoid data debt

Multiple major open source databases reach end-of-life in 2026, requiring teams to plan upgrades and migrations to avoid security risks and higher costs.
fromInfoWorld
2 months ago

AI is changing the way we think about databases

Developers have spent the past decade trying to forget databases exist. Not literally, of course. We still store petabytes. But for the average developer, the database became an implementation detail; an essential but staid utility layer we worked hard not to think about. We abstracted it behind object-relational mappers (ORM). We wrapped it in APIs. We stuffed semi-structured objects into columns and told ourselves it was flexible.
Software development
Data science
fromInfoQ
2 months ago

Databricks Introduces Lakebase, a PostgreSQL Database for AI Workloads

Databricks Lakebase is a serverless PostgreSQL OLTP database that separates compute from storage and unifies transactional and analytical capabilities.
fromTechzine Global
1 month ago

NetApp launches EF50 and EF80 for AI and HPC workloads

As businesses contend with ever-increasing data volumes and performance-intensive applications such as AI model training, AI inferencing and high-performance computing, they need infrastructure that delivers speed, scalability and efficiency without added complexity.
DevOps
Data science
fromDevOps.com
2 months ago

Why Data Contracts Need Apache Kafka and Apache Flink - DevOps.com

Data contracts formalize schemas, types, and quality constraints through early producer-consumer collaboration to prevent pipeline failures and reduce operational downtime.
Artificial intelligence
fromInfoQ
2 months ago

Autonomous Big Data Optimization: Multi-Agent Reinforcement Learning to Achieve Self-Tuning Apache Spark

A Q-learning agent autonomously learns and generalizes optimal Spark configurations by discretizing dataset features and combining with Adaptive Query Execution for superior performance.
Software development
fromInfoWorld
2 months ago

Why your next microservices should be streaming SQL-driven

Streaming SQL with UDFs, materialized results, and ML/AI integrations enables continuous, stateful processing of event streams for microservices.
Data science
fromInfoWorld
1 month ago

Buyer's guide: Comparing the leading cloud data platforms

Five leading cloud data platforms—Databricks, Snowflake, Amazon RedShift, Google BigQuery, and Microsoft Fabric—offer distinct architectural approaches for enterprise data storage, analytics, and AI workloads.
fromTechzine Global
2 months ago

Databricks makes serverless Postgress service Lakebase available

Databricks today announced the general availability of Lakebase on AWS, a new database architecture that separates compute and storage. The managed serverless Postgres service is designed to help organizations build faster without worrying about infrastructure management. When databases link compute and storage, every query must use the same CPU and memory resources. This can cause a single heavy query to affect all other operations. By separating compute and storage, resources automatically scale with the actual load.
Software development
fromdeath and gravity
2 months ago

DynamoDB crash course: part 2 - data model

Core components According to the documentation, the core components of DynamoDB are tables, items, and attributes. This is accurate in the sense of what you can act on through the API, but can be deceptively simple, and leaves out two other equally important aspects: what you can do with it (the logical model) and how it scales (the physical model).
DevOps
fromInfoQ
1 month ago

Running Ray at Scale on AKS

Microsoft and Anyscale provide guidance for running managed Ray service on Azure Kubernetes Service, addressing GPU capacity limits, ML storage challenges, and credential expiry issues through multi-cluster, multi-region deployment strategies.
DevOps
fromInfoQ
1 month ago

From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture

Uber redesigned MySQL infrastructure using Group Replication to reduce failover time from minutes to seconds while maintaining strong consistency across thousands of clusters.
Software development
fromMedium
2 months ago

When Kafka Lag Lies: A Production Debugging Story

Uncommitted Kafka offsets can cause persistent consumer-group lag even when ingestion is low, databases are idle, and no errors are observed.
Software development
fromInfoWorld
2 months ago

4 self-contained databases for your apps

XAMPP provides a complete local web stack (MariaDB, Apache, PHP, Mercury SMTP, OpenSSL) while PostgreSQL can be run standalone or embedded via pgserver in Python.
fromInfoWorld
2 months ago

Databricks adds MemAlign to MLflow to cut cost and latency of LLM evaluation

By replacing repeated fine‑tuning with a dual‑memory system, MemAlign reduces the cost and instability of training LLM judges, offering faster adaptation to new domains and changing business policies. Databricks' Mosaic AI Research team has added a new framework, MemAlign, to MLflow, its managed machine learning and generative AI lifecycle development service. MemAlign is designed to help enterprises lower the cost and latency of training LLM-based judges, in turn making AI evaluation scalable and trustworthy enough for production deployments.
Artificial intelligence
Software development
fromInfoQ
2 months ago

Are You Missing a Data Frame? The Power of Data Frames in Java

DataFrames and data-oriented programming promote modeling immutable data separately from behavior, making Java suitable for DataFrame-style data manipulation comparable to Python.
fromInfoWorld
2 months ago

The 'Super Bowl' standard: Architecting distributed systems for massive concurrency

When I manage infrastructure for major events (whether it is the Olympics, a Premier League match or a season finale) I am dealing with a "thundering herd" problem that few systems ever face. Millions of users log in, browse and hit "play" within the same three-minute window. But this challenge isn't unique to media. It is the same nightmare that keeps e-commerce CTOs awake before Black Friday or financial systems architects up during a market crash. The fundamental problem is always the same: How do you survive when demand exceeds capacity by an order of magnitude?
DevOps
DevOps
fromInfoQ
1 month ago

Google BigQuery Previews Cross-Region SQL Queries for Distributed Data

BigQuery's global queries feature enables SQL queries across multiple geographic regions without data movement, eliminating ETL pipelines for distributed analytics.
fromDbmaestro
5 years ago

Database Delivery Automation in the Multi-Cloud World

The main advantage of going the Multi-Cloud way is that organizations can "put their eggs in different baskets" and be more versatile in their approach to how they do things. For example, they can mix it up and opt for a cloud-based Platform-as-a-Service (PaaS) solution when it comes to the database, while going the Software-as-a-Service (SaaS) route for their application endeavors.
DevOps
[ Load more ]