#delta-lake

Marketing tech
from AdExchanger
1 day ago

AI Is Nothing Without Data Fidelity. Here's A Four-Step Approach to Protect It | AdExchanger

Data integrity is crucial for effective AI in advertising, as flawed data leads to poor outcomes.
DevOps
from InfoQ
4 days ago

Etsy Migrates 1000-Shard, 425 TB MySQL Sharding Architecture to Vitess

Etsy migrated its MySQL sharding infrastructure to Vitess, enhancing data management and enabling resharding capabilities.
#ai
Data science
from InfoWorld
1 day ago

Google Cloud introduces QueryData to help AI agents create reliable database queries

QueryData enhances AI agents' accuracy in querying databases by translating natural language into precise database queries.
#snowflake
Django
from Medium
1 week ago

Snowflake Supports Directory Imports

Easier package imports into Snowflake functions and procedures from stage directories and SnowGit directories streamline development and deployment.
Artificial intelligence
from Theregister
3 weeks ago

Snowflake's ongoing pitch: bring AI to data, not vice versa

Snowflake is enhancing its platform for AI integration through strategic partnerships and acquisitions, focusing on customer ROI and data management efficiency.
DevOps
from Theregister
5 days ago

AWS put a file system on S3; I stress-tested it

AWS S3 Files allows mounting S3 buckets as NFS shares, providing solid conflict resolution and cost-effective storage options.
Tech industry
from Techzine Global
1 week ago

Oracle close to finalizing financing for Michigan data center

Oracle is finalizing $16 billion financing for a new Michigan data center to support AI applications, amid complex funding challenges.
from InfoWorld
2 weeks ago

How Apache Kafka flexed to support queues

Apache Kafka has cemented itself as the de facto platform for event streaming, often referred to as the 'universal data substrate' due to its extensive ecosystem that enables connectivity and processing capabilities.
Scala
Business intelligence
from Theregister
2 weeks ago

Microsoft Fabric Database Hub dubbed 'partial' solution

Microsoft's Fabric Database Hub offers a centralized management solution for its database services but lacks support for non-Microsoft databases.
from InfoWorld
6 days ago

Bringing databases and Kubernetes together

Automating Kubernetes workloads with Operators can provide the same level of functionality as DBaaS, while still avoiding lock-in to a specific provider.
DevOps
Information security
from Techzine Global
3 weeks ago

Databricks launches Lakewatch: agentic SIEM on the Lakehouse

Lakewatch is an open SIEM platform that consolidates security, IT, and business data, enabling rapid threat detection and response using AI agents.
Scala
from Medium
2 weeks ago

Data Extraction and Classification Using Structural Pattern Matching in Scala

Scala pattern matching enhances code readability and extensibility in real-world data engineering use cases.
DevOps
from InfoQ
6 days ago

Uber's Hive Federation Decentralizes 16K Datasets and 10+ PB for Zero-Downtime Analytics at Scale

Uber redesigned its Hive data warehouse to decentralize datasets, enhancing scalability, security, and operational autonomy for teams.
#databricks
Information security
from InfoWorld
2 weeks ago

Databricks pitches Lakewatch as a cheaper SIEM - but is it really?

Translating benefits into buy-in from CIOs and CISOs may be challenging for Databricks despite its intent and acquisitions.
Information security
from TechCrunch
3 weeks ago

Databricks bought two startups to underpin its new AI security product | TechCrunch

Databricks is launching Lakewatch, a new AI-powered security product, following acquisitions of Antimatter and SiftD.ai to enhance its capabilities.
#apache-spark
Java
from Medium
3 weeks ago

Spark Internals: Understanding Tungsten (Part 1)

Apache Spark revolutionized big data processing but faces challenges due to JVM memory management and garbage collection issues.
Java
from Medium
3 weeks ago

Spark Internals: Understanding Tungsten (Part 2)

Catalyst Optimizer and Tungsten work together in Apache Spark to optimize data execution and manage raw binary data.
DevOps
from InfoWorld
6 days ago

AWS turns its S3 storage service into a file system for AI agents

S3 Files simplifies access to Amazon S3, enhancing its role as a primary data layer for AI and modern applications.
Vue
from Medium
4 weeks ago

What is AWS S3 and How I Used It - A Beginner's Guide

AWS S3 is a cloud storage service for developers that stores files (objects) in containers (buckets), offering 99.999999999% durability, infinite scalability, low cost, and global accessibility.
from Techzine Global
1 week ago

AWS S3 buckets now support file systems

S3 Files is built on Amazon EFS and automatically translates file system operations into S3 requests, allowing applications to work with S3 data without code changes.
DevOps
Business intelligence
from InfoWorld
3 weeks ago

Snowflake's new 'autonomous' AI layer aims to do the work, not just answer questions

Project SnowWork is Snowflake's autonomous AI layer that automates data analysis tasks like forecasting, churn analysis, and report generation without requiring data team intervention.
Data science
from Medium
1 month ago

Migrating to the Lakehouse Without the Big Bang: An Incremental Approach

Query federation enables safe, incremental lakehouse migration by allowing simultaneous queries across legacy warehouses and new lakehouse systems without risky big bang cutover approaches.
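Federation engines do this at the query-plan level; the routing idea behind the incremental cutover can be sketched as a table-ownership registry that sends each read to whichever system currently owns the table. The engine callables and table names below are hypothetical, purely for illustration:

```python
class FederatedRouter:
    """Routes table reads to the legacy warehouse or the new lakehouse
    based on a migration registry, so tables can be cut over one at a
    time instead of in a big-bang switch. Engines are any callables
    that take a table name; names here are hypothetical."""

    def __init__(self, legacy, lakehouse):
        self._engines = {"legacy": legacy, "lakehouse": lakehouse}
        self._owner = {}  # table -> engine name; unlisted tables stay legacy

    def promote(self, table):
        """Mark a table as migrated to the lakehouse."""
        self._owner[table] = "lakehouse"

    def read(self, table):
        engine = self._engines[self._owner.get(table, "legacy")]
        return engine(table)
```

Promoting tables one at a time keeps both systems queryable throughout the migration; demoting is just deleting the registry entry.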
Software development
from Medium
1 month ago

Unified Databricks Repository for Scala and Python Data Pipelines

Databricks repositories require structured setup with Gradle for multi-language support, dependency management, and version control to scale beyond manual notebook maintenance.
Data science
from InfoQ
3 weeks ago

Data Mesh in Action: A Journey From Ideation to Implementation

Data mesh is essential for organizations to develop independent data analytics capabilities after separation from larger parent companies.
DevOps
from InfoQ
1 week ago

Replacing Database Sequences at Scale Without Breaking 100+ Services

Validating requirements can simplify complex problems, and embedding sequence generation reduces network calls, enhancing performance and reliability.
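The article's exact design isn't given in the summary, but "embedding sequence generation" to cut network calls is commonly done with block (hi-lo) allocation: the service reserves a range of IDs from the shared store once, then hands out IDs locally until the block runs out. A minimal sketch, with a hypothetical in-memory store standing in for the database sequence:

```python
import threading

class BlockSequence:
    """Hands out unique IDs locally, contacting the backing store only
    when the current block is exhausted. A generic hi-lo sketch, not
    the scheme from the article."""

    def __init__(self, reserve_block, block_size=100):
        self._reserve_block = reserve_block  # callable: size -> first id of block
        self._block_size = block_size
        self._next = 0
        self._limit = 0  # exclusive upper bound of current block
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._limit:
                # One "network" round trip buys block_size local IDs.
                self._next = self._reserve_block(self._block_size)
                self._limit = self._next + self._block_size
            nid = self._next
            self._next += 1
            return nid

class FakeStore:
    """Hypothetical backing store; counts round trips."""
    def __init__(self):
        self.calls = 0
        self._counter = 0
    def reserve(self, size):
        self.calls += 1
        start = self._counter
        self._counter += size
        return start
```

With a block size of 100, generating 1,000 IDs costs 10 store round trips instead of 1,000; the trade-off is that IDs reserved by a crashed service instance are skipped, so sequences are unique but not gap-free.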
#ai-automation
Artificial intelligence
from Techzine Global
3 weeks ago

Snowflake's Project SnowWork targets autonomous enterprise AI

Snowflake launches Project SnowWork, an autonomous AI interface that performs enterprise tasks like forecasts and reports without data team involvement, expanding from backend infrastructure to front-office productivity tool.
Artificial intelligence
from InfoWorld
1 month ago

Databricks launches Genie Code to automate data science and engineering tasks

Data science
from Medium
4 weeks ago

Building Consistent Data Foundations at Scale

Building consistent data foundations through intentional architecture, engineering, and governance is essential to prevent fragmentation, support AI adoption, ensure regulatory compliance, and enable reliable organizational decisions at scale.
Business intelligence
from InfoWorld
1 month ago

Why Postgres has won as the de facto database: Today and for the agentic future

Leading enterprises achieve 5x ROI by adopting open source databases like PostgreSQL to unify structured and unstructured data for agentic AI, with 81% of successful enterprises committed to open source strategies.
DevOps
from InfoQ
3 weeks ago

Uber Launches IngestionNext: Streaming-First Data Lake Cuts Latency and Compute by 25%

Uber's IngestionNext platform shifts to a streaming-first system, reducing data ingestion latency from hours to minutes for analytics and machine learning.
DevOps
from Techzine Global
2 weeks ago

DataCore Introduces Swarm Appliance for Edge Data Protection

DataCore's Swarm Appliance offers a comprehensive data protection solution for edge and ROBO environments, combining immutability, encryption, and malware detection.
DevOps
from InfoQ
3 weeks ago

AWS Expands Aurora DSQL with Playground, New Tool Integrations, and Driver Connectors

Amazon Aurora DSQL introduces usability enhancements, including a browser-based playground and integrations with popular SQL tools for improved developer experience.
from InfoQ
1 month ago

Hybrid Cloud Data at Uber: How Engineers Solved Extreme-Scale Replication Challenges

Uber's engineering team has transformed its data replication platform to move petabytes of data daily across hybrid cloud and on-premise data lakes, addressing scaling challenges caused by rapidly growing workloads. Built on Hadoop's open-source Distcp framework, the platform now handles over one petabyte of daily replication and hundreds of thousands of jobs with improved speed, reliability, and observability.
Miscellaneous
Data science
from Medium
1 month ago

100 Scala Interview Questions and Answers for Data Engineers

Structured Scala and Apache Spark interview preparation requires understanding distributed systems, performance trade-offs, and pipeline design beyond theoretical knowledge.
Startup companies
from InfoQ
2 months ago

Etleap Launches Iceberg Pipeline Platform to Simplify Enterprise Adoption of Apache Iceberg

Managed Iceberg pipeline platform unifies ingestion, transformation, orchestration, and table operations inside customers' VPCs, enabling enterprise Iceberg adoption without building custom stacks.
DevOps
from InfoWorld
4 weeks ago

Update your databases now to avoid data debt

Multiple major open source databases reach end-of-life in 2026, requiring teams to plan upgrades and migrations to avoid security risks and higher costs.
from Techzine Global
2 months ago

Databricks makes serverless Postgres service Lakebase available

Databricks today announced the general availability of Lakebase on AWS, a new database architecture that separates compute and storage. The managed serverless Postgres service is designed to help organizations build faster without worrying about infrastructure management. When databases link compute and storage, every query must use the same CPU and memory resources. This can cause a single heavy query to affect all other operations. By separating compute and storage, resources automatically scale with the actual load.
Software development
Artificial intelligence
from InfoWorld
1 month ago

Why AI requires rethinking the storage-compute divide

AI workloads require continuous processing of unstructured multimodal data, causing redundant data movement and transformation that wastes infrastructure costs and data scientist time.
from Dbmaestro
4 years ago

5 Pillars of Database Compliance Automation

There is a growing emphasis on database compliance today due to the stricter enforcement of compliance rules and regulations to safeguard user privacy. For example, GDPR fines can reach £17.5 million or 4% of annual global turnover (the higher of the two applies). Besides the direct monetary implications, companies also need to prioritize compliance to protect their brand reputation and achieve growth.
EU data protection
DevOps
from ComputerWeekly.com
4 weeks ago

Everpure's Evergreen One for AI brings Exa flash and GPU-based service-level agreements | Computer Weekly

Everpure launches Evergreen One for AI, a consumption model with GPU-count-based SLAs for FlashBlade//Exa storage to optimize AI workload performance.
from InfoQ
2 months ago

350PB, Millions of Events, One System: Inside Uber's Cross-Region Data Lake and Disaster Recovery

Uber has built HiveSync, a sharded batch replication system that keeps Hive and HDFS data synchronized across multiple regions, handling millions of Hive events daily. HiveSync ensures cross-region data consistency, enables Uber's disaster recovery strategy, and eliminates inefficiency caused by the secondary region sitting idle, which previously incurred hardware costs equal to the primary, while still maintaining high availability. Built initially on the open-source Airbnb ReAir project, HiveSync has been extended with sharding, DAG-based orchestration, and a separation of control and data planes.
Tech industry
Data science
from InfoWorld
1 month ago

The revenge of SQL: How a 50-year-old language reinvents itself

SQL has experienced a major comeback driven by SQLite in browsers, improved language tools, and PostgreSQL's jsonb type, making it both traditional and exciting for modern development.
from InfoWorld
2 months ago

AI is changing the way we think about databases

Developers have spent the past decade trying to forget databases exist. Not literally, of course. We still store petabytes. But for the average developer, the database became an implementation detail; an essential but staid utility layer we worked hard not to think about. We abstracted it behind object-relational mappers (ORM). We wrapped it in APIs. We stuffed semi-structured objects into columns and told ourselves it was flexible.
Software development
#clickhouse
DevOps
from InfoQ
1 month ago

Running Ray at Scale on AKS

Microsoft and Anyscale provide guidance for running managed Ray service on Azure Kubernetes Service, addressing GPU capacity limits, ML storage challenges, and credential expiry issues through multi-cluster, multi-region deployment strategies.
Tech industry
from Theregister
2 months ago

Snowflake plugs PostgreSQL into its AI Data Cloud

Snowflake now offers a native PostgreSQL DBaaS in its AI Data Cloud to run transactional workloads alongside analytics and AI under unified governance.
DevOps
from Techzine Global
1 month ago

Everpure brings ActiveCluster to file environments

Everpure expands its Enterprise Data Cloud platform with ActiveCluster for file environments, enabling seamless data movement between systems while maintaining availability and protecting unstructured data critical for AI applications.
from Techzine Global
2 months ago

Sumo Logic launches data pipeline apps for Snowflake and Databricks

Snowflake offers a fully managed data platform, but Sumo Logic users often lack insight into performance, login activity, and operational health. The Sumo Logic Snowflake Logs App analyzes login and access activity to identify anomalies or suspicious behavior. It also optimizes data pipelines with insights into long-running or failing queries. Teams can centralize log data to facilitate correlation across applications, cloud services, and data platforms.
Information security
from Medium
2 months ago

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

"The job didn't fail. It just... never finished." That was the worst part. No errors. No stack traces. Just a Spark job running forever in production - blocking downstream pipelines, delaying reports, and waking up on-call engineers at 2 AM. This is the story of how I diagnosed a real Spark performance issue in production and fixed it, not by adding more machines - but by understanding Spark properly.
from Techzine Global
2 months ago

4 steps to create a future-proof data infrastructure

A future-proof IT infrastructure is often positioned as a universal solution that can withstand any change. However, such a solution does not exist. Nevertheless, future-proofing is an important concept for IT leaders navigating continuous technological developments and security risks, all while ensuring that daily business operations continue. The challenge is finding a balance between reactive problem solving and proactive planning, because overlooking a change can cost your organization. So, how do you successfully prepare for the future without that one-size-fits-all solution?
Tech industry
Software development
from InfoWorld
2 months ago

Why your next microservices should be streaming SQL-driven

Streaming SQL with UDFs, materialized results, and ML/AI integrations enables continuous, stateful processing of event streams for microservices.
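The "materialized results" the summary mentions boil down to per-event stateful updates: the aggregate is maintained continuously as events arrive, so reads are instant instead of re-scanning the stream. An engine-free sketch of the idea behind a streaming `SELECT key, COUNT(*), SUM(amount) ... GROUP BY key` (real systems use Flink SQL or similar; this is only the core mechanic):

```python
from collections import defaultdict

class RunningAggregate:
    """Hand-rolled equivalent of a materialized GROUP BY over an event
    stream: state is updated once per event, lookups are O(1)."""

    def __init__(self):
        self._state = defaultdict(lambda: {"count": 0, "sum": 0.0})

    def on_event(self, key, amount):
        # Incremental update: no reprocessing of earlier events.
        row = self._state[key]
        row["count"] += 1
        row["sum"] += amount

    def lookup(self, key):
        if key in self._state:
            return dict(self._state[key])
        return {"count": 0, "sum": 0.0}
```

A microservice built this way serves current aggregates from local state while the event log remains the source of truth for rebuilds.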
DevOps
from InfoQ
1 month ago

Google BigQuery Previews Cross-Region SQL Queries for Distributed Data

BigQuery's global queries feature enables SQL queries across multiple geographic regions without data movement, eliminating ETL pipelines for distributed analytics.
Data science
from InfoQ
1 month ago

Databricks Introduces Lakebase, a PostgreSQL Database for AI Workloads

Databricks Lakebase is a serverless PostgreSQL OLTP database that separates compute from storage and unifies transactional and analytical capabilities.
Tech industry
from InfoQ
2 months ago

Google Introduces Managed Connection Pooling for AlloyDB

AlloyDB's managed connection pooling increases client connections and transactional throughput while reducing operational burden and latency for high-concurrency and serverless workloads.
from InfoWorld
2 months ago

Databricks adds MemAlign to MLflow to cut cost and latency of LLM evaluation

By replacing repeated fine‑tuning with a dual‑memory system, MemAlign reduces the cost and instability of training LLM judges, offering faster adaptation to new domains and changing business policies. Databricks' Mosaic AI Research team has added a new framework, MemAlign, to MLflow, its managed machine learning and generative AI lifecycle development service. MemAlign is designed to help enterprises lower the cost and latency of training LLM-based judges, in turn making AI evaluation scalable and trustworthy enough for production deployments.
Artificial intelligence
Software development
from InfoQ
2 months ago

Are You Missing a Data Frame? The Power of Data Frames in Java

DataFrames and data-oriented programming promote modeling immutable data separately from behavior, making Java suitable for DataFrame-style data manipulation comparable to Python.
Data science
from DevOps.com
2 months ago

Why Data Contracts Need Apache Kafka and Apache Flink - DevOps.com

Data contracts formalize schemas, types, and quality constraints through early producer-consumer collaboration to prevent pipeline failures and reduce operational downtime.
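One common enforcement point for a data contract is a produce-time schema check: records that violate the agreed schema are rejected before they ever reach the topic. A minimal sketch; the field names and types are hypothetical, and real deployments use a schema registry and richer constraints than type checks:

```python
# Hypothetical agreed contract between producer and consumers.
CONTRACT = {
    "order_id": str,
    "amount_cents": int,
    "currency": str,
}

def validate(record: dict) -> list:
    """Return a list of contract violations (empty list means valid)."""
    errors = []
    for field, ftype in CONTRACT.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"bad type for {field}: expected {ftype.__name__}")
    return errors

def produce(record, send):
    """Gate the send callable (e.g. a Kafka producer) on the contract."""
    errs = validate(record)
    if errs:
        raise ValueError("; ".join(errs))
    send(record)
```

Failing fast at the producer is what prevents the downstream pipeline failures the article describes: a bad record raises immediately instead of corrupting consumer state hours later.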
from death and gravity
2 months ago

DynamoDB crash course: part 1 - philosophy

A table is a collection of items, and an item is a collection of named attributes. Items are uniquely identified by a partition key attribute and an optional sort key attribute. The partition key determines where (i.e. on what computer) an item is stored. The sort key is used to get ordered ranges of items from a specific partition. That's it, that's the whole data model. Sure, there's indexes and transactions and other features, but at its core, this is it.
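The data model described above fits in a few lines of code. This toy table is illustrative only, not the DynamoDB API: partitions are selected by partition key, and items within a partition stay ordered by sort key so range queries come back sorted:

```python
from bisect import insort

class ToyTable:
    """Toy model of the partition-key / sort-key design: each partition
    key selects a partition, and items within it are kept sorted by
    sort key. Not the real DynamoDB API."""

    def __init__(self):
        self._partitions = {}  # partition key -> sorted list of (sort_key, item)

    def put(self, pk, sk, item):
        insort(self._partitions.setdefault(pk, []), (sk, item))

    def get(self, pk, sk):
        for k, item in self._partitions.get(pk, []):
            if k == sk:
                return item
        return None

    def query(self, pk, sk_from, sk_to):
        """Ordered range of items from a single partition."""
        return [item for k, item in self._partitions.get(pk, [])
                if sk_from <= k <= sk_to]
```

Note that `query` only ever touches one partition, which is exactly why the real service scales: a range query never fans out across machines.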
Artificial intelligence
from Techzine Global
2 months ago

Snowflake launches Cortex Code agent for understanding data context

Cortex Code is an AI agent that converts complex data engineering, ML, and analytics tasks into natural-language workflows integrated into Snowflake and developer tools.
from Dbmaestro
4 years ago

Database DevOps - Where Do I Start?

Integrating databases into the CI/CD process or the DevOps pipeline is overlooked in the current DevOps landscape. Most organizations have adapted automated DevOps pipelines to handle application code, deployments, testing, and infrastructure configurations. However, database development and administration are left out of the DevOps process and handled separately. This can lead to unforeseen bugs, production issues, and delays in the software development life cycle.
Software development
Data science
from InfoQ
2 months ago

Beyond the Warehouse: Why BigQuery Alone Won't Solve Your Data Problems

Data warehouses like BigQuery perform well initially but become slow, costly, and disorganized at scale, undermining low-latency operational use and innovation.
from Techzine Global
2 months ago

Databricks shows how AI strengthens the SaaS model

The rise of generative AI is often seen as an existential threat to the SaaS model. Interfaces would disappear, software would fade away, and existing players would become irrelevant. However, new figures from Databricks paint a different picture. Rather than undermining SaaS, AI appears to be increasing its use. This week, Databricks reported a revenue run rate of $5.4 billion, a 65 percent year-on-year increase. More than a quarter of that now comes from AI-related products.
Artificial intelligence
Data science
from InfoWorld
2 months ago

Snowflake debuts Cortex Code, an AI agent that understands enterprise data context

Cortex Code enables developers to use natural language to build, optimize, and deploy governed, production-ready data pipelines, analytics, ML workloads, and AI agents.
from InfoWorld
2 months ago

AI-augmented data quality engineering

SHAP quantifies each feature's contribution to a model prediction, while LIME builds simple local models around a prediction to show how small changes influence outcomes. It answers questions like: "Would correcting age change the anomaly score?" "Would adjusting the ZIP code affect classification?" Explainability makes AI-based data remediation acceptable in regulated industries.
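SHAP itself is a library, but the Shapley-value idea behind it can be computed exactly for a tiny model by averaging each feature's marginal contribution to the prediction over all feature orderings. A toy sketch of that definition (not the `shap` package, which approximates this efficiently for real models):

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution for a small model: for every ordering
    of the features, flip each feature from its baseline value to its
    actual value and record the change in prediction; average the
    changes per feature over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]
            now = predict(current)
            phi[i] += now - prev
            prev = now
    return [p / len(orderings) for p in phi]
```

For a linear model the attributions are just coefficient times (value minus baseline), and they always sum to `predict(x) - predict(baseline)`, the "efficiency" property that makes Shapley attributions auditable.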
Artificial intelligence
from InfoWorld
2 months ago

Snowflake updates developer tools, adds observability features

Snowflake added new observability features in the form of Snowflake Trail, which provides visibility into data quality, pipelines, and applications, enabling developers to monitor, troubleshoot, and optimize their workflows. It is built on OpenTelemetry standards so developers can integrate with popular observability and alerting platforms including Datadog, Grafana, Metaplane, PagerDuty, and Slack, among others.
DevOps
Artificial intelligence
from Techzine Global
1 month ago

Snowflake CEO: Software risks becoming a "dumb data pipe" for AI

Centralized AI access to enterprise data risks reducing business applications to mere data pipes unless applications deliver clear added value in accuracy, security, and usability.
Artificial intelligence
from InfoWorld
2 months ago

Teradata unveils enterprise AgentStack to push AI agents into production

Teradata positions Enterprise AgentStack as a vendor-agnostic execution layer across hybrid environments, contrasting platform-tied AI approaches from Snowflake and Databricks.
from Dbmaestro
5 years ago

Database Delivery Automation in the Multi-Cloud World

The main advantage of going the Multi-Cloud way is that organizations can "put their eggs in different baskets" and be more versatile in their approach to how they do things. For example, they can mix it up and opt for a cloud-based Platform-as-a-Service (PaaS) solution when it comes to the database, while going the Software-as-a-Service (SaaS) route for their application endeavors.
DevOps