#data-engineering

1 day ago

Artificial intelligence

Google Cloud introduces AI agents for data processing

from ComputerWeekly.com

5 months ago

Data science

A path to better data engineering | Computer Weekly

Artificial intelligence

from InfoQ

1 month ago

Mandy Gu on Generative AI (GenAI) Implementation, User Profiles and Adoption of LLMs

Generative AI and large language models are being implemented in real-world projects to enhance organizational capabilities.

2 years ago

The HackerNoon Newsletter: A Data Engineer's Guide to PyIceberg (7/6/2025) | HackerNoon

The arrival of truly intelligent, always-on, AI-native revenue engines is dismantling the way we've structured go-to-market motions for 20 years.

Tech industry

from InfoWorld

1 month ago

Databricks targets AI bottlenecks with Lakeflow Designer

Lakeflow and OpenFlow reflect two philosophies: Databricks integrates data engineering into a Spark-native, open orchestration fabric, while Snowflake's OpenFlow offers declarative workflow control.

Software development

from The Register

Industry reacts to DuckDB's Lakehouse architecture reorg

Databricks' acquisition of Tabular is revitalizing the table formats landscape, especially with DuckDB's innovative offerings.

from InfoWorld

Snowflake launches Openflow to tackle AI-era data ingestion challenges

Openflow simplifies data ingestion, transformation, and observability for enterprises engaging with AI use cases.

Fivetran expands Connector SDK for custom data sources

Fivetran's Connector SDK empowers developers to create custom connectors easily, addressing data gaps and enabling centralized data management without the need for extensive DevOps support.

Data science

#e-commerce

Data science

Rajesh Sura: Revolutionizing Global Selection Strategy with Data, AI, and Automation | HackerNoon

Data science

Day 4-Identifying Top 3 Selling Products per Category | Spark Interview Question.

Data science

Day 6-Sessionization of Web Logs using Time Difference | Apache Spark Interview Problem.

Data science

Understanding the load() Function in Apache Spark: Syntax, Examples, and Best Practices

from awstip.com

Data science

Spark Scala Exercise 5: Column Operations with DataFrames - A Complete Guide for Data Engineers

Scala

Spark Scala Exercise 2: Load a CSV and Count Rows

How Bharath Rajasekaran Scaled a Global Data Pipeline in 3 Months | HackerNoon

Annalect's data pipeline migration exemplifies outstanding engineering leadership and has transformed the company's approach to cloud data management.

Day 3-Revenue Aggregation per Region and Category | Spark Interview Problem.

To create a daily aggregated revenue dashboard, we need to sum the total revenue for each product category by region, helping managers make informed business decisions.

Data science

5 years ago

Traditional Monitoring Is Dead. Long Live Data Observability | HackerNoon

Traditional monitoring fails to meet the needs of complex data organizations; instead, engineers must develop interactive observability frameworks to quickly identify anomalies.

Data science

3 years ago

Building a Real-Time Change Data Capture Pipeline with Debezium, Kafka, and PostgreSQL | HackerNoon

The article provides a step-by-step guide to setting up a Change Data Capture (CDC) pipeline using PostgreSQL, Debezium, Apache Kafka, and Python.

Data science

Scala

Data Quality Verification with Deequ: A Practical Approach Using Scala

Utilizing Deequ and Scala for efficient and automated data validation is highly effective for managing large datasets.

7 months ago

LLMs in Data Engineering: Not Just Hype, Here's What's Real | HackerNoon

Large Language Models are transforming data engineering by enhancing performance and operational efficiencies.

DevOps

Evolvability-It's Mostly About Data Contracts

Data Contracts can mitigate complexity in analytic systems by fostering loose coupling and enhancing adaptability.

Tired of Copy-Pasting Hive Output? This PySpark Hack Fixes It | HackerNoon

Automating CSV export from Hive or Impala output is essential for efficient data engineering tasks.

Women in technology

from Business Insider

I became a director at Ford after pivoting careers in the last recession. Here are 3 ways to recession-proof your job.

Continuous learning through online courses is key to job security in recessionary times.

from ChannelPro

Datatonic expands global services with Syntio acquisition

We are thrilled to welcome Syntio to the Datatonic family. This acquisition is a key step in our strategy to expand our global reach and enhance our service capabilities.

Data science

Datatonic acquires Syntio and strengthens expertise in data engineering

Datatonic's acquisition of Syntio enhances its data consultancy with increased capabilities in data engineering and expanded service offerings.

from awstip.com

Spark Scala Exercise 23: Working with Delta Lake in Spark Scala - ACID, Time Travel, and Upserts

Delta Lake enhances data reliability and governance for data lakes by integrating warehouse features.

Spark Scala Exercise 10: Handling Nulls and Data Cleaning - From Raw Data to Analytics-Ready

Effective data cleaning is essential in data engineering to prevent downstream issues caused by nulls.

Spark Scala Exercise 9: Joining Two Datasets in Spark - Mastering Inner, Left, Right, and Outer

Joining datasets in Spark Scala allows for effective data analysis and relationship understanding.

#spark

Scala

Spark Scala Exercise 1: Hello Spark World with Scala

Data science

100 Days of Data Engineering on Databricks Day 44: PySpark vs. Scala:

The choice between PySpark and Scala significantly affects performance and maintainability in Spark development.

Artificial intelligence

These AI & Data Engineering Sessions Are a Must-Attend at ODSC East 2025

Organizations are focusing on efficiently and securely integrating advanced AI models at scale.

Practical strategies and real-world insights are essential for navigating AI and data engineering challenges.

Scala

5 months ago

Scala Vs. Python-What Data Engineers Need To Know

Scala improves upon Java while remaining JVM-compatible, making it attractive for organizations.

#data-serving

Business intelligence

Serving Data in the Data Engineering Lifecycle: A Comprehensive Guide

Data engineering culminates in serving data for analytics, ML, and operations.

Data quality and trust are critical in serving data effectively.

from faun.pub

Business intelligence

Serving Data in the Data Engineering Lifecycle: A Comprehensive Guide

Data serving is the culmination of data engineering, delivering value to users through analytics and applications.

The Future of Data Engineering: Security, Privacy, and the Path Ahead

Security and privacy are essential to data engineering, integral to ethics and resilience amid evolving challenges.

Can Your Data Architecture Handle Tomorrow? Building for Flexibility and Lasting Impact

Good data architecture is essential for effective data engineering and organizational competitiveness.

Data science

Can Your Data Architecture Handle Tomorrow? Building for Flexibility and Lasting Impact

Good data architecture is vital for effective data engineering and organizational competitiveness.

Understanding Data Generation in Source Systems: How It Works and Real-Time Applications

Data generation is crucial in the data engineering lifecycle for reliable processing and transformation.

4 years ago

The Two Types of Data Engineers You Meet at Work | HackerNoon

Data engineers are categorized into two archetypes: business-oriented and tech-oriented, each with distinct roles and responsibilities.

Artificial intelligence

9 months ago

Networking, Hackathons, Meetups, and Other Extra Events Coming to ODSC West 2024

The conference provides hands-on AI learning and immersive networking opportunities.

Participants can engage in various thematic events including hackathons and summits.

ODSC West fosters connections among AI professionals and enthusiasts.

9 months ago

With Databricks Apps, business users get more out of data

Databricks Apps empower business users by simplifying data access, allowing them to create applications without heavy reliance on data engineering, thus facilitating quick and informed decision-making.

Data science