#data-engineering

from medium.com
1 week ago

Complete Guide to Learn Big Data

Learn big data end-to-end: fundamentals, programming, storage, batch/stream processing, ETL, cloud, ML, governance, and hands-on projects with runnable Airflow and PySpark Docker examples.
Data science
from Medium
3 weeks ago

Building Resilient Data Systems: Key Lessons from Veronika Durgin

Neglected data engineering tasks are crucial for stable and agile data pipelines.
from InfoWorld
3 weeks ago

Google updates agents in BigQuery to further automate analytics tasks

Google enhances BigQuery with a new code interpreter and advanced analytics features, improving automation in data engineering and data science tasks.
Artificial intelligence
from InfoQ
1 month ago

Mandy Gu on Generative AI (GenAI) Implementation, User Profiles and Adoption of LLMs

Generative AI and large language models are being implemented in real-world projects to enhance organizational capabilities.
from Hackernoon
2 years ago

The HackerNoon Newsletter: A Data Engineers Guide to PyIceberg (7/6/2025) | HackerNoon

The arrival of truly intelligent, always-on, AI-native revenue engines is dismantling the way we've structured go-to-market motions for 20 years.
Tech industry
from InfoWorld
2 months ago

Databricks targets AI bottlenecks with Lakeflow Designer

Lakeflow and Openflow reflect two philosophies: Databricks integrates data engineering into a Spark-native, open orchestration fabric, while Snowflake's Openflow offers declarative workflow control.
Software development
from The Register
2 months ago

Industry reacts to DuckDB's Lakehouse architecture reorg

Databricks' acquisition of Tabular is revitalizing the table formats landscape, especially with DuckDB's innovative offerings.
Data science
from InfoWorld
2 months ago

Snowflake launches Openflow to tackle AI-era data ingestion challenges

Openflow simplifies data ingestion, transformation, and observability for enterprises engaging with AI use cases.
from Techzine Global
2 months ago

Fivetran expands Connector SDK for custom data sources

Fivetran's Connector SDK empowers developers to create custom connectors easily, addressing data gaps and enabling centralized data management without the need for extensive DevOps support.
Data science
#e-commerce
from Hackernoon
4 months ago
Data science

Rajesh Sura: Revolutionizing Global Selection Strategy with Data, AI, and Automation | HackerNoon

Product selection is crucial for competitive advantage in e-commerce.
Rajesh Sura's innovations in data engineering have transformed product onboarding and sourcing.
AI-driven models and external data integration enhance product discovery.
from medium.com
3 months ago
Data science

Day 4-Identifying Top 3 Selling Products per Category | Spark Interview Question.

Implement logic to rank products per category based on total units sold, ensuring ties are handled appropriately.
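The ranking logic described in this summary can be sketched in a few lines. The linked article works in Spark; the block below is only a plain-Python illustration of the same idea, and the function name `top_n_per_category`, the `(category, product, units)` tuple layout, and the choice of dense ranking for ties are all assumptions made here, not details taken from the article.

```python
from collections import defaultdict


def top_n_per_category(sales, n=3):
    """Return {category: [(product, total_units, rank), ...]} for ranks 1..n.

    `sales` is an iterable of (category, product, units) tuples
    (a hypothetical layout chosen for this sketch).
    Dense ranking is used, so products with equal totals share a rank
    and a category may return more than n products when ties occur.
    """
    # Step 1: total units sold per (category, product).
    totals = defaultdict(int)
    for category, product, units in sales:
        totals[(category, product)] += units

    # Step 2: regroup the totals by category.
    by_category = defaultdict(list)
    for (category, product), total in totals.items():
        by_category[category].append((product, total))

    # Step 3: sort each category by total (descending) and assign dense ranks.
    result = {}
    for category, items in by_category.items():
        # Product name is the secondary key so the output order is deterministic.
        items.sort(key=lambda item: (-item[1], item[0]))
        ranked = []
        rank = 0
        previous_total = None
        for product, total in items:
            if total != previous_total:
                rank += 1
                previous_total = total
            if rank > n:
                break
            ranked.append((product, total, rank))
        result[category] = ranked
    return result
```

With dense ranking, two products tied for first place both get rank 1 and the next product gets rank 2. If exactly three rows per category are required instead, the tie-breaking rule would need to be stated explicitly.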

#apache-spark
from Medium
3 months ago
Data science

Day 6-Sessionization of Web Logs using Time Difference | Apache Spark Interview Problem.

from Medium
3 months ago
Data science

Understanding the load() Function in Apache Spark: Syntax, Examples, and Best Practices

from Hackernoon
4 months ago

How Bharath Rajasekaran Scaled a Global Data Pipeline in 3 Months | HackerNoon

Annalect's data pipeline migration exemplifies outstanding engineering leadership and has transformed the company's approach to cloud data management.
from Medium
3 months ago

Day 3-Revenue Aggregation per Region and Category | Spark Interview Problem.

To create a daily aggregated revenue dashboard, we need to sum the total revenue for each product category by region, helping managers make informed business decisions.
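The aggregation this summary describes amounts to a grouped sum. The linked article uses Spark; the block below is only a plain-Python sketch of the same grouping, and the record fields (`date`, `region`, `category`, `quantity`, `unit_price`) and the function name are hypothetical, chosen here for illustration.

```python
from collections import defaultdict


def revenue_by_region_and_category(orders):
    """Sum revenue per (date, region, category).

    `orders` is an iterable of dicts with the hypothetical keys
    'date', 'region', 'category', 'quantity', and 'unit_price'.
    Revenue for one order is assumed to be quantity * unit_price.
    """
    totals = defaultdict(float)
    for order in orders:
        key = (order["date"], order["region"], order["category"])
        totals[key] += order["quantity"] * order["unit_price"]
    return dict(totals)
```

Including the date in the grouping key keeps the result daily, which matches the dashboard described in the summary. In Spark, the equivalent would be a group-by on the same three columns followed by a sum.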
Data science
from Hackernoon
5 years ago

Traditional Monitoring Is Dead. Long Live Data Observability | HackerNoon

Traditional monitoring fails to meet the needs of complex data organizations; instead, engineers must develop interactive observability frameworks to quickly identify anomalies.
Data science
from Hackernoon
3 years ago

Building a Real-Time Change Data Capture Pipeline with Debezium, Kafka, and PostgreSQL | HackerNoon

The article provides a step-by-step guide to setting up a Change Data Capture (CDC) pipeline using PostgreSQL, Debezium, Apache Kafka, and Python.
Data science
Scala
from Medium
3 months ago

Data Quality Verification with Deequ: A Practical Approach Using Scala

Utilizing Deequ and Scala for efficient and automated data validation is highly effective for managing large datasets.
Data science
from Hackernoon
7 months ago

LLMs in Data Engineering: Not Just Hype, Here's What's Real | HackerNoon

Large Language Models are transforming data engineering by enhancing performance and operational efficiencies.
DevOps
from Medium
4 months ago

Evolvability-It's Mostly About Data Contracts

Data Contracts can mitigate complexity in analytic systems by fostering loose coupling and enhancing adaptability.
from Hackernoon
4 months ago

Tired of Copy-Pasting Hive Output? This PySpark Hack Fixes It | HackerNoon

Automating CSV export from Hive or Impala output is essential for efficient data engineering tasks.
Women in technology
from Business Insider
4 months ago

I became a director at Ford after pivoting careers in the last recession. Here are 3 ways to recession-proof your job.

Continuous learning through online courses is key to job security in recessionary times.
from ChannelPro
4 months ago

Datatonic expands global services with Syntio acquisition

We are thrilled to welcome Syntio to the Datatonic family. This acquisition is a key step in our strategy to expand our global reach and enhance our service capabilities.
Data science
from Techzine Global
4 months ago

Datatonic acquires Syntio and strengthens expertise in data engineering

Datatonic's acquisition of Syntio enhances its data consultancy with increased capabilities in data engineering and expanded service offerings.
from awstip.com
4 months ago

Spark Scala Exercise 23: Working with Delta Lake in Spark Scala: ACID, Time Travel, and Upserts

Delta Lake enhances data reliability and governance for data lakes by integrating warehouse features.