#data-engineering

Deep dive on Spark Aggregation APIs

Complex aggregation problems require advanced solutions beyond straightforward SQL functions.
User Defined Aggregate Functions (UDAFs) are one way to calculate median values in Spark.
Performance and implementation ease are critical factors in selecting aggregation techniques.
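
The trade-off between performance and implementation ease can be illustrated with a small sketch. The pure-Python code below is not Spark code: it is a toy model, loosely following the zero/reduce/merge/finish contract of Spark's typed Aggregator API, and the names MedianBuffer, MedianAggregator, and aggregate are hypothetical. It shows why an exact median is awkward as a distributed aggregate: the buffer has to retain every value until the final step.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class MedianBuffer:
    """Intermediate state: an exact median needs every value seen so far."""
    values: List[float] = field(default_factory=list)


class MedianAggregator:
    """Toy aggregator with a zero/reduce/merge/finish contract (illustrative only)."""

    def zero(self) -> MedianBuffer:
        # Empty buffer used as the starting state for each partition.
        return MedianBuffer()

    def reduce(self, buf: MedianBuffer, value: float) -> MedianBuffer:
        # Fold one input value into a partition-local buffer.
        buf.values.append(value)
        return buf

    def merge(self, left: MedianBuffer, right: MedianBuffer) -> MedianBuffer:
        # Combine two partial buffers produced by different partitions.
        return MedianBuffer(left.values + right.values)

    def finish(self, buf: MedianBuffer) -> Optional[float]:
        # Sort once at the end and pick the middle element(s).
        if not buf.values:
            return None
        ordered = sorted(buf.values)
        mid = len(ordered) // 2
        if len(ordered) % 2 == 1:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


def aggregate(partitions: Iterable[Iterable[float]], agg: MedianAggregator) -> Optional[float]:
    """Simulate a distributed run: reduce per partition, then merge, then finish."""
    partials = []
    for part in partitions:
        buf = agg.zero()
        for value in part:
            buf = agg.reduce(buf, value)
        partials.append(buf)
    merged = agg.zero()
    for partial in partials:
        merged = agg.merge(merged, partial)
    return agg.finish(merged)


odd_median = aggregate([[1.0, 9.0], [3.0], [7.0, 5.0]], MedianAggregator())
even_median = aggregate([[1.0, 2.0], [3.0, 4.0]], MedianAggregator())
```

Because the buffer grows with the input, a sketch like this is easy to write but memory-hungry, which is one reason approximate percentile functions are often preferred on large groups.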

Computer Weekly Buyer's Guide features list 2025 | Computer Weekly

Computer Weekly's Buyer's Guides educate and guide readers through the IT buying cycle to ensure informed purchasing decisions.
#data-infrastructure

What's the Deal With Data Engineers Anyway? | HackerNoon

Data engineers build essential data infrastructure, enabling analysts to access structured and timely data for informed decision-making.

10 Important Topics Featured at the 2024 Data Engineering Summit - Summit.ai

Generative AI involves collaboration between data engineers and software engineers.
Data infrastructure challenges include data wrangling, scaling systems, and data security.

A guide to transitioning from a data engineer to product manager role - LogRocket Blog

Transitioning from data engineering to product management expands influence, fosters cross-functional leadership, and enhances opportunities to impact business outcomes.
#data-science

Where are AI Investments Going in 2024?

The conference will cover data science and AI trends, tools, and techniques.
Partnerships in organizing events can improve both their quality and participation.

The State of Data Science 2024: 6 Key Data Science Trends | The PyCharm Blog

Python usage in data analysis and machine learning is declining, indicating changing trends in data science.

Why Many Data Science Jobs Are Actually Data Engineering | HackerNoon

Many data scientist roles primarily involve data preparation and cleaning, not advanced data analysis or machine learning as expected.

Unlocking the Power of Gen AI with Data Engineering

Data engineering is crucial for unlocking the potential of Gen AI applications.
Gen AI and data engineering have a symbiotic relationship, enhancing innovation and efficiency.

The Future of the Data Engineer

Maxime Beauchemin paved the way for data engineering with projects like Apache Airflow and Apache Superset, highlighting the importance of specialized engineers in scaling data science.

#big-data

Scala Applications in Data Engineering: A Comprehensive Overview

Scala is an ideal choice for data engineering, particularly with big data frameworks like Apache Spark.

Choosing Your First Language in Data Engineering: A Beginner's Guide

Choosing the right programming language is crucial for your data engineering career.
Python is favored for its simplicity, rich libraries, and big data integration.

Job Vacancy: Data Engineer - Climate Tech - Python // Climatiq | IT / Software Development Jobs | Berlin Startup Jobs

Climatiq is a climate tech startup focused on driving action through a carbon calculation engine used by organizations globally.
#startup

Job Vacancy: Lead Frontend Engineer // GlassFlow | IT / Software Development Jobs | Berlin Startup Jobs

GlassFlow is developing a user-friendly data streaming platform that simplifies real-time data access for engineers.
The role offers a unique chance to shape the future of an innovative data infrastructure startup.
The company focuses on creating an inclusive workplace with competitive benefits for its employees.

Optimizing Uber's Search Infrastructure: Upgrading to Apache Lucene 9.5

Uber upgraded its search infrastructure from Apache Lucene 8.0 to 9.5, improving search capabilities and overall performance.

Why I Chose Google Cloud Platform (GCP) for Data Engineering: Real-World Benefits

GCP is preferred for data engineering due to its scalability, integrated analytics, and cost-effectiveness.
#ai

Announcing the First Speakers for the 2024 Data Engineering Summit

Data-centric AI enables improving models without code changes.
Apache Arrow and Parquet allow fast analytic operations and interoperability.

Networking, Hackathons, Meetups, and Other Extra Events Coming to ODSC West 2024

The conference provides hands-on AI learning and immersive networking opportunities.
Participants can engage in various thematic events including hackathons and summits.
ODSC West fosters connections among AI professionals and enthusiasts.

#generative-ai

Data Observability: Multicloud, GenAI Make Challenges Harder

Acceldata's focus on data observability capitalizes on the exponential growth of data and the increasing complexity of managing it across multicloud systems.

Edo Liberty on Vector Databases for Successful Adoption of Generative AI and LLM based Applications

Vector databases play a critical role in the generative AI (GenAI) space.

#performance-optimization

Why to avoid multiple chaining of withColumn() function in Spark job.

Chaining multiple withColumn() calls in Spark may lead to performance issues and inefficient resource usage.
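
One way to picture the issue is that each withColumn() call adds another projection to the query plan, so a long chain produces a deep plan, while a single select() adds all columns in one projection. The pure-Python sketch below is a toy model of that growth, not Spark itself; Project, with_column, select_with, and plan_depth are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Project:
    """One projection node in a toy logical plan (not Spark's real classes)."""
    columns: Tuple[str, ...]
    child: Optional["Project"] = None


def plan_depth(plan: Optional[Project]) -> int:
    """Count how many projection nodes are stacked in the plan."""
    depth = 0
    while plan is not None:
        depth += 1
        plan = plan.child
    return depth


def with_column(plan: Project, name: str) -> Project:
    # Each call wraps the existing plan in a brand-new projection.
    return Project(plan.columns + (name,), plan)


def select_with(plan: Project, names: Sequence[str]) -> Project:
    # A single projection that adds every new column at once.
    return Project(plan.columns + tuple(names), plan)


base = Project(("id",))
new_columns = [f"feature_{i}" for i in range(50)]

# Chained style: 50 calls stack 50 projections on top of the base plan.
chained = base
for name in new_columns:
    chained = with_column(chained, name)

# Single-projection style: one node on top of the base plan.
single = select_with(base, new_columns)

chained_depth = plan_depth(chained)
single_depth = plan_depth(single)
```

Both plans expose the same columns, but the chained one is far deeper. In PySpark the single-projection form is usually written as one select() call, or as withColumns() on Spark 3.3 and later.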

Understanding Spark Re-Partition

Spark's repartition() function is crucial for managing data skew and for optimizing performance, memory utilization, and downstream query efficiency.
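
How repartitioning interacts with skew can be sketched in a few lines of pure Python. This is a toy model under stated assumptions, not Spark's implementation: it uses CRC32 as a stand-in hash (Spark uses its own hash function), and key_partition_sizes and round_robin_sizes are hypothetical helper names. The idea it illustrates is that partitioning by a skewed key keeps all rows for the hot key together, whereas round-robin distribution evens out partition sizes.

```python
import zlib
from typing import List, Sequence


def key_partition_sizes(keys: Sequence[str], num_partitions: int) -> List[int]:
    """Assign each row to a partition by hashing its key (CRC32 as a stand-in hash)."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return sizes


def round_robin_sizes(keys: Sequence[str], num_partitions: int) -> List[int]:
    """Assign rows to partitions in turn, ignoring the key entirely."""
    sizes = [0] * num_partitions
    for index, _ in enumerate(keys):
        sizes[index % num_partitions] += 1
    return sizes


# A skewed dataset: 90 rows share one hot key, 10 rows have distinct keys.
rows = ["hot"] * 90 + [f"user_{i}" for i in range(10)]

by_key = key_partition_sizes(rows, 4)
by_round_robin = round_robin_sizes(rows, 4)
```

With this data, the key-based layout puts at least 90 of the 100 rows in a single partition, while the round-robin layout gives every partition 25 rows. Which layout is better depends on what runs next: key-based layouts help joins and grouped aggregations on that key, and even layouts help when tasks only need balanced work.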

#career-development

I failed Meta's technical interview. Here's what it was like and what I wish I'd done differently.

Preparation for technical interviews is crucial, but adapting one's approach may be equally important for success.

Mastering Data in the Modern Age with Vishwanadham Mandala | HackerNoon

Vishwanadham Mandala's career in data engineering exemplifies dedication, leadership, and a commitment to mentoring future technology professionals.

With Databricks Apps, business users get more out of data

Databricks Apps enhance data accessibility for business users, enabling quicker insights without extensive engineering work.
#databricks

Databricks launches LakeFlow to help its customers build their data pipelines | TechCrunch

Databricks introduced LakeFlow as its internal data engineering solution to handle data ingestion, transformation, and orchestration, reducing the reliance on third-party tools.

Snowflake & Databricks need Data FinOps - Something Chaos Genius excels at - Amazic

Chaos Genius offers a solution to optimize data engineering for cost and performance.

Are the table format wars entering the final chapter?

Databricks' acquisition of Tabular for $1 billion underscores the rising importance of the Apache Iceberg table format in data engineering.

The Importance of Data Structures and Algorithms in the Life of a Data Engineer

Mastering Data Structures and Algorithms is crucial for optimizing data engineering tasks.

Web3 Data Engineering Crash Course | HackerNoon

Web3 data architecture is transforming how enterprise and scientific data are approached, emphasizing cross-organizational data exchange over internal data.
#open-source

InfoQ AI, ML, and Data Engineering Trends in 2024

The podcast discusses current AI and ML trends with expert insights, showcasing innovations and the impact of community contributions in these technologies.

Breaking Down the Worker Task Execution in Apache DolphinScheduler | HackerNoon

Apache DolphinScheduler is an enterprise-level visual workflow scheduling system that offers flexibility, scalability, and robust fault tolerance.

LLMs: An Assessment From a Data Engineer | HackerNoon

Generative AI tools such as ChatGPT can enhance data engineering productivity when given precise requirements.
AI is not likely to fully replace human expertise in data engineering; areas like basic data querying, troubleshooting pipeline failures, and anomaly detection still require human intervention.

Job Vacancy: Senior Data Engineer // Latana | IT / Software Development Jobs | Berlin Startup Jobs

Latana provides brand insights for better marketing decisions and works with top B2C brands like Headspace and Unilever to optimize brand performance.

Are All Monoliths Bad?

Monolith vs. Microservices: Complexity and team size influence architecture choice.

The Future of Data Engineering Goes Through Data Contracts

Data engineering grows exponentially with company expansion and mergers.
Data Mesh is a disruptive solution that requires significant organization-level changes.
#open-source-tools

More Speakers and Sessions Announced for the 2024 Data Engineering Summit

The importance of leveraging big data tools in making business decisions
Strategies and technologies for avoiding monolithic data infrastructure

11 Open-Source Data Engineering Tools Every Pro Should Use

Apache Spark is a leading framework for large-scale data processing, offering versatile functionalities like batch processing and stream processing.
Apache Kafka is an open-source streaming platform that is ideal for handling real-time data and high-throughput data feeds.
Snowflake, Amazon Redshift, and Google BigQuery are popular cloud data warehouses, each with unique features that data engineers should understand in order to choose the best fit for their projects.

Data Pipelines with Dagster

Dagster is a powerful tool for creating data pipelines using Python.
Pedram Navid, Head of Data Engineering at Dagster Labs, discusses data pipelines on Talk Python.

Spark Starter Guide 4.13: Importing Data from a Relational Database (MySQL)

Relational databases are vital for operational data but can also hold valuable analytics data.
Spark simplifies accessing databases to populate Spark DataFrames for analysis.

What Does a Data Engineering Job Involve in 2024?

Data engineers play a crucial role in collecting, storing, and processing data for analysis and decision-making.
Data integration is a key responsibility of data engineers, involving combining data from multiple sources into a single, usable format.

How to Shift from Data Science to Data Engineering

There is a high demand for skilled data engineers.
Data scientists can transition into data engineering due to transferable skills.

Must-Have Prompt Engineering Skills, Preventing Data Poisoning, and How AI Will Impact Various...

Baidu's chatbot Ernie Bot has gained over 100 million users.
ODSC has various events and resources related to AI, including a podcast and a conference.

Accenture looks to bolster AI capabilities with Redkite acquisition

Accenture acquires data consultancy Redkite to enhance its data and AI capabilities.

Job Vacancy: (Senior) Big Data Engineer * // GameDuell | IT / Software Development Jobs | Berlin Startup Jobs

GameDuell is a leading casual gaming company in Germany with over 130 million registered players worldwide.
They are looking for a (Senior) Big Data Engineer to build and maintain their Big Data infrastructure.

Microsoft and Databricks try to sew up the data platform

Microsoft Fabric is being hailed as one of Microsoft's biggest data product launches since SQL Server.
Fabric offers data engineering, data lakes, data warehousing, machine learning, and AI on a single platform.
Competitors like Snowflake and Google are catching up quickly in the race to offer similar features and capabilities.

Using OpenTelemetry to monitor Apache Airflow

Monitoring Airflow is vital for optimizing performance and reliability of data pipelines.

Podcast: HPCC-Open-Source Platform High-Performance Computing on Large-Scale Data

Discover HPCC, an open-source high-performance computing platform for large-scale data analytics, through the insights of Bob Foreman in an episode of the Ai X Podcast.