Data Engineering: Getting Started with Delta Lake
Briefly

This post walks through getting started with Delta Lake using Apache Spark and Scala in the spark-shell, so let's start.
I'm using Apache Spark 3.5.0, Scala 2.12, and Delta Lake 3.1.0; if you don't have these versions on your local machine, install them first, then follow the spark-shell commands below.
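For these versions, the Delta Lake quickstart launches spark-shell with the Delta package and its SQL extensions roughly like this (the `--packages` coordinate is the Delta 3.1.0 artifact built for Scala 2.12):

```bash
# Launch spark-shell with Delta Lake 3.1.0 (Scala 2.12 build) and enable
# the Delta SQL extension and catalog, per the Delta Lake quickstart.
spark-shell \
  --packages io.delta:delta-spark_2.12:3.1.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```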
Let's create a small dataset and ingest it into a data lake table with ACID properties; the same approach works across cloud providers such as AWS, GCP, or Azure, as the sketch below shows. Have you ever thought about how open-source storage frameworks like Apache Hudi, Delta Lake, and Apache Iceberg efficiently handle petabyte-scale data stored in AWS S3 or GCP Cloud Storage buckets? And how should we make efficient use of prefixes in S3 or GCP Cloud Storage?
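As a minimal sketch, here's how such a small dataset could be written to, updated in, and read back from a Delta table inside the spark-shell; the `/tmp/delta/users` path and the sample rows are placeholders made up for illustration:

```scala
// Inside spark-shell launched as above. The /tmp/delta/users path is a
// local placeholder; on a cloud data lake it would be an s3a://, gs://,
// or abfss:// URI instead.
import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.{expr, lit}
import spark.implicits._ // already in scope in spark-shell; shown for clarity

val users = Seq((1, "alice"), (2, "bob"), (3, "carol")).toDF("id", "name")

// Writing in "delta" format creates a _delta_log transaction log next to
// the Parquet files, which is what provides the ACID guarantees.
users.write.format("delta").mode("overwrite").save("/tmp/delta/users")

// Each update is a new atomic commit to the transaction log.
DeltaTable.forPath(spark, "/tmp/delta/users")
  .update(expr("id = 2"), Map("name" -> lit("bobby")))

// Read the table back to confirm the committed state.
spark.read.format("delta").load("/tmp/delta/users").show()
```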
Read the full post on Medium.