This post is a hands-on introduction to Delta Lake using Apache Spark and the Scala programming language on the spark-shell, so let's get started.
I'm using Apache Spark 3.5.0, Scala 2.12, and Delta Lake 3.1.0. If you don't have these versions on your local machine, install them first, then follow along with the spark-shell commands below.
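As a minimal sketch, the spark-shell can be launched with the Delta Lake package and the session extensions it needs; the coordinates below assume Delta Lake 3.1.0 built for Scala 2.12, so adjust them if your versions differ:

```bash
spark-shell \
  --packages io.delta:delta-spark_2.12:3.1.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

The two `--conf` flags register Delta's SQL extension and catalog with the session, so the `delta` format and Delta SQL commands are available inside the shell.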
Let's create a small dataset and explore how to ingest it into a data lake table with ACID properties (a runnable sketch follows the listing below). The approach applies across cloud providers such as AWS, GCP, and Azure. Have you ever wondered how open-source storage frameworks like Apache Hudi, Delta Lake, and Apache Iceberg efficiently handle petabyte-scale data stored in AWS S3 or GCP Cloud Storage buckets? And how can we make efficient use of prefixes in AWS S3 or GCP Cloud Storage?
Collection: [ ... ]
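Here is a minimal sketch of that ingestion in the spark-shell; the `users` dataset, its columns, and the `/tmp/delta/users` path are hypothetical stand-ins, not values from the original collection:

```scala
// Run inside a spark-shell started with the Delta Lake package shown earlier.
import spark.implicits._

// Hypothetical sample rows standing in for the collection above.
val users = Seq(
  (1, "Alice", "alice@example.com"),
  (2, "Bob", "bob@example.com"),
  (3, "Carol", "carol@example.com")
).toDF("id", "name", "email")

// Writing in the "delta" format creates a table with ACID guarantees;
// the same code works against cloud storage paths, not just local disk.
users.write.format("delta").mode("overwrite").save("/tmp/delta/users")

// Read the table back to verify the write succeeded.
spark.read.format("delta").load("/tmp/delta/users").show()
```

The same write would work against a cloud path such as `s3a://bucket/prefix/users`, which is where the prefix layout of S3 or GCS buckets starts to matter for request throughput.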