#data-pipeline

from HackerNoon
4 months ago

Partitioning Large Messages and Normalizing Workloads Can Boost Your AWS CloudWatch Ingestion

Architecture choices significantly impact performance in data ingestion systems.
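The article body is not reproduced here, so the following is only a minimal Scala sketch of the "partitioning large messages" idea in the title. It assumes the goal is to split an oversized payload into pieces that each stay under a per-event byte limit before sending. `MessagePartitioner`, `partitionMessage`, and the `maxBytes` parameter are hypothetical names; no CloudWatch SDK call is shown, and the real size limit should be taken from the AWS documentation rather than from this example.

```scala
import java.nio.charset.StandardCharsets

object MessagePartitioner {

  // Hypothetical helper: split `message` into pieces whose UTF-8 encoding is
  // at most `maxBytes` bytes each, never cutting a code point in half.
  def partitionMessage(message: String, maxBytes: Int): Vector[String] = {
    // A single UTF-8 code point can take up to 4 bytes.
    require(maxBytes >= 4, "maxBytes must fit at least one UTF-8 code point")
    val chunks = Vector.newBuilder[String]
    val current = new StringBuilder
    var currentBytes = 0
    var i = 0
    while (i < message.length) {
      val cp = message.codePointAt(i)
      val cpStr = new String(Character.toChars(cp))
      val cpBytes = cpStr.getBytes(StandardCharsets.UTF_8).length
      // Flush the current piece if adding this code point would exceed the limit.
      if (currentBytes + cpBytes > maxBytes) {
        chunks += current.toString
        current.clear()
        currentBytes = 0
      }
      current.append(cpStr)
      currentBytes += cpBytes
      i += Character.charCount(cp)
    }
    if (current.length > 0) chunks += current.toString
    chunks.result()
  }
}
```

Splitting on code-point boundaries keeps every piece valid UTF-8. A real pipeline would likely also attach a sequence number to each piece so a consumer can reassemble the original message; that step is omitted here.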
Scala
from medium.com
3 months ago

How I Made My Apache Spark Jobs Schema-Agnostic (Part 2)

Dynamic transformations enable flexible schema adaptations without code changes.
Using schema metadata simplifies column management, renaming, and casting.
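As an illustration of metadata-driven renaming and casting, here is a minimal sketch in plain Scala (no Spark dependency, so it runs on its own), assuming the schema metadata is a list of (source column, target column, target type) entries. `ColumnSpec`, `castValue`, and `transform` are hypothetical names, not taken from the article. In Spark, the same list could drive something like `df.select(specs.map(s => col(s.source).cast(s.targetType).as(s.target)): _*)`.

```scala
import scala.util.Try

object SchemaDrivenTransform {

  // Hypothetical metadata record: which source column to read, what to call it,
  // and which type to cast it to. In practice this could be loaded from config.
  final case class ColumnSpec(source: String, target: String, targetType: String)

  // Cast a raw string to the requested type; None if the cast fails or the type is unknown.
  def castValue(raw: String, targetType: String): Option[Any] = targetType match {
    case "string"  => Some(raw)
    case "int"     => Try(raw.trim.toInt).toOption
    case "double"  => Try(raw.trim.toDouble).toOption
    case "boolean" => Try(raw.trim.toBoolean).toOption
    case _         => None
  }

  // Apply the specs to one row: rename, cast, and drop columns not listed in the metadata.
  def transform(row: Map[String, String], specs: Seq[ColumnSpec]): Map[String, Option[Any]] =
    specs.map { spec =>
      spec.target -> row.get(spec.source).flatMap(castValue(_, spec.targetType))
    }.toMap
}
```

Because the renames and casts live in data rather than code, adding or changing a column means editing only the metadata list. Failed casts become `None` here, loosely mirroring Spark's non-ANSI cast behavior, where unconvertible values become null.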
from medium.com
4 months ago

Spark Scala Exercise 24: Error Handling and Logging in Spark. Build Safe, Auditable ETL Pipelines

Build a defensive Spark ETL pipeline to ensure robust data processing.
Handle data issues like schema mismatches and corrupt records effectively.
Implement custom logging and audit trails for better failure management.
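A minimal sketch of the defensive pattern summarized above, written in plain Scala so it runs without a Spark cluster: each line is parsed into an `Either`, malformed rows are quarantined with a reason instead of failing the job, and an audit summary is logged at the end. The two-column `id,amount` input format and the names `Txn`, `Rejected`, `AuditReport`, `parseLine`, and `run` are assumptions for illustration. Within Spark itself, the CSV and JSON readers offer a comparable built-in mechanism through the `mode` option (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`) and `columnNameOfCorruptRecord`.

```scala
import java.util.logging.Logger
import scala.util.Try

object DefensiveEtl {

  private val log = Logger.getLogger("DefensiveEtl")

  // Hypothetical target record for a two-column input: id,amount
  final case class Txn(id: String, amount: BigDecimal)

  // A rejected input line plus the reason it was rejected (kept for the audit trail).
  final case class Rejected(lineNo: Int, raw: String, reason: String)

  // Summary emitted at the end of a run so failures are auditable.
  final case class AuditReport(total: Int, accepted: Int, rejected: Int)

  // Parse one line; schema mismatches and corrupt values become Left instead of exceptions.
  def parseLine(lineNo: Int, raw: String): Either[Rejected, Txn] = {
    val fields = raw.split(",", -1).map(_.trim)
    if (fields.length != 2)
      Left(Rejected(lineNo, raw, s"expected 2 fields, got ${fields.length}"))
    else if (fields(0).isEmpty)
      Left(Rejected(lineNo, raw, "empty id"))
    else
      Try(BigDecimal(fields(1))).toOption match {
        case Some(amount) => Right(Txn(fields(0), amount))
        case None         => Left(Rejected(lineNo, raw, s"invalid amount '${fields(1)}'"))
      }
  }

  // Process all lines: keep good records, quarantine bad ones, and log an audit summary.
  def run(lines: Seq[String]): (Seq[Txn], Seq[Rejected], AuditReport) = {
    val results = lines.zipWithIndex.map { case (raw, idx) => parseLine(idx + 1, raw) }
    val good = results.collect { case Right(t) => t }
    val bad = results.collect { case Left(r) => r }
    bad.foreach(r => log.warning(s"line ${r.lineNo} rejected: ${r.reason}"))
    val report = AuditReport(lines.size, good.size, bad.size)
    log.info(s"audit: $report")
    (good, bad, report)
  }
}
```

Returning the rejected rows alongside the good ones lets a caller write them to a quarantine location for later inspection, while the audit report gives a per-run count that can be checked against the input size.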