Why data contracts need Apache Kafka and Apache Flink
Briefly

"Imagine it's 3 a.m. and your pager goes off. A downstream service is failing, and after an hour of debugging you trace the issue to a tiny, undocumented schema change made by an upstream team. The fix is simple, but it comes with a high cost in lost sleep and operational downtime. This is the nature of many modern data pipelines. We've mastered the art of building distributed systems, but we've neglected a critical part of the system: the agreement on the data itself."
"Data pipelines are a popular tool for sharing data from different producers (databases, applications, logs, microservices, etc.) to consumers to drive event-driven applications or enable further processing and analytics. These pipelines have often been developed in an ad hoc manner, without a formal specification for the data being produced and without direct input from the consumer on what data they expect. As a result, it's not uncommon for upstream producers to introduce ad hoc changes consumers don't expect and can't process."
"Data contract design requires data producers and consumers to collaborate early in the software design life cycle to define and refine requirements. Explicitly defining and documenting requirements early on simplifies pipeline design and reduces or removes errors in consumers caused by data changes not defined in the contract. Data contracts are an agreement between data producers and consumers that define schemas, data types, and data quality constraints for data shared between them."
Undocumented schema changes by upstream teams often cause downstream failures, operational downtime, and costly debugging. Data pipelines share data from diverse producers to consumers but are frequently built ad hoc, without formal specifications or consumer input. Data contracts require early collaboration between producers and consumers to define schemas, data types, and quality constraints, reducing unexpected changes and consumer errors. Explicitly documenting requirements simplifies pipeline design and minimizes operational risk. Enforcing contracts requires tooling and platform support; Apache Kafka and Apache Flink provide the key capabilities to transport, validate, and enforce data contracts across distributed pipelines.
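On the consumption side, a hedged sketch of what Flink-based enforcement might look like: a DataStream job that filters out records violating a data quality constraint from the contract. The `Order` POJO and the non-negative-amount rule are hypothetical, chosen only to match the producer sketch above:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ContractEnforcementJob {

    // Hypothetical POJO mirroring the contract's schema.
    public static class Order {
        public String orderId;
        public long amountCents;

        public Order() {}

        public Order(String orderId, long amountCents) {
            this.orderId = orderId;
            this.amountCents = amountCents;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for a Kafka source; in a real pipeline this stream would
        // be deserialized using the contract's registered schema.
        DataStream<Order> orders = env.fromElements(
                new Order("o-1", 4999L),
                new Order("o-2", -5L)); // violates the quality constraint

        // Enforce a data quality constraint from the contract:
        // amounts must be non-negative.
        orders.filter(order -> order.amountCents >= 0)
              .print();

        env.execute("contract-enforcement");
    }
}
```

In practice, violating records would more likely be routed to a dead-letter topic via Flink side outputs than silently dropped, so producers can be alerted to contract breaches.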
Read at InfoWorld