Building an IoT Monitoring System with Spark Structured Streaming, Kafka and Scala
Briefly

Developed a comprehensive IoT Smart Farm Monitoring system using Scala, Apache Spark Structured Streaming, and Kafka, with real-time processing of environmental parameters such as CO2, temperature, humidity, and soil moisture.
Key components: Kafka as the message broker holding the sensor data, Spark Structured Streaming for real-time analytics, with watermarks and windows to handle late data, and Delta Lake for storage with ACID transactions and schema evolution. Watermarks, error monitoring, and schema evolution were crucial features.
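The Kafka-to-Spark-to-Delta flow with a watermark and a time window can be sketched roughly as follows. This is a minimal illustration, not the article's code: the schema, topic name, broker address, window and watermark durations, and output paths are all assumptions, and it expects the spark-sql-kafka and Delta Lake packages on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, from_json, window}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType, TimestampType}

object SensorWindowSketch {
  // Hypothetical JSON payload schema; the real sensor schema may differ.
  val sensorSchema: StructType = new StructType()
    .add("sensorId", StringType)
    .add("temperature", DoubleType)
    .add("timestamp", TimestampType)

  // Average temperature per sensor over 10-minute event-time windows.
  // The watermark lets Spark drop state for windows more than
  // 5 minutes behind the latest event time seen (durations are assumed).
  def averageByWindow(readings: DataFrame): DataFrame =
    readings
      .withWatermark("timestamp", "5 minutes")
      .groupBy(window(col("timestamp"), "10 minutes"), col("sensorId"))
      .agg(avg("temperature").as("avgTemperature"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sensor-window-sketch")
      .master("local[*]")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Placeholder broker address and topic name.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "temperature")
      .load()

    // Kafka delivers the payload as bytes; parse it as JSON.
    val readings = raw
      .select(from_json(col("value").cast("string"), sensorSchema).as("r"))
      .select("r.*")

    // Append mode emits a window only once the watermark has passed its end.
    val query = averageByWindow(readings).writeStream
      .format("delta")
      .outputMode("append")
      .option("checkpointLocation", "/tmp/checkpoints/avg_temperature")
      .option("mergeSchema", "true") // allow new columns to be added later
      .start("/tmp/delta/avg_temperature")

    query.awaitTermination()
  }
}
```

Keeping the aggregation in its own function means it can be exercised against a small batch DataFrame without a running Kafka broker.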
Enriched sensor data by joining it with zone data, adding context such as sensor location and zone type. Technologies used: Scala 2.13.14, Spark 3.5.1, Delta 3.2.0, Kafka 7.2.1, and Java 17.
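A rough sketch of the zone enrichment, assuming a small lookup table keyed by a hypothetical sensorId column (the column names and sample values are invented for illustration). The demo uses batch data so it runs without Kafka; the same join is supported when the readings side is a streaming DataFrame and the zone table is static.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ZoneEnrichmentSketch {
  // Left join keeps every reading, even when its sensor has no zone entry;
  // the zone columns are null for such rows.
  def enrich(readings: DataFrame, zones: DataFrame): DataFrame =
    readings.join(zones, Seq("sensorId"), "left")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("zone-enrichment-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical zone lookup data.
    val zones = Seq(
      ("sensor-1", "zone-a", "greenhouse"),
      ("sensor-2", "zone-b", "open-field")
    ).toDF("sensorId", "zoneId", "zoneType")

    // Hypothetical readings; sensor-3 has no matching zone.
    val readings = Seq(
      ("sensor-1", 21.5),
      ("sensor-3", 19.0)
    ).toDF("sensorId", "temperature")

    enrich(readings, zones).show()
    spark.stop()
  }
}
```

A left join is one possible choice here: it avoids silently dropping readings from sensors that are missing in the zone table, at the cost of null zone fields downstream.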
Read at Medium