Real-time Entity Resolution with Kafka and Spark
Briefly

Entity Resolution (ER) aims to identify which records in a single dataset or across multiple datasets refer to the same real-world entity so that these records can be aggregated and better understood.
ER techniques can help solve complicated and costly problems like data corruption, unnecessarily duplicated records, and most importantly, discordant datasets within the same overarching system.
This project resolves entities not only within a static dataset but also in a continuous stream of incoming data.
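To make the idea concrete, here is a minimal sketch of resolving records from a static dataset and then from a stream of new arrivals. It is an illustration only, not this project's implementation: plain Python lists stand in for Kafka and Spark, the `IncrementalResolver` class and `normalize` function are hypothetical names, and exact matching on a normalized name is a deliberately simple placeholder for a real similarity-based matcher.

```python
import re
from collections import defaultdict


def normalize(record):
    """Hypothetical matching key: lowercase the name, drop punctuation,
    and collapse repeated whitespace. Real ER systems typically use
    fuzzy similarity rather than an exact key like this."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    return " ".join(name.split())


class IncrementalResolver:
    """Groups records by matching key, one record at a time, so the same
    code path serves both the initial static load and later arrivals."""

    def __init__(self):
        # matching key -> list of records believed to be the same entity
        self.entities = defaultdict(list)

    def add(self, record):
        key = normalize(record)
        self.entities[key].append(record)
        return key


# Assumed sample data: a static dataset and a small "stream" of new records.
static_records = [
    {"id": 1, "name": "Acme Corp."},
    {"id": 2, "name": "ACME  corp"},
    {"id": 3, "name": "Globex"},
]
incoming_stream = [
    {"id": 4, "name": "acme corp"},
    {"id": 5, "name": "Initech"},
]

resolver = IncrementalResolver()

# Resolve the static dataset first.
for record in static_records:
    resolver.add(record)

# Then resolve each streamed record against the entities seen so far.
for record in incoming_stream:
    resolver.add(record)

for key, records in resolver.entities.items():
    print(key, [r["id"] for r in records])
```

Keeping the entity index in one structure that is updated per record is what lets a late-arriving record (id 4 here) join an entity that was formed from the static data; in a streaming deployment that index would need to live in durable, shared state rather than in memory.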
Read at Medium