Functional Elegance: Making Spark Applications Cleaner with the Cats Library
Briefly

From my perspective, this code is hard to read because it mixes the dataset transformations with metadata counting, and, worse, it is almost impossible to refactor as long as everything lives inside one method.
Such code is difficult to maintain and reuse: if you want to read the MedCleanData dataset and store the same metadata elsewhere in the project, you have to copy not only the dataset transformations but also the metadata-calculation code.
This function should clearly be decomposed into several smaller functions, each returning a pair of the transformed dataset and the collected metadata.
Every small transformer function (such as readMedCleanDataset or addCustomColumns) can then be reused in other parts of the project, as the sketch below shows.
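The article's full code is not reproduced in this summary, but a minimal sketch of the idea might look like the following. The metadata type (a plain log of strings) and the MedCleanData schema are assumptions made here for illustration; readMedCleanDataset and addCustomColumns are the function names mentioned above, and Cats' Writer monad is one natural way to model "a dataset paired with collected metadata" and to compose such functions.

```scala
import cats.data.Writer
import cats.implicits._
import org.apache.spark.sql.{Dataset, SparkSession}

object MedCleanPipeline {

  // Metadata collected alongside each transformation; a simple log of strings (assumed).
  type Meta       = Vector[String]
  type Tracked[A] = Writer[Meta, A]

  // Hypothetical schema of the MedCleanData dataset.
  final case class MedCleanData(id: Long, value: String)

  // Each step returns the transformed Dataset together with its piece of metadata,
  // so it can be reused on its own elsewhere in the project.
  def readMedCleanDataset(path: String)(implicit spark: SparkSession): Tracked[Dataset[MedCleanData]] = {
    import spark.implicits._
    val ds = spark.read.parquet(path).as[MedCleanData]
    Writer(Vector(s"read ${ds.count()} rows from $path"), ds)
  }

  def addCustomColumns(ds: Dataset[MedCleanData])(implicit spark: SparkSession): Tracked[Dataset[MedCleanData]] = {
    import spark.implicits._
    val enriched = ds.map(r => r.copy(value = r.value.trim)) // stand-in for the real column logic
    Writer(Vector(s"added custom columns, ${enriched.count()} rows"), enriched)
  }

  // Composing the steps: the datasets flow through the for-comprehension,
  // while the metadata logs are concatenated automatically by the Writer monad.
  def pipeline(path: String)(implicit spark: SparkSession): Tracked[Dataset[MedCleanData]] =
    for {
      raw      <- readMedCleanDataset(path)
      enriched <- addCustomColumns(raw)
    } yield enriched
}
```

Running the composed pipeline then yields both the final dataset and all accumulated metadata in one place, e.g. `val (metadata, result) = MedCleanPipeline.pipeline(somePath).run`, so the metadata can be stored wherever the project needs it without duplicating the transformation code.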
Read at Medium