When cache is used on an intermediate result DataFrame, that result is kept in memory most of the time... the cached DataFrame is used instead.
To fix this issue, all I needed to do was add a .cache() call at the end of fillNullIds(sourceDf).
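
The body of fillNullIds isn't shown here, so the following is only a toy model in plain Python, not Spark code. It assumes the function fills missing ids with freshly generated values, which is one common reason a missing cache causes trouble. `LazyFrame`, `fill_null_ids` and the id counter are hypothetical names made up for the illustration. The sketch tries to show why a lazily evaluated result can differ between two actions, and why marking it as cached makes later actions reuse the first result.

```python
import itertools


class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the Spark API)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function that produces the rows
        self._use_cache = False
        self._cached = None       # filled by the first collect() after cache()

    def cache(self):
        # Only marks the frame as cacheable; nothing is computed yet.
        self._use_cache = True
        return self

    def collect(self):
        # An "action": without cache, the whole computation runs again.
        if not self._use_cache:
            return self._compute()
        if self._cached is None:
            self._cached = self._compute()
        return self._cached


_next_id = itertools.count(1000)  # hypothetical id generator


def fill_null_ids(rows, use_cache):
    # Hypothetical version of fillNullIds: replace missing ids with new ones.
    def compute():
        return [(i if i is not None else next(_next_id), v) for i, v in rows]

    frame = LazyFrame(compute)
    return frame.cache() if use_cache else frame


source = [(1, "a"), (None, "b"), (None, "c")]

uncached = fill_null_ids(source, use_cache=False)
first_uncached, second_uncached = uncached.collect(), uncached.collect()

cached = fill_null_ids(source, use_cache=True)
first_cached, second_cached = cached.collect(), cached.collect()

print(first_uncached == second_uncached)  # False: ids were regenerated
print(first_cached == second_cached)      # True: second action reused the result
```

In real Spark, cache() is likewise lazy and the data is only materialized by the first action. Cached partitions can also be evicted and recomputed, so this may not be a hard guarantee for non-deterministic logic.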