#data-partitioning

Data science
from Medium
3 months ago

Apache Spark: Fix data skew issue using salting technique (practical example)

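Data skew causes performance problems in Spark when a few keys account for a disproportionate share of the rows. During a shuffle, all rows with the same key go to the same partition, so a single task ends up doing most of the work while the others finish early and sit idle.
Salting reduces this skew by appending a random value (the salt) to heavy keys, which spreads their rows across multiple partitions. In a join, the other side is replicated once per salt value so that every salted key still finds its matches.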
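The salting idea can be illustrated without a cluster. The sketch below is a plain-Python simulation, not PySpark and not the article's code. It treats a deterministic CRC32 hash modulo a partition count as a stand-in for Spark's shuffle partitioner. It then compares how many rows the busiest partition receives for a skewed join key, with and without salting. The partition count, salt count, helper names (`partition_of`, `salted_join`, and so on) and the toy dataset are all hypothetical choices made for illustration.

```python
import random
import zlib
from collections import Counter

# Hypothetical settings for this sketch, not values from the article.
NUM_PARTITIONS = 8
SALT_BUCKETS = 8


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Deterministic stand-in for a shuffle hash partitioner."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_loads(keys, num_partitions=NUM_PARTITIONS):
    """Count how many rows each partition would receive for the given keys."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def plain_join(left, right):
    """Ordinary equi-join on key: all rows of a hot key meet in one partition."""
    lookup = {}
    for k, rv in right:
        lookup.setdefault(k, []).append(rv)
    return [(k, lv, rv) for k, lv in left for rv in lookup.get(k, [])]


def salted_join(left, right, salt_buckets=SALT_BUCKETS, seed=0):
    """Join on (key, salt) instead of key, then drop the salt.

    Left (large, skewed) side: each row gets a random salt in [0, salt_buckets).
    Right (small) side: each row is replicated once per salt value, so every
    salted left key still finds all of its matches.
    Returns the joined rows and the salted left keys (for measuring load).
    """
    rng = random.Random(seed)
    salted_left = [((k, rng.randrange(salt_buckets)), lv) for k, lv in left]
    exploded_right = [((k, s), rv) for k, rv in right for s in range(salt_buckets)]

    lookup = {}
    for sk, rv in exploded_right:
        lookup.setdefault(sk, []).append(rv)
    # sk[0] strips the salt, restoring the original key in the output.
    joined = [(sk[0], lv, rv) for sk, lv in salted_left for rv in lookup.get(sk, [])]
    return joined, [sk for sk, _ in salted_left]


# Toy skewed dataset: 9,000 rows share the key "hot"; 1,000 rows spread over 100 keys.
left = [("hot", i) for i in range(9000)] + [(f"k{i % 100}", i) for i in range(1000)]
right = [("hot", "dim-hot")] + [(f"k{i}", f"dim-{i}") for i in range(100)]

plain_loads = partition_loads([k for k, _ in left])
joined, salted_keys = salted_join(left, right)
salted_loads = partition_loads(salted_keys)

print("max rows per partition, plain: ", max(plain_loads))
print("max rows per partition, salted:", max(salted_loads))
```

The salted join returns the same rows as the plain join. However, the "hot" key's 9,000 rows are now split across up to `SALT_BUCKETS` distinct shuffle keys instead of one. The cost is that the small side grows by a factor of `SALT_BUCKETS`. For that reason, the salt count is usually kept modest and applied only where a key is known to be heavy.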