#data-skew

from HackerNoon
2 years ago

Deep Dive into MS MARCO Web Search: Unpacking Dataset Characteristics | HackerNoon

The MS MARCO Web Search dataset is multilingual and shows significant data skew, which may hurt model performance and calls for data-centric optimization techniques.
Data science
from Medium
1 month ago

Apache Spark: Fix data skew issue using salting technique (practical example)

Data skew in Apache Spark is a performance issue where a few keys dominate the data distribution, leading to uneven partitions and slow queries, especially during operations that require shuffling.
Data science
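
To make the skew described above concrete, here is a minimal sketch of the salting idea in plain Python. It is not the linked article's code and does not use Spark: `zlib.crc32` is only a deterministic stand-in for a hash partitioner, and the names `partition_of`, `partition_sizes`, `salt`, `NUM_PARTITIONS`, and `NUM_SALTS` are hypothetical, as is the synthetic key list.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed partition count, for illustration only
NUM_SALTS = 8       # assumed number of salt values appended to each key


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt(key: str, num_salts: int, rng: random.Random) -> str:
    """Append a random suffix so one hot key becomes several distinct keys."""
    return f"{key}_{rng.randrange(num_salts)}"


rng = random.Random(0)

# Synthetic skewed data: one "hot" key holds 90% of the records.
keys = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]

plain = partition_sizes(keys, NUM_PARTITIONS)
salted = partition_sizes([salt(k, NUM_SALTS, rng) for k in keys], NUM_PARTITIONS)

print("largest partition without salting:", max(plain))
print("largest partition with salting:   ", max(salted))
```

Without salting, every record for the hot key hashes to the same partition, so one task does most of the work. With salting, the hot key is split across several salted keys and the largest partition shrinks. In a real join or aggregation the salt has to be accounted for afterwards, for example by replicating the other side of the join once per salt value or by aggregating a second time on the original key.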