Why Data Lies (and Your Model Might Too): The Curious Case of Simpson's Paradox | HackerNoon
Briefly

The article explores Simpson's Paradox, using a fictional university's admission data for men and women to show how conditional and marginal probabilities can tell conflicting stories. Within departments, women are admitted at higher rates than men, yet their overall admission rate is lower, because men and women applied to departments with very different acceptance rates and the aggregate mixes those choices together. The paradox is a reminder to interpret data carefully: a seemingly clear trend can reverse once an underlying variable or subgroup is taken into account.
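One way to see why the two views can disagree is the law of total probability, written here with notation that is assumed for illustration (G for an applicant's gender, d for a department) rather than taken from the article:

```latex
\[
P(\text{Admit} \mid G)
  = \sum_{d} P(\text{Admit} \mid G, d)\, P(d \mid G)
\]
```

The overall rate for a group is a weighted average of its per-department rates, and the weights P(d | G) are that group's own application mix. If the two groups have different mixes, one group can have the higher rate in a department and still end up with the lower overall rate.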
The conditional probability P(Admit | Female, Dept) is higher than P(Admit | Male, Dept) in Department A, but that advantage gets wiped out when we aggregate everything.
Women mostly applied to Department B, where everyone had a low chance of admission. Men mostly applied to Department A, where acceptance rates were high.
This data isn't inaccurate, but it is misleading. Conditional probabilities don't always play nicely with marginal totals, showing how Simpson's Paradox works.
Aggregated data hides subgroup trends, leading to misconceptions about performance when different groups have varying admission rates.
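The reversal described in the points above can be reproduced with a small sketch. The counts below are hypothetical, invented only to show the effect, and are not the article's figures; the helper names `conditional_rate` and `marginal_rate` are likewise illustrative.

```python
# Hypothetical counts, invented for illustration (not the article's data).
# Key: (gender, department) -> (applicants, admitted)
counts = {
    ("Female", "A"): (20, 14),
    ("Male", "A"): (80, 48),
    ("Female", "B"): (80, 16),
    ("Male", "B"): (20, 2),
}


def conditional_rate(gender, dept):
    """Admission rate within one department: P(Admit | gender, dept)."""
    applicants, admitted = counts[(gender, dept)]
    return admitted / applicants


def marginal_rate(gender):
    """Admission rate pooled over all departments: P(Admit | gender)."""
    applicants = sum(n for (g, _dept), (n, _adm) in counts.items() if g == gender)
    admitted = sum(a for (g, _dept), (_n, a) in counts.items() if g == gender)
    return admitted / applicants


for dept in ("A", "B"):
    f_rate = conditional_rate("Female", dept)
    m_rate = conditional_rate("Male", dept)
    print(f"Dept {dept}: female {f_rate:.0%}, male {m_rate:.0%}")

print(f"Overall: female {marginal_rate('Female'):.0%}, male {marginal_rate('Male'):.0%}")
```

With these made-up numbers, women have the higher rate in both departments, yet the pooled rate favors men, because most women applied to the low-acceptance Department B and most men applied to the high-acceptance Department A.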
Read at Hackernoon