Enhancing A/B Testing at DoorDash with Multi-Armed Bandits
Briefly

"While experimentation is essential, traditional A/B testing can be excessively slow and expensive, according to DoorDash engineers Caixia Huang and Alex Weinstein. To address these limitations, they adopted a "multi-armed bandits" (MAB) approach to optimize their experiments. When running experiments, organizations aim to minimize the opportunity cost, or regret, caused by serving the less effective variants to a subset of the user base."
"For our purposes, this strategy allocates experimental traffic toward better-performing variants based on ongoing feedback collected during the experiment. The core idea is that an automated MAB agent continuously selects from a pool of actions, or arms, to maximize a defined reward, while simultaneously learning from user feedback in subsequent iterations. This strategy enables a balance between exploration, i.e., learning about all candidate options, and exploitation, i.e., prioritizing the best‑performing options as they emerge, until the experiment converges on the best option."
Traditional A/B testing uses fixed traffic splits and predetermined sample sizes that remain unchanged throughout experiments, causing continued exposure to inferior variants even after a clear winner emerges. Opportunity cost, or regret, compounds as the number of concurrent experiments increases, incentivizing sequential runs and slowing iteration. Multi-armed bandit (MAB) methods instead adaptively allocate traffic toward better-performing variants based on ongoing feedback. An automated MAB agent repeatedly selects among actions, or arms, to maximize a defined reward while learning from user feedback. It balances exploration and exploitation until the experiment converges on the best option, reducing waste and accelerating learning.
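To make the mechanics concrete, the sketch below shows Thompson sampling over Bernoulli rewards (e.g., whether an order was placed), one common MAB policy. It is not DoorDash's implementation: the arm names, the true conversion rates, and the simulate_feedback helper are hypothetical stand-ins for live user feedback.

```python
import random

# Minimal Thompson sampling sketch for a Bernoulli-reward bandit.
# Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
# conversion rate; traffic naturally shifts toward better-performing arms.

class BetaArm:
    def __init__(self, name):
        self.name = name
        self.successes = 0
        self.failures = 0

    def sample(self):
        # Draw a plausible conversion rate from the current posterior.
        return random.betavariate(self.successes + 1, self.failures + 1)

    def update(self, reward):
        # reward is 1 (e.g., order placed) or 0.
        if reward:
            self.successes += 1
        else:
            self.failures += 1


def choose_arm(arms):
    # Exploration and exploitation in one step: serve the arm whose
    # sampled conversion rate is highest this round.
    return max(arms, key=lambda arm: arm.sample())


# Hypothetical true conversion rates, used only to simulate feedback.
TRUE_RATES = {"control": 0.05, "variant_a": 0.06, "variant_b": 0.04}

def simulate_feedback(arm):
    return 1 if random.random() < TRUE_RATES[arm.name] else 0


if __name__ == "__main__":
    arms = [BetaArm(name) for name in TRUE_RATES]
    for _ in range(10_000):  # each iteration represents one user request
        arm = choose_arm(arms)
        arm.update(simulate_feedback(arm))
    for arm in arms:
        total = arm.successes + arm.failures
        rate = arm.successes / max(total, 1)
        print(f"{arm.name}: served {total} times, observed rate {rate:.3f}")
```

Because each arm's posterior tightens as data arrives, traffic drifts toward the stronger variant automatically, which is the adaptive allocation behavior described above; a fixed-split A/B test would keep sending a third of traffic to each arm regardless.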
Read at InfoQ