How to Use Apache Spark to Craft a Multi-Year Data Regression Testing and Simulations Framework
Briefly

"Absolutely. Yes, my story is unusual, but I didn't touch computers until I enrolled in a computer science course at the university. I grew up in a village, and I didn't have access to any computers or similar technology. I did well in school, and when it was time to get into the university, I just asked around, 'Hey, what should I get into?' And thankfully, people suggested computer science, and I have been enjoying this field ever since."
"Yes, absolutely. This is a common scenario that every large-scale system encounters repeatedly, namely, migrations. We write a system after a few years, but the system does not scale. The inputs and outputs are generally the same. Still, the internals of the systems just need to be rewritten because the business logic keeps getting more complicated, and then we have to"
An engineering manager at Stripe with eight years of experience focused on billing systems described using Apache Spark in an unconventional way to address large-scale migration challenges. Migrations commonly require rewriting system internals while preserving inputs and outputs because growing business logic makes prior implementations unscalable. The approach targeted planetary-scale, multi-year data regression testing to validate behavioral equivalence across rewrites. The engineer's personal background includes discovering computer science at university after growing up without access to computers, which influenced a practical, problem-driven perspective on engineering and system evolution.
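The podcast summary does not spell out how the framework is built, but the core idea of data regression testing over historical inputs can be illustrated with a short Spark sketch. The example below is hypothetical and not Stripe's actual code: the S3 paths, the `invoice_id` key, and the compared columns are placeholder assumptions. It diffs the outputs that a legacy and a rewritten pipeline produced for the same historical inputs, treating an empty diff over the full date range as the behavioral-equivalence signal.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object BillingRegressionDiff {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("billing-regression-diff").getOrCreate()

    // Outputs that the legacy and the rewritten pipeline produced for the same
    // multi-year range of historical inputs (paths are hypothetical placeholders).
    val legacy    = spark.read.parquet("s3://example-bucket/legacy-output/")
    val rewritten = spark.read.parquet("s3://example-bucket/rewritten-output/")

    // Full outer join on the business key so rows missing on either side also surface.
    val joined = legacy.alias("old")
      .join(rewritten.alias("new"), Seq("invoice_id"), "full_outer")

    // Keep only rows where any compared field differs; <=> is Spark's null-safe equality.
    val diffs = joined.where(
      !(col("old.amount_cents") <=> col("new.amount_cents")) ||
      !(col("old.currency") <=> col("new.currency"))
    )

    // An empty diff across the whole historical range is the behavioral-equivalence signal;
    // any surviving rows point at concrete inputs where the rewrite diverges.
    println(s"Divergent rows: ${diffs.count()}")
    diffs.write.mode("overwrite").parquet("s3://example-bucket/regression-diffs/")

    spark.stop()
  }
}
```

Because Spark partitions the comparison across the cluster, the same job can be rerun over wider and wider historical windows as the rewrite matures, which is what makes a multi-year regression sweep tractable.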
Read at InfoQ