
"In a two-part blog series, Soam Acharya, Rainie Li, William Tom and Ang Zhang describe how the Pinterest Big Data Platform team considered alternatives for their next-generation massive-scale data processing platform as the limits of the existing Hadoop-based system, known internally as Monarch, became clear. They present Moka as the outcome of that search, and as their EKS based cloud native data processing platform, which now runs production workloads at Pinterest scale."
"It demonstrates an industry-wide shift in which big technology companies now treat Kubernetes as a control plane for data, rather than only as a stateless service platform. Encouraged by growing popularity and increasing adoption in the Big Data community, the team explored Kubernetes-based systems as the most likely replacement for Hadoop 2.x. Any candidate platform had to meet precise criteria around scalability, security, cost and the ability to host multiple processing engines."
Pinterest migrated core workloads from Monarch, its ageing Hadoop-based platform, to Moka, a Kubernetes-based system running on Amazon EKS. Moka uses Apache Spark as its primary processing engine, and the team plans to support additional frameworks. Candidate platforms were judged on scalability, security, cost, and the ability to host multiple processing engines. Pinterest treats Kubernetes as a control plane for data rather than only as a runtime for stateless services. The team added logging, metrics, and job history services so engineers can debug and tune large-scale Spark jobs. Moka now runs production workloads at Pinterest scale, preserving existing Spark investments while modernizing the underlying infrastructure.
Read at InfoQ