Lyft has rearchitected its machine learning platform LyftLearn into a hybrid system, moving offline workloads to AWS SageMaker while retaining Kubernetes for online model serving. Its decision to choose managed services where operational complexity was highest, while maintaining custom infrastructure where control mattered most, offers a pragmatic alternative to unified platform strategies. Lyft's engineers migrated LyftLearn Compute, which manages training and batch processing, to AWS SageMaker, eliminating background watcher services, cluster autoscaling challenges, and eventually-consistent state management, which had consumed significant engineering effort.
In 2025, MSPs are grappling with widespread burnout, operational complexity, and tool sprawl, yet many plan to increase their budgets amid these challenges.