#spark

from Medium
1 month ago

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

"The job didn't fail. It just... never finished." That was the worst part. No errors. No stack traces. Just a Spark job running forever in production, blocking downstream pipelines, delaying reports, and waking up on-call engineers at 2 AM. This is the story of how I diagnosed a real Spark performance issue in production and dramatically cut its runtime, not by adding more machines, but by understanding Spark properly.

A Spark job slowed roughly 10x after data growth; diagnosing and optimizing Spark execution reduced runtime by about 70% without adding cluster resources.
Cryptocurrency
from Bitcoin Magazine
1 month ago

Spark Explained Like You're Five

Spark enables off-chain bitcoin transfers by changing who can jointly authorize spending between a user and a Spark Entity without on-chain movement.
Environment
from Medium
2 months ago

Spark Project: Exploring and Forecasting Urban Pollution

A comprehensive data-cleaning and feature-engineering pipeline prepares raw pollution data for accurate urban pollution forecasting.
Software development
from Medium
3 months ago

Instrumenting Scala Spark Applications with OpenTelemetry: A Practical Guide

Manual OpenTelemetry tracing in Scala Spark provides complete, business-context-rich observability across drivers and executors for distributed data pipelines.
#scala
from Medium
3 months ago
Software development

Map vs FlatMap in Spark with Scala: What Every Data Engineer Should Know

Software development
from The Register
3 months ago

Ironclad OS crafts Unix-like kernel in Ada and SPARK

Ironclad builds a POSIX-compatible, realtime-capable Unix-like kernel in Ada/SPARK with MAC and aims for formal verification and an accompanying OS, Gloire.
#data-engineering
from Medium
5 months ago

Exploring Kubeflow: Part 3

Working with Amazon S3 buckets in the Kubeflow Spark Operator and Python is complicated, with issues surrounding dependency management and file access within worker pods.
Software development
from ZDNET
6 months ago

GitHub's AI-powered Spark lets you build apps using natural language - here's how to access it

GitHub's Spark app-building platform offers AI-driven design and launch capabilities for micro apps through natural language prompts.
Scala
from Medium
8 months ago

Time-Traveling Through Spark: Recording Distributed Failures Across Space and Time

Time-travel debugging in distributed Spark applications on Kubernetes allows for precise bug tracking by recording driver and executor executions.
from medium.com
8 months ago

Day 4: Identifying Top 3 Selling Products per Category | Spark Interview Question

To identify the top-selling products in each category, begin by grouping the sales data by category and summing the total units sold for each product in that category.
Cryptocurrency
from Bitcoin Magazine
8 months ago

Magic Eden Partners With Spark To Bring Fast, Cheap Bitcoin Settlements

Magic Eden integrates with Spark to revolutionize Bitcoin trading by improving transaction speed and minimizing fees.
from medium.com
9 months ago

How I Made My Apache Spark Jobs Schema-Agnostic (Part 2)

Dynamic column transformations enable us to define rules within the schema, allowing Spark jobs to adapt without hardcoding changes, simplifying the data pipeline process.
Scala
Data science
from awstip.com
10 months ago

Spark Scala Exercise 23: Working with Delta Lake in Spark Scala: ACID, Time Travel, and Upserts

Delta Lake enhances data reliability and governance for data lakes by integrating warehouse features.
Data science
from awstip.com
10 months ago

Spark Scala Exercise 22: Custom Partitioning in Spark RDDs: Load Balancing and Shuffle

Implementing a custom partitioner in Spark helps balance load across partitions and optimize data distribution.
Scala
from awstip.com
10 months ago

Spark Scala Exercise 20: Structured Streaming with Scala: Real-Time Data from Socket or Kafka to

Spark Structured Streaming processes real-time data continuously, enabling real-time analytics on unbounded streams.