#aws-glue

Data science
from Medium
2 weeks ago

Migrating from Historical Batch Processing to Incremental CDC Using Apache Iceberg (Glue 4...

Use Apache Iceberg Copy-on-Write tables in AWS Glue 4 to migrate from full historical batch reprocessing to incremental CDC, reducing redundant computation, I/O, and costs.
from InfoQ
1 month ago

Yelp Publishes Blueprint for Managing S3 Server-Access Logs at Massive Scale

In essence, Yelp now writes terabytes of access logs every day but converts them into compact Parquet archives that are easy to query with tools like Amazon Athena. Through periodic "compaction," raw plaintext log objects are merged into fewer, larger Parquet files, reducing storage usage by about 85% and cutting the number of objects by more than 99.99%. This transformation makes analytics efficient and cost-effective, enabling quick lookups for permission debugging, cost attribution, incident investigation, and data retention analysis.
Software development
E-Commerce
from InfoQ
6 months ago

Amazon S3 Adds Sort and Z-Order Compaction to Improve Apache Iceberg Query Performance

Amazon S3 now supports sort and z-order compaction for Apache Iceberg tables to improve query performance and reduce costs.