Introducing Apache Arrow Support in mssql-python - Microsoft for Python Developers Blog

The mssql-python library can now fetch SQL Server data directly as Apache Arrow structures, which improves performance and memory efficiency. Fetching this way avoids creating a Python object for every value, along with the garbage-collector pressure those objects cause. Apache Arrow enables zero-copy language interoperability through a stable shared-memory layout, so different programming languages can exchange data without serialization. Arrow's columnar in-memory format stores the values of each column contiguously, which suits both database drivers and DataFrame libraries.

"Fetching a million rows from SQL Server into a Polars DataFrame used to mean a million Python objects, a million GC allocations, and then throwing it all away to build a DataFrame. Not anymore."

"The key insight behind Apache Arrow is zero-copy language interoperability. Arrow defines a stable shared-memory layout - the Arrow C Data Interface, a cross-language ABI - that any language can produce or consume by exchanging a pointer."
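The idea of sharing a buffer instead of copying it can be sketched with the Python standard library alone. This is only an analogy: it uses Python's buffer protocol (`array` and `memoryview`), not the Arrow C Data Interface, and the helper name `share_column` is made up for illustration.

```python
from array import array


def share_column(values):
    """Pack integers into one contiguous int64 buffer and return a view of it.

    Hypothetical helper for illustration; not part of mssql-python or Arrow.
    """
    column = array("q", values)   # contiguous 8-byte signed integers
    view = memoryview(column)     # refers to the same memory, no copy is made
    return column, view


column, view = share_column([10, 20, 30])

# The "producer" writes through the array...
column[1] = 99

# ...and the "consumer" sees the write through its view,
# because both names refer to the same underlying buffer.
print(view[1])
```

The consumer never receives a copy of the values, only a handle to memory with an agreed layout. The Arrow C Data Interface applies the same principle across language boundaries.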

"For a database driver, this means the entire fetch loop can run in C++ and write values directly into Arrow buffers - no Python object creation per row, no garbage-collector pressure."
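To make "write values directly into Arrow buffers" more concrete, here is a minimal pure-Python sketch of a column-wise fetch loop. It is not the mssql-python implementation, which per the quote runs in C++. The function name `rows_to_columns` and the two-column `(id, price)` row shape are assumptions made for the example. The validity bitmap follows Arrow's convention of one bit per value, least-significant bit first, with 1 meaning "not NULL".

```python
from array import array


def rows_to_columns(rows):
    """Write (id, price) rows into contiguous column buffers plus a validity bitmap.

    Illustrative only: a real driver would run this loop in native code and
    never materialize the rows as Python tuples in the first place.
    """
    n = len(rows)
    ids = array("q", bytes(8 * n))       # int64 column, preallocated with zeros
    prices = array("d", bytes(8 * n))    # float64 column, preallocated with zeros
    validity = bytearray((n + 7) // 8)   # one bit per price; 1 = value present

    for i, (row_id, price) in enumerate(rows):
        ids[i] = row_id
        if price is not None:
            prices[i] = price
            validity[i // 8] |= 1 << (i % 8)   # mark slot i as non-NULL
        # A NULL price leaves its slot at 0.0 and its validity bit at 0.

    return ids, prices, validity


ids, prices, validity = rows_to_columns([(1, 9.5), (2, None), (3, 4.0)])
```

Each column ends up as a single contiguous buffer rather than a value scattered across per-row objects, which is the layout a DataFrame library can adopt without rebuilding the data.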

#sql-server #apache-arrow #dataframe #polars #memory-efficiency

Read at Microsoft for Python Developers Blog


