Lance takes aim at Parquet in file format joust

"In 2022, we had our first Lance 0.01 release, we were widely seen as a little bit crazy for suggesting that there was a better alternative to Parquet. Certainly, the world has changed since then."
"The core question that we wanted to ask is, what is the relationship between AI and data. Advanced agent techniques get spread out pretty fast. If you're an enterprise, what really differentiates your AI from that of your competitors is data."
"The velocity is much faster, because now a lot of this data is just being generated by the model, and you're looking at hundreds of tokens per second of automatic data generation. Then there is variety: instead of just numbers and timestamps, now you have long text prompts, images, audio waves, and, the [vector] embeddings themselves."
Lance is a fledgling file format created to complement Parquet and address its limitations for modern AI and machine learning workloads. LanceDB develops and supports the format, whose first release, version 0.01, came in 2022. AI-driven workflows demand faster inference, much higher data velocity (models can generate hundreds of tokens per second), and support for multimodal content such as long text prompts, images, audio, and vector embeddings. Parquet was not designed for these larger, more varied data types or for the access patterns of rapid, model-driven data generation. Lance aims to optimize storage and access for AI/ML workloads and is under review for adoption by an open source foundation.
Read at The Register