Given the historical context, our vision was clear: to remove humans in the data discovery loop by automating the entire process using LLM-powered products. We aimed to reduce the time taken for data discovery from multiple days to mere seconds, eliminating the need for anyone to ask their colleagues data discovery questions ever again.
The team behind Hubble decided to invest heavily in improving the discovery process's efficacy. They started by enhancing ElasticSearch table metadata and improving the documentation coverage of data lake tables, which was low, at only 20%.
Engineers conducted user interviews to discover how ElasticSearch should be tuned. They hid irrelevant tables, deboosted deprecated tables, and boosted the most relevant schemas and certified tables.
Subsequently, they added relevant tags and improved the search UI, resulting in a 12% increase in the search click-through rate.
Collection
[
|
...
]