The Data Loader forms the foundation of the pipeline, connecting to a wide array of source systems. It provides granular filtering by file type, modification date, and custom criteria. A key feature is its ability to extract and preserve metadata from source systems, such as access controls. The loader also supports incremental loading, so large-scale data updates are handled efficiently without reprocessing the full source.
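As a rough illustration of how file-type and date filtering combine with incremental loading, the sketch below walks a local directory and yields only files modified after a checkpoint timestamp. It assumes a plain filesystem source. `load_incremental`, `LoadedFile`, and the metadata fields are hypothetical names for this example, not Gencore AI's API, and POSIX permission bits stand in for the richer access controls a real connector would carry over.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class LoadedFile:
    """A file picked up by the loader, plus metadata carried over from the source."""
    path: Path
    modified: float
    metadata: dict = field(default_factory=dict)


def load_incremental(root, extensions, since, custom_filter=None):
    """Yield files under `root` modified after `since` (a Unix timestamp).

    `extensions` restricts file types (e.g. {".pdf", ".md"}), and
    `custom_filter` is an optional predicate for any extra criteria.
    """
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # File-type filter.
        if path.suffix.lower() not in extensions:
            continue
        stat = path.stat()
        # Incremental filter: skip anything unchanged since the last run.
        if stat.st_mtime <= since:
            continue
        # Caller-supplied custom criteria.
        if custom_filter is not None and not custom_filter(path):
            continue
        # Carry over access-related metadata (permission bits and owner) with the file.
        metadata = {
            "mode": oct(stat.st_mode & 0o777),
            "owner_uid": stat.st_uid,
        }
        yield LoadedFile(path=path, modified=stat.st_mtime, metadata=metadata)
```

A caller would persist the largest `modified` value seen in each run and pass it as `since` on the next run, so unchanged files are never reprocessed.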
The Embeddings Generator encodes the semantic meaning of your data as vector representations using the selected embedding models. It supports multiple state-of-the-art embedding APIs and hosted models, allowing organizations to choose the best fit for their data and use case. To handle long documents, the generator applies efficient splitting strategies that preserve context while optimizing the resulting chunks for vector database storage and retrieval.
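One common splitting strategy is a sliding window with overlap: each chunk repeats the last few words of the previous one, so context that spans a chunk boundary is not lost. The sketch below is a minimal word-based version. `split_with_overlap` and its default sizes are illustrative assumptions rather than Gencore AI's actual strategy, and real pipelines usually measure chunk size in model tokens rather than words.

```python
def split_with_overlap(text, chunk_size=200, overlap=50):
    """Split `text` into windows of `chunk_size` words, where each window
    shares its first `overlap` words with the end of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once this window reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each returned chunk would then be sent to the chosen embedding model, producing one vector per chunk.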
The Vector Database stores and indexes these embeddings for efficient retrieval. Gencore AI integrates with popular vector databases and applies optimized indexing strategies for fast similarity search. A standout feature is its support for hybrid search, which combines vector similarity, syntactic (keyword) similarity, and metadata filtering to return more accurate and contextually relevant results.
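To make the idea of hybrid search concrete, the sketch below applies a metadata filter first, then ranks the remaining documents by a weighted sum of cosine similarity and a simple keyword-overlap score. This is an assumed, simplified scheme, not Gencore AI's implementation: `Doc`, `hybrid_search`, and the `alpha` weighting are hypothetical, and production systems typically use a lexical scorer such as BM25 and fusion methods such as reciprocal rank fusion instead of this naive overlap.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    vector: list
    metadata: dict


def cosine(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Fraction of distinct query terms that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(query, query_vector, docs, filters, alpha=0.7, k=3):
    """Keep docs whose metadata matches every key/value in `filters`, then rank
    them by alpha * vector similarity + (1 - alpha) * keyword overlap."""
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in filters.items())
    ]
    scored = [
        (alpha * cosine(query_vector, d.vector)
         + (1 - alpha) * keyword_overlap(query, d.text), d)
        for d in candidates
    ]
    # Sort on the score only, so ties never compare Doc objects.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]
```

Raising `alpha` favors semantic matches; lowering it favors exact keyword hits, which helps with names, codes, and other terms embeddings can blur.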