Vector embeddings make it possible to search aerial photos without relying on map data: a text query such as 'roundabout' or 'tennis court' is matched directly against the images to return relevant results.
The SkyCLIP model, trained on image-text pairs, produces 768-element vectors. Comparing these vectors measures how similar objects in different aerial photos are, which is what enables accurate searches.
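The vector comparison described above can be sketched as a cosine-similarity ranking. This is a minimal illustration, not the actual search pipeline: the function names `cosine_similarity` and `search`, the tile names, and the toy 4-element vectors are all hypothetical stand-ins (real SkyCLIP vectors have 768 elements), and cosine similarity is assumed here as the distance measure because it is a common choice for CLIP-style embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical image-tile embeddings (toy 4-element vectors, not model output).
tile_index = {
    "tile_a": [0.9, 0.1, 0.0, 0.1],
    "tile_b": [0.1, 0.8, 0.2, 0.0],
    "tile_c": [0.8, 0.2, 0.1, 0.0],
}

# Hypothetical embedding of a text query such as 'roundabout'.
query = [1.0, 0.0, 0.0, 0.0]

results = search(query, tile_index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In a real setup the query vector would come from the model's text encoder and the tile vectors from its image encoder, so that text and images share one embedding space. A linear scan like this is fine for small collections; larger ones would typically use an approximate nearest-neighbor index.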