This paper introduces Uni-OVSeg, a weakly-supervised open-vocabulary segmentation framework that is trained on independent image-mask and image-text pairs.
By relaxing the strict correspondence between masks and texts, the method makes training data easier to collect and addresses scalability issues in segmentation tasks.
Noise in the relationships between masks and entities is mitigated by using large vision-language models to refine the text descriptions.
#open-vocabulary-segmentation #weakly-supervised-learning #image-mask-text-pairs #large-vision-language-models #computer-vision