Estimating the 3D shape of an object from a single image is a challenging inverse problem, chiefly because recovering the occluded portions requires prior knowledge about object geometry.
Regression-based methods have explored various 3D shape representations, such as meshes and point clouds, but they generalize poorly beyond their training categories.
Decomposing shape estimation into depth prediction followed by complete shape estimation has enabled better zero-shot generalization. In this setup, 3D is represented in a viewer-centered frame.
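The output of the depth-prediction stage can be lifted into a viewer-centered point cloud, meaning the points are expressed in the camera's own coordinate frame rather than in a canonical object frame. The sketch below shows this step. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`. The function name and the convention that non-positive depth marks background are also illustrative assumptions.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def backproject_depth(depth: List[List[float]], fx: float, fy: float,
                      cx: float, cy: float) -> List[Point3]:
    """Lift a depth map into a viewer-centered (camera-frame) point cloud.

    Assumes a pinhole camera (hypothetical intrinsics fx, fy, cx, cy):
    pixel (u, v) with depth z maps to
        X = (u - cx) * z / fx,  Y = (v - cy) * z / fy,  Z = z.
    Pixels with non-positive depth are treated as background and skipped.
    """
    points: List[Point3] = []
    for v, row in enumerate(depth):          # v indexes image rows
        for u, z in enumerate(row):          # u indexes image columns
            if z <= 0.0:
                continue                     # assumed background marker
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


if __name__ == "__main__":
    depth = [[0.0, 2.0],
             [1.0, 0.0]]
    print(backproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5))
    # [(1.0, -1.0, 2.0), (-0.5, 0.5, 1.0)]
```

Because the points stay in the camera frame, this step never has to infer a canonical object pose. A subsequent completion stage would take such a partial cloud, or the depth map itself, as input.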
Encoder/decoder architectures have also advanced by using local features from 2D maps, which yields finer detail and better generalization to unseen objects.
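The sketch below shows one way a local feature can be gathered for a 3D query point: project the point into the image, then bilinearly interpolate the 2D map at that location. It rests on several assumptions:

- The 2D maps are per-pixel feature maps produced by an image encoder.
- The query point is in the camera frame, with the same hypothetical pinhole intrinsics used above.
- The sampled feature has the point's depth appended. This is one common design choice, assumed here rather than taken from any specific method.
- All function names are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def project(point: Tuple[float, float, float], fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature map at continuous pixel location (u, v)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp so points projecting outside the image reuse border features.
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(u), int(v)                  # floor, since u, v >= 0
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    out: List[float] = []
    for c in range(len(fmap[0][0])):
        top = fmap[v0][u0][c] * (1 - du) + fmap[v0][u1][c] * du
        bottom = fmap[v1][u0][c] * (1 - du) + fmap[v1][u1][c] * du
        out.append(top * (1 - dv) + bottom * dv)
    return out


def local_feature(fmap: FeatureMap, point: Tuple[float, float, float],
                  fx: float, fy: float, cx: float, cy: float) -> List[float]:
    """Pixel-aligned feature for one query point, with its depth appended
    (the depth concatenation is an assumed design choice)."""
    u, v = project(point, fx, fy, cx, cy)
    return bilinear_sample(fmap, u, v) + [point[2]]


if __name__ == "__main__":
    fmap = [[[0.0], [1.0]],
            [[2.0], [3.0]]]                  # 2x2 map, one channel
    print(local_feature(fmap, (0.0, 0.0, 2.0), 1.0, 1.0, 0.5, 0.5))
    # [1.5, 2.0]
```

Sampling a feature per query point lets the decoder condition on image evidence near that point's projection. A single global image embedding cannot do this, which is the usual intuition for why local features recover finer detail.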