Recent advances in 2D diffusion and vision-language models have opened new avenues for generating 3D assets, leveraging rich 2D diffusion priors to improve 3D generation quality.
Methods for text-to-3D and image-to-3D synthesis typically optimize a 3D representation, such as a NeRF or a mesh, by rendering it into 2D images and supervising those renderings with a pretrained 2D diffusion prior.
Existing methods often struggle with inefficiency and the multi-face (Janus) problem, leading to long per-scene optimization times and geometries that lack direct 3D supervision and fine detail.
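To make the optimization loop concrete, below is a minimal sketch of score distillation, a common loss in this line of work, assuming PyTorch; `unet`, `cond_emb`, and `alphas_cumprod` are hypothetical stand-ins for a frozen pretrained 2D denoiser, its conditioning embedding, and its noise schedule, not any specific library's API.

```python
import torch

def sds_loss(rendered, unet, cond_emb, alphas_cumprod):
    """One score-distillation step: nudge a rendered image toward the 2D prior.

    `unet`, `cond_emb`, and `alphas_cumprod` are placeholders for a frozen
    diffusion denoiser, its conditioning, and its noise schedule (sketch only).
    """
    # Sample a random diffusion timestep and noise the rendering.
    t = torch.randint(20, 980, (rendered.shape[0],), device=rendered.device)
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * rendered + (1.0 - a_t).sqrt() * noise

    # The frozen 2D prior predicts the noise it believes was added.
    with torch.no_grad():
        eps_pred = unet(noisy, t, cond_emb)

    # SDS gradient: w(t) * (eps_pred - noise); backpropagated through the
    # renderer via a surrogate loss whose gradient w.r.t. `rendered` is
    # exactly that quantity.
    grad = (1.0 - a_t) * (eps_pred - noise)
    return (grad.detach() * rendered).sum()
```

Because this loss is evaluated per rendered view over many optimization steps, the long runtimes and the reliance on purely 2D supervision noted above follow directly from the formulation.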
The recent development of SparseNeuS enables faster 3D geometry reconstruction from multi-view images generated by models like zero123, though reconstruction quality remains a concern.
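For contrast with the per-scene optimization above, this feed-forward alternative reduces to a two-stage pipeline. The callables in the sketch below are hypothetical stand-ins for a zero123-style novel-view synthesizer and a SparseNeuS-style sparse-view reconstructor, not actual APIs.

```python
from typing import Any, Callable, List

def image_to_geometry(
    image: Any,
    synthesize_views: Callable[[Any, int], List[Any]],  # zero123-style model (stand-in)
    reconstruct: Callable[[List[Any]], Any],            # SparseNeuS-style reconstructor (stand-in)
    n_views: int = 8,
) -> Any:
    """Feed-forward image-to-3D: synthesize novel views, then fit geometry once."""
    views = synthesize_views(image, n_views)  # multi-view images from a single input
    return reconstruct(views)                 # geometry (e.g., an SDF) from sparse views
```

The design trade-off is speed for fidelity: a single forward reconstruction avoids lengthy per-scene optimization, but any inconsistency among the synthesized views surfaces as the quality concerns noted above.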