TY - CONF
T1 - Auto3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation
AU - Liu, Xiaofeng
AU - Che, Tong
AU - Lu, Yiqun
AU - Yang, Chao
AU - Li, Site
AU - You, Jane
PY - 2020
AB - This paper targets learning-based novel view synthesis from a single 2D image or a limited number of 2D images without pose supervision. In viewer-centered coordinates, we construct an end-to-end trainable conditional variational framework to disentangle the unsupervisely learned relative-pose/rotation from the implicit global 3D representation (shape, texture, the origin of the viewer-centered coordinates, etc.). The global appearance of the 3D object is given by several appearance-describing images taken from any number of viewpoints. Our spatial correlation module extracts a global 3D representation from these appearance-describing images in a permutation-invariant manner. Our system achieves implicit 3D understanding without explicit 3D reconstruction. With an unsupervisely learned viewer-centered relative-pose/rotation code, the decoder can hallucinate novel views continuously by sampling the relative pose from a prior distribution. In various applications, we demonstrate that our model can achieve results comparable to, or even better than, those of pose/3D-model-supervised learning-based novel view synthesis (NVS) methods with any number of input views.
DO - 10.1007/978-3-030-58545-7_4
M3 - Conference paper published in proceedings
T3 - Lecture Notes in Computer Science
SP - 52
EP - 71
BT - Proc. 16th European Conference on Computer Vision (ECCV 2020), Glasgow, UK, August 23–28, 2020, Part IX
PB - Springer
ER -