HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

CVPR 2024

Sangmin Woo*, Byeongjun Park*, Hyojun Go, Jin-Young Kim, Changick Kim

KAIST EE · Twelve Labs

*Equal Contribution · Corresponding Author

🔥[NEW!] HarmonyView is a simple yet effective diffusion sampling technique adept at decomposing two intricate aspects of single-image 3D generation: consistency and diversity.

Summary

Recent progress in single-image 3D generation highlights the importance of multi-view coherency, leveraging 3D priors from large-scale diffusion models pretrained on Internet-scale images. However, novel-view diversity remains underexplored, largely because lifting a 2D image into 3D content is inherently ambiguous: many plausible shapes can explain the same input. Here, we aim to close this research gap by tackling consistency and diversity simultaneously.

  1. HarmonyView. We present a novel diffusion sampling technique for multi-view diffusion models that achieves a win-win in both consistency and diversity (see the guidance sketch after this list).
  2. Consistency-Diversity (CD) Score. We introduce a reference-free evaluation metric that assesses the consistency-diversity trade-off using both the CLIP image and text encoders.
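
To make the idea concrete, below is a minimal, illustrative sketch of a denoising step that mixes two guidance signals: one steering all views toward cross-view agreement (consistency) and one preserving each view's own variation (diversity). The function name guided_epsilon, the weights w_c/w_d, and the joint/per-view conditioning are hypothetical placeholders, not the paper's exact formulation.

# Illustrative sketch only; the decomposition and weights are assumptions,
# not HarmonyView's exact sampling rule.
import torch

def guided_epsilon(eps_uncond: torch.Tensor,
                   eps_joint: torch.Tensor,
                   eps_single: torch.Tensor,
                   w_c: float = 2.0,
                   w_d: float = 1.0) -> torch.Tensor:
    """Combine an unconditional noise prediction with a joint (all-view) and a
    per-view conditional prediction, in the spirit of classifier-free guidance.
    w_c pushes toward cross-view agreement; w_d preserves per-view variation."""
    return (eps_uncond
            + w_c * (eps_joint - eps_uncond)
            + w_d * (eps_single - eps_joint))

# Toy usage: random tensors stand in for the noise predictions of a
# multi-view diffusion model at one denoising step.
views, ch, h, w = 4, 3, 32, 32
eps_u = torch.randn(views, ch, h, w)
eps_j = torch.randn(views, ch, h, w)
eps_s = torch.randn(views, ch, h, w)
print(guided_epsilon(eps_u, eps_j, eps_s).shape)  # torch.Size([4, 3, 32, 32])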

One-Image-to-3D

HarmonyView generates realistic 3D content using just a single image. It excels at maintaining visual and geometric consistency across generated views while enhancing the diversity of novel views, even in complex scenes.

More Results (with Diversity)

HarmonyView synergistically guides the synchronization of noisy multi-views, facilitating geometric coherence among the clean multi-views. As a result, HarmonyView generates diverse instances when sampled with different random seeds, as sketched below.
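
As a rough companion to the CD score idea above, the sketch below scores a set of samples drawn with different seeds: consistency is taken as the mean pairwise cosine similarity of CLIP image embeddings across the views of one sample, and diversity as one minus the mean similarity between matching views of different samples. The exact metric in the paper may differ; the embeddings are assumed to be precomputed with a CLIP image encoder, and all names here are illustrative.

# Rough, reference-free scoring sketch; not the paper's exact CD score.
import torch
import torch.nn.functional as F

def consistency(view_emb: torch.Tensor) -> float:
    """view_emb: (num_views, dim) CLIP image embeddings of one sample."""
    e = F.normalize(view_emb, dim=-1)
    sim = e @ e.t()                              # (V, V) pairwise cosine similarity
    v = sim.shape[0]
    off_diag = sim.sum() - sim.diagonal().sum()  # drop self-similarity
    return (off_diag / (v * (v - 1))).item()

def diversity(samples: list) -> float:
    """samples: list of (num_views, dim) embeddings, one per random seed."""
    embs = torch.stack([F.normalize(s, dim=-1) for s in samples])  # (S, V, D)
    sims = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            sims.append((embs[i] * embs[j]).sum(-1).mean())  # matched views
    return 1.0 - torch.stack(sims).mean().item()

# Toy usage: 3 seeds x 16 views x 512-dim embeddings.
samples = [torch.randn(16, 512) for _ in range(3)]
print(consistency(samples[0]), diversity(samples))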


Quantitative Comparison

HarmonyView outperforms state-of-the-art methods across all metrics on both the novel-view synthesis and 3D reconstruction tasks. Notably, it leads by a significant margin on 3D reconstruction.

BibTeX


@misc{woo2023harmonyview,
      title={HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D},
      author={Sangmin Woo and Byeongjun Park and Hyojun Go and Jin-Young Kim and Changick Kim},
      year={2023},
      eprint={2312.15980},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
  

Acknowledgement

This website is adapted from Nerfies and LLaVA, and is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.