SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering



1KAIST 2EverEx 3Yonsei University

(* : Equal Contribution, † : Corresponding Author)

SteerX seamlessly integrates video generative models and scene reconstruction models, enabling any 3D/4D scene generation without camera conditions.

Abstract

Recent progress in 3D/4D scene generation emphasizes the importance of physical alignment throughout video generation and scene reconstruction. However, existing methods improve alignment separately at each stage, making it difficult to manage subtle misalignments arising from the other stage. Here, we present SteerX, a zero-shot inference-time steering method that unifies scene reconstruction into the generation process, tilting data distributions toward better geometric alignment. To this end, we introduce two geometric reward functions for 3D/4D scene generation by using pose-free feed-forward scene reconstruction models. Through extensive experiments, we demonstrate the effectiveness of SteerX in improving 3D/4D scene generation.

Geometric Rewards

Our reward functions assess the geometric consistency of intermediate generated video frames by computing the feature similarity of upscaled DINO features. (a) GS-MEt3R evaluates feature similarity between the original video frames and their corresponding rendered images from 3DGS. (b) Dyn-MEt3R unprojects background features from half of the video frames and reprojects them onto the remaining frames to compute feature similarity.
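The core of both rewards is a feature-similarity score between two aligned sets of per-frame features (e.g. original frames vs. 3DGS renderings for GS-MEt3R). A minimal sketch of such a score in PyTorch, assuming the upscaled DINO features have already been extracted and aligned (the function name and tensor shapes here are illustrative, not the paper's exact implementation):

```python
import torch
import torch.nn.functional as F

def feature_similarity_reward(feats_a: torch.Tensor,
                              feats_b: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity between two aligned sets of patch features.

    feats_a, feats_b: (num_frames, num_patches, dim) tensors, e.g. upscaled
    DINO features of the original frames and of the 3DGS-rendered frames.
    Returns a scalar in [-1, 1]; higher means better geometric consistency.
    """
    # Cosine similarity per patch, then averaged over patches and frames.
    sim = F.cosine_similarity(feats_a, feats_b, dim=-1)  # (frames, patches)
    return sim.mean()
```

A perfectly consistent pair of feature sets yields a reward of 1; geometric drift between the generated frames and the reconstructed scene lowers the score.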

Geometric Steering

Our geometric steering builds on Feynman-Kac steering, which resamples particles to guide the data distribution toward high-reward samples. By incorporating geometric rewards, we iteratively tilt the data distribution to generate geometrically aligned samples using any video generative model.
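The resampling step above can be sketched as a standard importance-resampling move: particles (intermediate denoising samples) are redrawn with probability proportional to an exponentially tilted reward. This is a generic illustration of the technique, not the paper's exact procedure; the temperature parameter and shapes are assumptions:

```python
import torch

def resample_particles(particles: torch.Tensor,
                       rewards: torch.Tensor,
                       temperature: float = 1.0) -> torch.Tensor:
    """Resample particles proportionally to exp(reward / temperature).

    particles: (num_particles, ...) batch of intermediate latents.
    rewards:   (num_particles,) scalar geometric rewards, one per particle.
    Returns a batch of the same shape, tilted toward high-reward particles.
    """
    # Softmax of scaled rewards gives normalized resampling weights.
    weights = torch.softmax(rewards / temperature, dim=0)
    # Draw indices with replacement; high-reward particles are duplicated,
    # low-reward ones tend to be dropped.
    idx = torch.multinomial(weights, num_samples=particles.shape[0],
                            replacement=True)
    return particles[idx]
```

Iterating this resampling across denoising steps, with the geometric rewards above as the weighting signal, progressively concentrates the particle population on geometrically consistent generations.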