
SteerX seamlessly integrates video generative models and scene reconstruction models, enabling any 3D/4D scene generation without camera conditions.
Recent progress in 3D/4D scene generation emphasizes the importance of physical alignment throughout video generation and scene reconstruction. However, existing methods improve the alignment separately at each stage, making it difficult to manage subtle misalignments arising from the other stage. Here, we present SteerX, a zero-shot inference-time steering method that unifies scene reconstruction into the generation process, tilting data distributions toward better geometric alignment. To this end, we introduce two geometric reward functions for 3D/4D scene generation by using pose-free feed-forward scene reconstruction models. Through extensive experiments, we demonstrate the effectiveness of SteerX in improving 3D/4D scene generation.
Our reward functions assess the geometric consistency of intermediate generated video frames by computing the feature similarity of upscaled DINO features. (a) GS-MEt3R evaluates feature similarity between the original video frames and their corresponding images rendered from 3DGS. (b) Dyn-MEt3R unprojects background features from half of the video frames and reprojects them onto the remaining frames to compute feature similarity.
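To make the reward computation concrete, the following is a minimal sketch of a GS-MEt3R-style reward: it averages the per-pixel cosine similarity between (upscaled) DINO feature maps of the original frames and of the corresponding 3DGS renderings. The function names, array shapes, and the plain mean-cosine aggregation are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def feature_similarity(feats_a, feats_b):
    """Mean per-pixel cosine similarity between two (H, W, C) feature maps.

    Stand-in for the MEt3R-style feature comparison; the inputs are assumed
    to be upscaled DINO features aligned to the same image resolution.
    """
    a = feats_a / (np.linalg.norm(feats_a, axis=-1, keepdims=True) + 1e-8)
    b = feats_b / (np.linalg.norm(feats_b, axis=-1, keepdims=True) + 1e-8)
    return float((a * b).sum(axis=-1).mean())

def gs_met3r_reward(frame_feats, rendered_feats):
    """GS-MEt3R-style reward (sketch): average similarity between original
    video-frame features and features of the matching 3DGS renderings.
    Higher values indicate better geometric consistency.
    """
    sims = [feature_similarity(f, r)
            for f, r in zip(frame_feats, rendered_feats)]
    return float(np.mean(sims))
```

Dyn-MEt3R would use the same similarity function, but compare unprojected-and-reprojected background features against the features of the target frames instead of 3DGS renderings.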
Our geometric steering builds on Feynman-Kac steering, which resamples particles to guide the data distribution toward high-reward samples. By incorporating geometric rewards, we iteratively tilt the data distribution to generate geometrically aligned samples using any video generative model.
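The resampling step described above can be sketched as follows: each particle (a partially denoised sample) is assigned a weight proportional to the exponentiated geometric reward, and the population is resampled with replacement according to those weights, so high-reward particles are duplicated and low-reward ones are dropped. The softmax-style weighting and the `temperature` hyperparameter are assumptions for illustration, not the paper's exact schedule.

```python
import numpy as np

def resample_particles(particles, rewards, temperature=1.0, rng=None):
    """One Feynman-Kac-style resampling step (sketch).

    particles: list of candidate samples (e.g. intermediate latents).
    rewards:   geometric reward per particle (e.g. GS-MEt3R scores).
    temperature: assumed knob controlling how strongly the distribution
                 is tilted toward high-reward samples.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    logits = np.asarray(rewards, dtype=float) / temperature
    weights = np.exp(logits - logits.max())  # subtract max for stability
    weights /= weights.sum()
    # Resample indices with replacement, proportional to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return [particles[i] for i in idx]
```

Repeating this step at several denoising timesteps progressively concentrates the particle population on geometrically consistent generations, without modifying the underlying video model.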