ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding

ReDirector generates camera-controlled video retakes for dynamically captured variable-length videos.

Abstract

We present ReDirector, a novel camera-controlled video retake generation method for dynamically captured variable-length videos. In particular, we rectify a common misuse of RoPE in previous works by aligning the spatiotemporal positions of the input video and the target retake. Moreover, we introduce Rotary Camera Encoding (RoCE), a camera-conditioned RoPE phase shift that captures and integrates multi-view relationships within and across the input and target videos. By integrating camera conditions into RoPE, our method generalizes to out-of-distribution camera trajectories and video lengths, yielding improved dynamic object localization and static background preservation. Extensive experiments further demonstrate significant improvements in camera controllability, geometric consistency, and video quality across various trajectories and lengths.

Architecture Overview

ReDirector is fine-tuned on Wan-I2V-1.3B-CamCtrl. Our goal is to generate video retakes conditioned on the target camera trajectories, the input video, and the input video's camera poses. We introduce Rotary Camera Encoding (RoCE) into the self-attention layers and use the RoCE outputs as camera-conditioned RoPE phase shifts that provide physically grounded positional information. Finally, we employ geometry-aware attention by applying complementary phase shifts before and after value aggregation.
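The idea of a phase-shifted RoPE inside attention can be illustrated with a small NumPy sketch. This is not the ReDirector implementation: the function names, the 1D token positions, and the `camera_shift` array are all assumptions made for illustration, and `camera_shift` merely stands in for whatever RoCE would produce from the camera poses, which this page does not specify. Rotating the values before aggregation and applying the inverse rotation afterwards is one possible reading of "complementary phase shifts before and after value aggregation".

```python
import numpy as np

def rotate(x, phase):
    """Rotate consecutive feature pairs of x by per-token, per-pair angles.

    x:     (tokens, dim) array with even dim
    phase: (tokens, dim // 2) rotation angles in radians
    """
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(phase), np.sin(phase)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def rope_phase(positions, dim, base=10000.0):
    """Standard RoPE angles: position times a geometric frequency ladder."""
    freqs = base ** (-np.arange(dim // 2) / (dim // 2))
    return positions[:, None] * freqs[None, :]

def shifted_attention(q, k, v, positions, camera_shift):
    """Single-head attention with RoPE phases offset by camera_shift.

    camera_shift: (tokens, dim // 2) hypothetical per-token phase offsets,
    a placeholder for camera-conditioned shifts (an assumption here).
    """
    dim = q.shape[-1]
    phase = rope_phase(positions, dim) + camera_shift

    # Shift queries, keys, and values into the phase-rotated frame.
    q_r, k_r, v_r = rotate(q, phase), rotate(k, phase), rotate(v, phase)

    scores = q_r @ k_r.T / np.sqrt(dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Aggregate rotated values, then undo each query token's own rotation.
    return rotate(weights @ v_r, -phase)
```

In this sketch both the attention scores and the aggregated values depend only on phase differences between tokens, so adding the same offset to every token leaves the output unchanged, while token-specific offsets change it. Under the same assumptions, giving input-video and target-video tokens the same `positions` would leave `camera_shift` as the only term that distinguishes the two views of a given spatiotemporal location.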

Reported Results

Video Comparisons (DAVIS)

More Video Results