V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video

Jianqi Chen, Biao Zhang, Xiangjun Tang, Peter Wonka

King Abdullah University of Science and Technology (KAUST)

Abstract

We present V2M4, a novel 4D reconstruction method that directly generates a usable 4D mesh animation asset from a single monocular video. Unlike existing approaches that rely on priors from multi-view image and video generation models, our method is based on native 3D mesh generation models. Naively applying 3D mesh generation models to generate a mesh for each frame in a 4D task can lead to issues such as incorrect mesh poses, misalignment of mesh appearance, and inconsistencies in mesh geometry and texture maps. To address these problems, we propose a structured workflow that includes camera search and mesh reposing, condition embedding optimization for mesh appearance refinement, pairwise mesh registration for topology consistency, and global texture map optimization for texture consistency. Our method outputs high-quality 4D animated assets that are compatible with mainstream graphics and game software. Experimental results across a variety of animation types and motion amplitudes demonstrate the generalization and effectiveness of our method.

Method Overview

The Workflow of V2M4. Given an input video sequence \( V_{\text{ref}} \), we first generate a coarse 3D mesh for each frame, denoted \(\widetilde{\mathcal{M}}_{\{0, \dots, t\}}\). These initial meshes do not accurately capture the object's movement and appearance as depicted in the input video, and their topology and textures are inconsistent from frame to frame. Our method then applies a five-stage process: (1) repose each mesh to accurately reflect the object's movement; (2) refine the mesh appearance against the reference video frames; (3) enforce inter-mesh topology consistency through mesh registration; (4) optimize a single texture map shared globally across all meshes; (5) keyframe the meshes, interpolate between them, and convert the result into a directly usable 4D animation asset.
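
To make stage (5) concrete, the following is a minimal sketch of keyframe interpolation once all meshes share one topology and one texture map. It is an illustration only, not the authors' implementation: the page does not state which interpolation scheme is used, so linear blending of vertex positions is an assumption here, and the names `Mesh` and `interpolate_vertices` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical representation: vertex positions only. Faces and the texture
# map are assumed to be shared by every keyframe after stages (3) and (4).
Mesh = List[Vec3]


def interpolate_vertices(
    key_times: Sequence[float], key_meshes: Sequence[Mesh], t: float
) -> Mesh:
    """Linearly blend vertex positions between the keyframes surrounding t.

    Assumes key_times is strictly increasing and every keyframe mesh has the
    same vertex count and ordering (i.e. consistent topology).
    """
    if not key_times or len(key_times) != len(key_meshes):
        raise ValueError("need exactly one mesh per keyframe time")
    if any(len(m) != len(key_meshes[0]) for m in key_meshes):
        raise ValueError("all keyframes must share the same vertex count")

    # Clamp to the first or last keyframe outside the animated range.
    if t <= key_times[0]:
        return list(key_meshes[0])
    if t >= key_times[-1]:
        return list(key_meshes[-1])

    # Index of the first keyframe strictly after t; its predecessor is at or before t.
    hi = bisect_right(key_times, t)
    lo = hi - 1
    w = (t - key_times[lo]) / (key_times[hi] - key_times[lo])

    # Per-vertex, per-coordinate linear blend.
    return [
        tuple((1.0 - w) * a + w * b for a, b in zip(va, vb))
        for va, vb in zip(key_meshes[lo], key_meshes[hi])
    ]


# Usage: a two-vertex "mesh" that moves up by 2 units between t=0 and t=1.
times = [0.0, 1.0]
meshes = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
]
halfway = interpolate_vertices(times, meshes, 0.5)
```

This kind of per-vertex blending is only meaningful because vertex `i` refers to the same surface point in every keyframe, which is what the registration in stage (3) is meant to provide.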

Citation


    @article{chen2025v2m44dmeshanimation,
        title={V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video},
        author={Chen, Jianqi and Zhang, Biao and Tang, Xiangjun and Wonka, Peter},
        journal={arXiv preprint arXiv:2503.09631},
        year={2025}
    }