World models that support controllable and editable spatiotemporal environments are valuable for robotics, enabling scalable training data, reproducible evaluation, and flexible task design. While recent text-to-video models generate realistic dynamics, they are constrained to 2D views and offer limited interaction. We introduce MorphoSim, a language-guided framework that generates 4D scenes with multi-view consistency and object-level controls. From natural language instructions, MorphoSim produces dynamic environments where objects can be directed, recolored, or removed, and scenes can be observed from arbitrary viewpoints. The framework integrates trajectory-guided generation with feature field distillation, allowing edits to be applied interactively without full re-generation. Experiments show that MorphoSim maintains high scene fidelity while enabling controllability and editability. The code is available at https://github.com/eric-ai-lab/Morph4D.
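To make the interactive workflow concrete, the sketch below shows how language-guided generation, novel-view rendering, and re-generation-free editing might be driven from Python. All names here (the `morphosim` package, `MorphoSim`, `generate`, `render`, `edit`) are hypothetical placeholders for illustration, not the repository's actual API; see the GitHub link above for the real interface.

```python
# Hypothetical usage sketch -- class and method names are illustrative
# placeholders, not the actual MorphoSim API.
from morphosim import MorphoSim  # assumed package name

sim = MorphoSim()

# 1) Generate a dynamic 4D scene from a natural language instruction.
scene = sim.generate("A red car drives past a tree on a sunny street.")

# 2) Observe the scene from arbitrary viewpoints at arbitrary times.
frame = scene.render(camera_pose=[0.0, 1.5, 4.0], time=0.5)

# 3) Apply object-level edits interactively, without full re-generation.
scene.edit("recolor the car to blue")
scene.edit("remove the tree")
```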
Figure 1: MorphoSim supports interactive, controllable generation and editing.
Figure 2: Overview of the MorphoSim pipeline with Command Parameterizer, Scene Generation, and Scene Editing modules.
Figure 3: MorphoSim supports color editing, object extraction, and object removal.
Figure 4: MorphoSim enables motion control during generation; a sketch of how a trajectory might be supplied follows below.
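The sketch below illustrates trajectory-guided motion control at generation time (cf. Figure 4): a named object is assigned waypoints, and generation is conditioned on both the prompt and the trajectory. `ObjectTrajectory` and the `trajectories=` argument are assumed names for illustration only; the real interface may differ.

```python
# Hypothetical sketch of trajectory-guided motion control.
from morphosim import MorphoSim, ObjectTrajectory  # assumed names

sim = MorphoSim()

# Waypoints (x, y, z) the target object should follow over the clip.
car_path = ObjectTrajectory(
    object_name="car",
    waypoints=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (2.0, 0.0, 1.5)],
)

# Generation conditioned on both the language prompt and the trajectory,
# so the object's motion is directed without post-hoc editing.
scene = sim.generate(
    "A red car drives through a plaza.",
    trajectories=[car_path],
)
```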
@article{he2025morphosim,
  title   = {MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator},
  author  = {He, Xuehai and Zhou, Shijie and Venkateswaran, Thivyanth and Zheng, Kaizhi and Wan, Ziyu and Kadambi, Achuta and Wang, Xin Eric},
  journal = {arXiv preprint arXiv:2510.04390},
  year    = {2025}
}