
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

GigaAI¹, Tsinghua University²
*Equal Contribution

DriveDreamer excels at controllable driving video generation, aligning closely with text prompts and structured traffic constraints. It can also interact with the driving scene and predict different future driving videos based on input driving actions. In addition, DriveDreamer can anticipate future driving actions.

Abstract

World models, especially in autonomous driving, are drawing extensive attention because of their capacity to comprehend driving environments. An established world model holds great potential for generating high-quality driving videos and driving policies for safe maneuvering. However, a critical limitation of existing research is its predominant focus on gaming environments or simulated settings, which lack the representation of real-world driving scenarios. We therefore introduce DriveDreamer, a pioneering world model derived entirely from real-world driving scenarios. Because modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing a powerful diffusion model to construct a comprehensive representation of the complex environment. Furthermore, we introduce a two-stage training pipeline. In the first stage, DriveDreamer acquires a deep understanding of structured traffic constraints; the second stage equips it with the ability to anticipate future states. The proposed DriveDreamer is the first world model established from real-world driving scenarios. We instantiate DriveDreamer on the challenging nuScenes benchmark, and extensive experiments verify that it enables precise, controllable video generation that faithfully captures the structural constraints of real-world traffic scenarios. Additionally, DriveDreamer enables the generation of realistic and reasonable driving policies, opening avenues for interaction and practical applications.
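As background for the diffusion model mentioned above, the generic conditional denoising objective can be written as follows. This is the standard DDPM formulation, given only for orientation; the symbols ($x_0$ for a clean sample, $c$ for the conditioning signal, $\epsilon_\theta$ for the denoising network, $\bar{\alpha}_t$ for the cumulative noise schedule) are assumptions introduced here, and this page does not state the exact loss DriveDreamer uses.

```latex
% Forward (noising) process at timestep t
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right),
\qquad
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\quad \epsilon \sim \mathcal{N}(0, I)

% Conditional noise-prediction objective
\mathcal{L} = \mathbb{E}_{x_0,\,c,\,\epsilon,\,t}
\left[\ \left\lVert \epsilon - \epsilon_\theta(x_t,\ t,\ c) \right\rVert_2^2\ \right]
```

In a driving setting, $c$ would plausibly carry the structured traffic conditions and the text prompt, but that mapping is an illustration rather than a description of the released model.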

Method

The DriveDreamer framework begins with an initial reference frame and its corresponding road structural information (i.e., HDMap and 3D boxes). Given these inputs, DriveDreamer uses the proposed ActionFormer to predict upcoming road structural features in the latent space. These predicted features serve as conditions for Auto-DM, which generates future driving videos. At the same time, text prompts allow dynamic adjustment of the driving scenario style (e.g., weather and time of day). Moreover, DriveDreamer combines historical action information with the multi-scale latent features extracted from Auto-DM to generate reasonable future driving actions. In essence, DriveDreamer offers a comprehensive framework that integrates multi-modal inputs to generate future driving videos and driving policies, thereby advancing the capabilities of autonomous-driving systems.
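The data flow described above can be sketched in a few lines of Python. This is a toy illustration of how the components connect, not the released implementation: the function names `action_former`, `auto_dm`, `action_head`, and `drive_dreamer_step`, the `Scene` container, the `(steering, speed)` action format, and the `STYLE_OFFSET` table are all hypothetical, and the arithmetic inside each function is a placeholder for a learned network.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Sequence, Tuple

Vector = List[float]
Action = Tuple[float, float]  # assumed (steering, speed) pair, for illustration only

# Hypothetical offsets standing in for text-prompt conditioning.
STYLE_OFFSET = {"sunny": 0.2, "rainy": -0.1, "night": -0.5}


@dataclass
class Scene:
    """Inputs named on the page: a reference frame and its road structure."""
    reference_frame: Vector  # toy stand-in for an image latent
    hdmap: Vector            # toy stand-in for HDMap features
    boxes_3d: Vector         # toy stand-in for 3D box features


def action_former(scene: Scene, actions: Sequence[Action]) -> List[Vector]:
    """Roll road-structure features forward, one step per input action (placeholder math)."""
    structure = scene.hdmap + scene.boxes_3d
    predicted = []
    for steering, speed in actions:
        structure = [v + 0.1 * steering + 0.01 * speed for v in structure]
        predicted.append(structure)
    return predicted


def auto_dm(scene: Scene, structure_feats: List[Vector], prompt: str):
    """Produce one toy frame per predicted structure feature, plus two-scale features."""
    offset = STYLE_OFFSET.get(prompt, 0.0)
    frames, multi_scale = [], []
    for feats in structure_feats:
        condition = fmean(feats)
        frame = [pixel + condition + offset for pixel in scene.reference_frame]
        frames.append(frame)
        multi_scale.append([frame, frame[::2]])  # full and half resolution
    return frames, multi_scale


def action_head(history: Sequence[Action], multi_scale) -> List[Action]:
    """Combine the latest historical action with generator features (placeholder math)."""
    steering, speed = history[-1]
    future = []
    for scales in multi_scale:
        summary = fmean(fmean(scale) for scale in scales)
        steering = steering + 0.01 * summary
        future.append((steering, speed))
    return future


def drive_dreamer_step(scene: Scene, history: Sequence[Action],
                       actions: Sequence[Action], prompt: str):
    """Reference frame + structure -> predicted structure -> frames -> future actions."""
    structure_feats = action_former(scene, actions)
    frames, multi_scale = auto_dm(scene, structure_feats, prompt)
    future_actions = action_head(history, multi_scale)
    return frames, future_actions


scene = Scene(reference_frame=[0.0, 1.0, 2.0, 3.0], hdmap=[1.0, 2.0], boxes_3d=[3.0])
frames, future_actions = drive_dreamer_step(
    scene, history=[(0.0, 5.0)], actions=[(-1.0, 5.0)] * 3, prompt="night"
)
```

The point of the sketch is the ordering of the three stages: the input actions only influence the generated frames through the predicted structure features, while the text prompt enters at the generator, and the action output reuses the generator's intermediate features rather than the final frames alone.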

Results

1. Diverse Driving Video Generation.


2. Driving Video Generation with Traffic Condition and Different Text Prompts (Sunny, Rainy, Night)


3. Future Driving Video Generation with Action Interaction.


4. Future Driving Action Generation.

BibTeX

If you use our work in your research, please cite:

@article{wang2023drive,
  title={DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving},
  author={Wang, Xiaofeng and Zhu, Zheng and Huang, Guan and Chen, Xinze and Zhu, Jiagang and Lu, Jiwen},
  journal={arXiv preprint arXiv:2309.09777},
  year={2023}
}