A collection of awesome papers on building world models with the help of generative abilities across text, image, and video.
Feel free to open a PR or an issue, or e-mail me, if you want to add missing papers!
If this repo is useful for your research, please consider starring it or sharing it with others~
Still collecting~
Date | Paper | Affiliation | Other useful link |
---|---|---|---|
7 Mar 2023 | Foundation Models for Decision Making: Problems, Methods, and Opportunities | Google Research, UCB, MIT, University of Alberta | paper |
27 Feb 2024 | Video as the New Language for Real-World Decision Making | DeepMind, MIT, UCB | paper |
6 May 2024 | Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond | GigaAI, CAS, NUS, Shanghai AI Lab | paper GitHub |
15 Nov 2023 | Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models | KAIST, Rutgers University, EPFL, DeepMind | paper Project page GitHub |
7 Jun 2024 | Towards Generalist Robot Learning from Internet Video: A Survey | UCL, Weco AI, MIT | paper |
Date | Paper | Affiliation | Other useful link |
---|---|---|---|
12 Oct 2023 | Learning to Act from Actionless Videos through Dense Correspondences | National Taiwan University, MIT | paper Project page GitHub |
16 Oct 2023 | Video Language Planning | DeepMind, MIT, UCB | paper Project page GitHub |
27 Oct 2023 | Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning | THU | paper GitHub |
20 Nov 2023 | Learning Universal Policies via Text-Guided Video Generation | MIT, DeepMind, UCB | paper Project page GitHub |
27 Nov 2023 | DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving | GigaAI, THU | paper Project page |
13 Jan 2024 | Learning Interactive Real-World Simulators | UCB, MIT, DeepMind, University of Alberta | paper Project page |
18 Jan 2024 | WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens | GigaAI, THU | paper Project page |
16 Feb 2024 | Using Left and Right Brains Together: Towards Vision and Language Planning | SUSTC, MSRA, HKUST, XJTU, CityU | paper |
5 Mar 2024 | Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos | MBZUAI, NEC, University of Auckland, Weizmann Institute of Science | paper GitHub |
1 Apr 2024 | Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion | Waabi, University of Toronto | paper Project page |
11 Apr 2024 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | GigaAI, CASIA | paper Project page |
18 Apr 2024 | RoboDreamer: Learning Compositional World Models for Robot Imagination | HKUST, MIT, UCB | paper Project page GitHub |
23 May 2024 | Pandora: Towards General World Model with Natural Language Actions and Video States | Maitrix, UCSD, MBZUAI | paper Project page GitHub |
2 Jun 2024 | iVideoGPT: Interactive VideoGPTs are Scalable World Models | THU, Huawei, Tianjin | paper Project page GitHub |