Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding,
Zhenfang Chen,
Tao Du,
Ping Luo,
Joshua B. Tenenbaum, and
Chuang Gan
More details can be found at the Project Page.
If you find our work useful in your research, please consider citing our paper:
@inproceedings{ding2021dynamic,
author = {Ding, Mingyu and Chen, Zhenfang and Du, Tao and Luo, Ping and Tenenbaum, Joshua B and Gan, Chuang},
title = {Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language},
booktitle = {Advances In Neural Information Processing Systems},
year = {2021}
}
- Python 3
- PyTorch 1.3 or higher
- All related packages are covered by Miniconda
- Both CPUs and GPUs are supported
- Download the videos, video annotations, questions and answers, and object proposals from the official website.
- Transform the videos into ".png" frames with ffmpeg.
- Organize the data as shown below.
    clevrer
    ├── annotation_00000-01000
    │   ├── annotation_00000.json
    │   ├── annotation_00001.json
    │   └── ...
    ├── ...
    ├── image_00000-01000
    │   │   ├── 1.png
    │   │   ├── 2.png
    │   │   └── ...
    │   └── ...
    ├── ...
    ├── questions
    │   ├── train.json
    │   ├── validation.json
    │   └── test.json
    ├── proposals
    │   ├── proposal_00000.json
    │   ├── proposal_00001.json
    │   └── ...
- We also provide data for physics learning and program execution in Google Drive. You can optionally download them and put them in the ./data/ folder.
- Download the processed data executor_data.zip for the executor. Put it in ./executor/data/ and unzip it there.
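For the frame-extraction step in the list above, a minimal Python sketch is given here. It is not part of the repository: the helper names `build_ffmpeg_command` and `extract_frames` are hypothetical, and it assumes `ffmpeg` is installed and on the `PATH`. The `%d.png` output pattern asks ffmpeg to number frames as 1.png, 2.png, and so on, which matches the frame names in the layout above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, frame_dir):
    """Return the ffmpeg argument list that writes every frame as N.png."""
    # "%d.png" makes ffmpeg number the output frames 1.png, 2.png, ...
    return ["ffmpeg", "-i", str(video_path), str(Path(frame_dir) / "%d.png")]


def extract_frames(video_path, frame_dir):
    """Create frame_dir if needed and run ffmpeg on a single video."""
    Path(frame_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(video_path, frame_dir), check=True)
```

Calling `extract_frames` once per downloaded video, with a target folder of your choice under the matching `image_...` directory, should yield one folder of frames per video. The exact folder naming expected by the later scripts is not stated here, so check it against the repository before processing the full dataset.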
Download the object proposals from the region proposal network and follow the Step-by-step Training in DCL to get object concepts and trajectories.
The above process includes:
- trajectory extraction
- concept learning
- trajectory refinement
Or you can download our extracted object dictionaries object_dicts.zip directly from Google Drive.
After we get the above object dictionaries, we learn physical parameters from object properties and trajectories.
cd dynamics/
python3 learn_dynamics.py 10000 15000
# Here argv[1] and argv[2] are the start and end processing indices, respectively.
The output object physical parameters object_dicts_with_physics.zip can be downloaded from Google Drive.
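To give a feel for what "learning physical parameters from trajectories" can mean, here is a toy sketch. It is not the code in learn_dynamics.py and does not reproduce the paper's physics model: it assumes a single object moving in 1D with constant deceleration, and it fits an initial speed and a deceleration to an observed trajectory by gradient descent. All names (`simulate`, `fit_parameters`) and values are hypothetical.

```python
def simulate(v0, a, times, x0=0.0):
    """Positions of a 1D object that starts at speed v0 and decelerates at rate a."""
    return [x0 + v0 * t - 0.5 * a * t * t for t in times]


def fit_parameters(times, observed, steps=5000, lr=1.0):
    """Fit (v0, a) by gradient descent on the mean squared position error."""
    v0, a = 0.0, 0.0
    n = len(times)
    for _ in range(steps):
        predicted = simulate(v0, a, times)
        residuals = [p - o for p, o in zip(predicted, observed)]
        # Analytic gradients of the loss with respect to each parameter:
        # d(pred)/d(v0) = t and d(pred)/d(a) = -0.5 * t^2.
        grad_v0 = sum(2.0 * r * t for r, t in zip(residuals, times)) / n
        grad_a = sum(2.0 * r * (-0.5 * t * t) for r, t in zip(residuals, times)) / n
        v0 -= lr * grad_v0
        a -= lr * grad_a
    return v0, a


# Synthetic "observed" trajectory generated with known parameters (toy data).
times = [0.05 * k for k in range(21)]  # 0.0 to 1.0 seconds
observed = simulate(2.0, 1.5, times)
v0_hat, a_hat = fit_parameters(times, observed)
```

Because the simulator is a differentiable function of its parameters, the position error can be pushed back to those parameters directly. The toy recovers the values used to generate the synthetic data. The real pipeline works on object dictionaries extracted from video and is considerably more involved.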
Run the physical simulation using the learned physical parameters.
cd dynamics/
python3 physics_simulation.py 10000 15000
# Here argv[1] and argv[2] are the start and end processing indices, respectively.
The output simulated trajectories/events object_simulated.zip can be downloaded from Google Drive.
Correct the long-range predictions according to the video observations.
cd dynamics/
python3 refine_prediction.py 10000 15000
# Here argv[1] and argv[2] are the start and end processing indices, respectively.
The output refined trajectories/events object_updated_results.zip can be downloaded from Google Drive.
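The three dynamics steps above take the same start and end indices, so they could be chained from one small driver. The sketch below is a convenience wrapper, not part of the repository: `build_stage_commands` and `run_pipeline` are hypothetical names, and it assumes each script picks up the previous stage's output from its default location.

```python
import subprocess

# Script names and the dynamics/ working directory are taken from the steps above.
STAGES = ["learn_dynamics.py", "physics_simulation.py", "refine_prediction.py"]


def build_stage_commands(start, end):
    """Return one command per stage, all covering the same index range."""
    return [["python3", script, str(start), str(end)] for script in STAGES]


def run_pipeline(start, end, workdir="dynamics"):
    """Run the three stages in order, stopping at the first failure."""
    for command in build_stage_commands(start, end):
        # check=True raises CalledProcessError if a stage exits with an error.
        subprocess.run(command, cwd=workdir, check=True)
```

For example, `run_pipeline(10000, 15000)` called from the repository root would issue the same three commands shown above, one after another.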
After we get the final trajectories/events, we perform the neuro-symbolic execution and evaluate the performance on the validation set.
cd executor/
python3 evaluation.py
The test JSON file for evaluation on EvalAI can be generated by
cd executor/
python3 get_results.py
- Download causal_mass.zip and counterfactual_mass.zip from Google Drive.
- Generate counterfactual data on the collision event by
python3 counterfactual_mass/generate_data.py
For questions regarding VRDP, feel free to post here or directly contact the author (mingyuding@hku.hk).