The code is adapted from E.T.; most of the training and data-processing files currently live in the `ET/notebooks` and `et_train` folders.
Inherited from the E.T. repo, the package depends on:
- numpy
- pandas
- opencv-python
- tqdm
- vocab
- revtok
- Pillow
- sacred
- etaprogress
- scikit-video
- lmdb
- gtimer
- filelock
- networkx
- termcolor
- torch==1.7.1
- torchvision==0.8.2
- tensorboardX==1.8
- ai2thor==2.1.0
- E.T. (https://github.com/alexpashevich/E.T.)
To fine-tune the MaskRCNN module used in solving the Alfred challenge, we provide code adapted from the official PyTorch tutorial.
We assume the environment and code structure are set up as in the E.T. model, with this repo serving as an extension, although the fine-tuning code should work as a standalone unit.
Given a `traj_data.json` file (e.g., the 45K one used in E.T. joint-training here), run `python -m alfred.gen.render_trajs` as in E.T. to render the training inputs (raw images) and the ground-truth labels (instance segmentation masks) for all the frames recorded in the `traj_data.json` files. Make sure the flag for generating instance-level segmentation masks is set to True.
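Under the hood, the instance masks come from AI2-THOR's rendering options. A minimal sketch of what that flag controls, assuming direct use of ai2thor 2.1.0 (the actual flag is configured inside the E.T./ALFRED rendering settings, not by calling the controller yourself):

```python
from ai2thor.controller import Controller

# Hypothetical standalone example: the E.T. code sets this up via its own
# render settings rather than instantiating the controller directly.
controller = Controller()
controller.start()
controller.reset('FloorPlan1')
event = controller.step(dict(action='Initialize',
                             gridSize=0.25,
                             renderObjectImage=True))  # instance segmentation on

masks = event.instance_segmentation_frame   # H x W x 3 RGB mask image
colors = event.color_to_object_id           # RGB color -> object instance id
```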
The rendered instance segmentation masks need to be preprocessed so that the data format matches the one used in the official PyTorch tutorial. Specifically, each generated mask assigns a distinct RGB color to each instance, which must be mapped to a unique instance index within the frame as well as a label index for its semantic class. The mapping is constructed by looking up `traj['scene']['color_to_object_type']` in each of the json dictionaries.
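A minimal sketch of this preprocessing step, assuming the color keys are serialized as strings like `'(r, g, b)'` and that `object_type_to_label` is a hypothetical helper mapping class names to integer label ids (check the actual key/value format in your `traj_data.json` files):

```python
import json
import numpy as np
from PIL import Image

def masks_and_labels(mask_png, traj_json, object_type_to_label):
    """Convert one rendered RGB instance mask into per-instance binary masks
    plus class label ids, as expected by the PyTorch detection tutorial."""
    with open(traj_json) as f:
        traj = json.load(f)
    color_to_type = traj['scene']['color_to_object_type']

    frame = np.array(Image.open(mask_png).convert('RGB'))
    instance_masks, labels = [], []
    for color_key, object_type in color_to_type.items():
        # Parse a key like '(255, 0, 120)'; adjust to the actual serialization.
        rgb = np.array([int(c) for c in color_key.strip('()').split(',')])
        binary = np.all(frame == rgb, axis=-1)
        if binary.any():  # the instance is visible in this frame
            instance_masks.append(binary)
            labels.append(object_type_to_label[object_type])
    return np.stack(instance_masks), np.array(labels)
```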
The code also supports collecting training data only from certain subgoals (such as PickupObject in Alfred).
Note that there are bugs in the mask rendering process that create artifacts (small regions in the ground-truth labels that correspond to no actual object). This can be fixed by keeping only instance masks larger than a certain area (e.g., > 10, as in `alfred/data/maskrcnn.py`).
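Continuing the sketch above, the artifact fix reduces to a simple area threshold (`alfred/data/maskrcnn.py` is the reference for the exact value; 10 pixels is used here):

```python
MIN_AREA = 10  # pixels; spurious regions at or below this are treated as artifacts

instance_masks, labels = masks_and_labels(mask_png, traj_json, object_type_to_label)
keep = [i for i, m in enumerate(instance_masks) if m.sum() > MIN_AREA]
instance_masks, labels = instance_masks[keep], labels[keep]
```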
Run `python -m alfred.maskrcnn.train`, which first loads the pre-trained model provided by E.T. and then fine-tunes it on the pre-processed data described above.
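For reference, the model construction follows the head-replacement recipe from the official PyTorch tutorial; a sketch, assuming the E.T. checkpoint is compatible with torchvision's Mask R-CNN (`pretrained_path` is a placeholder, not a file shipped with the repo):

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_model(num_classes, pretrained_path=None):
    # Standard torchvision Mask R-CNN, as in the PyTorch tutorial.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # Replace the box and mask heads with ones sized for the Alfred classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    if pretrained_path is not None:
        # Swap in the E.T.-provided weights instead of the COCO ones.
        model.load_state_dict(torch.load(pretrained_path, map_location='cpu'))
    return model
```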
We follow the MSCOCO evaluation protocol, which is widely used for object detection and instance segmentation and reports average precision and recall at multiple scales.
The evaluation call `evaluate(model, data_loader_test, device=device)` in `alfred/maskrcnn/train.py` serves as an example.
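A sketch of how that call is typically wired up, assuming the `evaluate` helper is vendored from the torchvision detection references (`engine.py`), as in the tutorial:

```python
import torch
from engine import evaluate  # torchvision detection references helper (assumed vendored)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

# Prints COCO-style AP/AR for boxes and masks at multiple IoU thresholds and
# object scales (small / medium / large), and returns the evaluator object.
coco_evaluator = evaluate(model, data_loader_test, device=device)
```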