We identify our YouTube dataset as a rich source of tasks, since many human players demonstrate and narrate creative missions in their tutorial playlists. To collect high-quality tasks and accompanying videos, we design a pipeline that makes it easy to find and annotate interesting tasks (see our arXiv paper for details). We build the task curation UI with Label Studio; it displays the full video alongside its YouTube description. A human annotator can choose to reject the video, adjust the timestamps, select the title, or refine the description into the task goal. Through this pipeline, we extract more than 1,000 tasks from the collective wisdom of a huge number of veteran Minecraft gamers. Some examples are "make an automated mining machine" and "grow cactus up to the sky".
```bash
pip install label-studio
label-studio start
```
Sign up and log in to Label Studio.
There are three steps to set up your first labeling project:
- Name your project
- Import data
- Specify template for Labeling UI
Name your project and add a description to it.
Import `samples.json` as an example; it contains 10 data samples. The imported file must be a JSON file, and you can follow the format of this example file if you need to build your own dataset.
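For reference, here is a minimal sketch of how such a file could be produced. Label Studio expects a JSON list of task objects, each wrapping its fields in a `data` dict whose keys match the variables referenced by the labeling template; the specific field names below (`video_url`, `title`, `description`, `start_time`, `end_time`) are illustrative assumptions, not the exact schema of `samples.json`.

```python
import json

# Hypothetical task entries: Label Studio wraps each task's fields in a
# "data" dict whose keys must match the variables used in the labeling template.
tasks = [
    {
        "data": {
            "video_url": "https://www.youtube.com/watch?v=XXXXXXXXXXX",
            "title": "How to Build an Automated Mining Machine",
            "description": "Step-by-step tutorial on building an auto-miner.",
            "start_time": 35.0,
            "end_time": 320.0,
        }
    }
]

# Write the dataset in the same overall shape as the example file.
with open("my_dataset.json", "w") as f:
    json.dump(tasks, f, indent=2)
```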
Select "Custom template" in Labeling Setup and paste template.html
into it. Click "Save" to create the project.
This screenshot shows an example of the annotation UI. We use pre-annotated data (the video title and description) as suggestions for labeling. You can choose to reject the video, adjust the timestamps, select the title, or refine the description into the task goal. Click "Submit" to record your annotation.
You can export the annotated data to your local storage in any of Label Studio's supported formats (e.g., JSON or CSV).
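Once you have a JSON export, iterating over the recorded annotations is straightforward. Below is a minimal sketch assuming the standard Label Studio JSON export layout (a list of tasks, each carrying an `annotations` list whose items hold `result` entries); the exact keys inside each `value` depend on the controls used in the template, so the ones named in the comments are assumptions.

```python
import json

# Load a Label Studio JSON export (filename is an example).
with open("export.json") as f:
    tasks = json.load(f)

for task in tasks:
    for annotation in task.get("annotations", []):
        for item in annotation.get("result", []):
            value = item.get("value", {})
            # e.g. a Choices control yields {"choices": [...]},
            # a TextArea control yields {"text": [...]}.
            print(item.get("from_name"), value)
```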
Please check out Label Studio's official website for more information!
Our paper is posted on arXiv. If you find our work useful, please consider citing us!
```bibtex
@article{fan2022minedojo,
  title   = {MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge},
  author  = {Linxi Fan and Guanzhi Wang and Yunfan Jiang and Ajay Mandlekar and Yuncong Yang and Haoyi Zhu and Andrew Tang and De-An Huang and Yuke Zhu and Anima Anandkumar},
  year    = {2022},
  journal = {arXiv preprint arXiv:2206.08853}
}
```