
Data download

This section explains how to download the datasets and describes each file format found in them.

Datasets

ARKitScenes includes three datasets:

  1. 3dod - The dataset used to train 3D object detection. It includes 3 assets: low-resolution RGB images, low-resolution depth images and the labels. (The total size is 623.4 GB for 5047 threedod scans.)
  2. upsampling - The dataset used to train depth upsampling. It includes 3 assets: high-resolution RGB images, low-resolution depth images and high-resolution depth images.
  3. raw - All the data available in ARKitScenes. The 3dod and upsampling datasets are subsets of it, and it includes many more assets that are not part of either.

Downloading data

Each dataset has a CSV file that lists every visit_id, video_id and fold available in that dataset.

3DOD CSV path:

ARKitScenes/threedod/3dod_train_val_splits.csv

Upsampling CSV path:

ARKitScenes/depth_upsampling/upsampling_train_val_splits.csv

Raw CSV path:

ARKitScenes/raw/raw_train_val_splits.csv
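
As a rough illustration of how these CSV files can be used, the sketch below groups video_ids by fold with Python's standard csv module. It assumes the file has a header row whose column names are exactly visit_id, video_id and fold; the helper name video_ids_by_fold and the sample rows are made up for the example and are not part of the repository.

```python
import csv
import io
from collections import defaultdict


def video_ids_by_fold(csv_file):
    """Group video_id values by fold from an open splits CSV file.

    Assumes a header row with 'video_id' and 'fold' columns.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["fold"]].append(row["video_id"])
    return dict(groups)


# Made-up rows for illustration only; the real files are at the CSV paths above.
sample = io.StringIO(
    "visit_id,video_id,fold\n"
    "1001,40000001,Training\n"
    "1001,40000002,Training\n"
    "1002,40000003,Validation\n"
)
splits = video_ids_by_fold(sample)
for fold, ids in splits.items():
    print(fold, len(ids), "videos")
```

To use it on a real file, open the CSV with `open(path, newline="")` and pass the file object in place of `sample`.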

To download any of the datasets, use the Python script download_data.py.

To download a specific video_id or a series of video_ids, pass the dataset name (3dod, upsampling or raw) as the first argument, the fold with --split (Training or Validation), and one or more video_ids with --video_id.

python3 download_data.py [3dod/upsampling/raw] --split [Training/Validation] --video_id video_id1 video_id2 \
--download_dir YOUR_DATA_FOLDER

For example:

python3 download_data.py raw --split Training --video_id 47333462 \
--download_dir /tmp/ARKitScenes/

To download the laser scanner point clouds (available only for the raw dataset), add the --download_laser_scanner_point_cloud flag:

python3 download_data.py raw --split Training --video_id 47333462 \
--download_dir /tmp/ARKitScenes/ --download_laser_scanner_point_cloud

To download using a CSV file, pass the dataset name (3dod, upsampling or raw) as the first argument and the CSV path with --video_id_csv. The fold is not needed, because the fold information is read from the CSV file.

python3 download_data.py [3dod/upsampling/raw] --video_id_csv CSV_PATH \
--download_dir YOUR_DATA_FOLDER

For example:

python3 download_data.py 3dod --video_id_csv threedod/3dod_train_val_splits.csv \
--download_dir /tmp/raw_ARKitScenes/

Please note that for raw data, you need to specify the type(s) of data you would like to download with --raw_dataset_assets. The choices are:

mov annotation mesh confidence highres_depth lowres_depth lowres_wide.traj lowres_wide lowres_wide_intrinsics ultrawide 
ultrawide_intrinsics vga_wide vga_wide_intrinsics

For example:

python3 download_data.py raw --video_id_csv raw/raw_train_val_splits.csv --download_dir /tmp/ar_raw_all/ \
--raw_dataset_assets mov annotation mesh confidence highres_depth lowres_depth lowres_wide.traj \
lowres_wide lowres_wide_intrinsics ultrawide ultrawide_intrinsics vga_wide vga_wide_intrinsics
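
When only a few raw assets are needed, it can help to assemble the command programmatically and catch misspelled asset names before starting a long download. The sketch below is one possible way to do that; build_raw_download_command is a hypothetical helper, not something shipped with the repository, and it only uses the flags and asset names shown above.

```python
import shlex

# Asset names copied from the list of choices above.
RAW_ASSETS = (
    "mov", "annotation", "mesh", "confidence", "highres_depth", "lowres_depth",
    "lowres_wide.traj", "lowres_wide", "lowres_wide_intrinsics", "ultrawide",
    "ultrawide_intrinsics", "vga_wide", "vga_wide_intrinsics",
)


def build_raw_download_command(csv_path, download_dir, assets):
    """Return an argument list for download_data.py (hypothetical helper)."""
    if not assets:
        raise ValueError("at least one raw asset must be requested")
    unknown = [a for a in assets if a not in RAW_ASSETS]
    if unknown:
        raise ValueError(f"unknown raw asset(s): {unknown}")
    return [
        "python3", "download_data.py", "raw",
        "--video_id_csv", csv_path,
        "--download_dir", download_dir,
        "--raw_dataset_assets", *assets,
    ]


# The download directory here is an arbitrary example path.
cmd = build_raw_download_command(
    "raw/raw_train_val_splits.csv", "/tmp/ar_raw_subset/", ["mesh", "lowres_depth"]
)
print(shlex.join(cmd))  # shell-style string, handy for logging
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains download_data.py.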

The data folder (i.e. YOUR_DATA_FOLDER) will include two directories, Training and Validation, which contain all the assets belonging to the training and validation folds respectively.

Dataset file formats

The datasets include the following file formats:

  1. .png - stores RGB images, depth images and confidence images
    • RGB images - regular uint8, 3-channel images
    • depth images - uint16 PNG, values in millimeters
    • confidence images - uint8 PNG, from 0 (low confidence) to 2 (high confidence)
  2. .pincam - stores the intrinsic matrix for each RGB image
    • a single-line, space-delimited text file with the following fields: width height focal_length_x focal_length_y principal_point_x principal_point_y
  3. .json - stores the object annotations
  4. .traj - a space-delimited file where each line represents the camera pose at a particular timestamp
    • Column 1: timestamp
    • Columns 2-4: rotation (axis-angle representation in radians)
    • Columns 5-7: translation (in meters)
  5. .ply - stores the mesh generated by ARKit or the point clouds generated by the Faro laser scanner
  6. .mov - video captured with ARKit (raw dataset only)
  7. _pose.txt - transformation matrix used to align/register multiple Faro scans; lines 0-2 contain the rotation matrix and line 3 the translation vector
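
The .pincam and .traj layouts described above can be parsed with a few lines of plain Python. The following is a minimal sketch, not the repository's own loader: the function names are hypothetical, the sample values are made up, and it assumes the three rotation columns form a rotation vector whose direction is the axis and whose length is the angle in radians. It does not say whether the pose maps world to camera or camera to world, since the description above does not state that.

```python
import math


def parse_pincam(text):
    """Return (width, height, K) from the single line of a .pincam file."""
    w, h, fx, fy, cx, cy = (float(v) for v in text.split())
    K = [[fx, 0.0, cx],
         [0.0, fy, cy],
         [0.0, 0.0, 1.0]]
    return int(w), int(h), K


def axis_angle_to_matrix(rx, ry, rz):
    """Rodrigues' formula: rotation vector (radians) -> 3x3 rotation matrix."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:  # no rotation: return the identity
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    return [
        [c + x * x * C,     x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C,     y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def parse_traj_line(line):
    """Return (timestamp, R, t) from one line of a .traj file."""
    ts, rx, ry, rz, tx, ty, tz = (float(v) for v in line.split())
    return ts, axis_angle_to_matrix(rx, ry, rz), (tx, ty, tz)


# Made-up sample values, for illustration only.
width, height, K = parse_pincam("640 480 500.5 501.5 320.25 240.75")
timestamp, R, t = parse_traj_line(f"12.5 0 0 {math.pi / 2} 1 2 3")
```

Depth PNG values are in millimeters, so dividing by 1000 gives depths in the same unit (meters) as the .traj translations.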

Dataset structure

For details on the structure of each dataset, see that dataset's own documentation.