
CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision

Jaewook Lee¹, Andrew D. Tjahjadi¹, Jiho Kim¹, Junpu Yu¹, Minji Park², Jiawen Zhang¹,
Jon E. Froehlich¹, Yapeng Tian³, Yuhang Zhao⁴

¹University of Washington, ²Sungkyunkwan University,
³University of Texas at Dallas, ⁴University of Wisconsin-Madison

UIST 2024


CookAR is a computer vision-powered prototype AR system that renders real-time object affordance augmentations to support safe and efficient interactions with kitchen tools for people with low vision. This repo provides the fine-tuned instance segmentation model used for affordance augmentations, along with the first egocentric dataset of kitchen tool affordances, collected and annotated by the research team.


Setup

To use CookAR, we recommend using Conda. CookAR also depends on the MMDetection toolbox and PyTorch. If your GPU supports CUDA, please install CUDA first.

conda create --name=CookAR python=3.8
conda activate CookAR
# install PyTorch first; change cu118 to match your CUDA version
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .

It is recommended that you install PyTorch before MMDetection; otherwise MMCV/MMDetection may not be compiled correctly against CUDA.

  • Once everything is installed, first create three folders inside the mmdetection directory, namely ./data, ./checkpoints, and ./work_dir, either manually or with mkdir.
  • Download the pre-trained config and weights from MMDetection by running mim download mmdet --config rtmdet-ins_l_8xb32-300e_coco --dest ./checkpoints, then run python test_install.py to check that everything is working correctly. You should see an image with segmentation masks pop up; a quick Python sanity check of the environment is also sketched below.
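If test_install.py fails, it can help to first confirm that the core packages import and that CUDA is visible from PyTorch. The snippet below is a minimal sanity check of the environment only; it does not exercise the CookAR model itself.

# check that PyTorch, MMCV, and MMDetection import and that CUDA is visible
import torch
import mmcv
import mmdet

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)
print("mmdet:", mmdet.__version__)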

Dataset

Along with CookAR, we present the first kitchen tool affordance image dataset, which contains 10,152 images (8,346 for training, 1,193 for validation, and 596 for testing) covering the 18 categories listed below. Raw images were extracted from the EPIC-KITCHENS video dataset.


Categories

  • Carafe Base / Carafe Handle
  • Cup Base / Cup Handle
  • Fork Tines / Fork Handle
  • Knife Blade / Knife Handle
  • Ladle Bowl / Ladle Handle
  • Pan Base / Pan Handle
  • Scissor Blade / Scissor Handle
  • Spatula Head / Spatula Handle
  • Spoon Bowl / Spoon Handle

Model fine-tuning & running on images and videos

In this section we provide a brief guide on how to fine-tune the CookAR models on your customized datasets and how to run them on an image or video of your choice. Specifically, we break this section into five parts:

  1. Download the checkpoints
  2. Download and check the dataset
  3. Download and edit configuration file
  4. Start training
  5. Run on image or video

CookAR was initially fine-tuned from RTMDet-Ins-L with frozen backbone stages; the base model can be found in the official repo. You can find a more detailed tutorial on fine-tuning RTMDet-related models here.

Step 1: Download the checkpoints

  • Vanilla CookAR: Use this link to download our fine-tuned weights.

You can use it directly for your tasks (jump to step 3) or build on it with your own data.

Step 2: Download and check the dataset

  • CookAR Dataset: Use this link to download our self-built dataset in COCO-MMDetection format.

If you are fine-tuning with your own dataset, make sure it is also in COCO-MMDetection format, and we recommend running coco_classcheck.py in the fine-tuning folder to check which classes it contains.
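If you prefer to inspect a COCO-format annotation file without the helper script, the sketch below uses pycocotools (installed as an MMDetection dependency); the annotation path is a placeholder to be replaced with your own file.

from pycocotools.coco import COCO

# load a COCO-format annotation file and list its category ids and names
coco = COCO("data/annotations/instances_train.json")  # placeholder path
for cat in coco.loadCats(coco.getCatIds()):
    print(cat["id"], cat["name"])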

Step 3: Download and edit configuration file

In this repo, we also provide the config file used in our fine-tuning process, which can be found in the configs folder. To use the model directly on your tasks, no modification is required; jump to step 5.

Before starting your own training, check and run config_setup.py in the fine-tuning folder to edit the config file. Make sure that the number of classes is modified to reflect your dataset and that all classes are listed in the same order reported by coco_classcheck.py.
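For orientation, the fields that config_setup.py edits correspond to the standard MMDetection 3.x config layout. The sketch below shows what a hand-written override might look like; the base config name, data paths, and exact class strings are placeholders, so confirm them against the provided config and the output of coco_classcheck.py.

_base_ = "./rtmdet-ins_l_8xb32-300e_coco.py"  # placeholder: point this at the provided CookAR config

# 18 affordance classes; the order must match the output of coco_classcheck.py
metainfo = dict(classes=(
    "Carafe Base", "Carafe Handle", "Cup Base", "Cup Handle",
    "Fork Tines", "Fork Handle", "Knife Blade", "Knife Handle",
    "Ladle Bowl", "Ladle Handle", "Pan Base", "Pan Handle",
    "Scissor Blade", "Scissor Handle", "Spatula Head", "Spatula Handle",
    "Spoon Bowl", "Spoon Handle",
))

model = dict(
    bbox_head=dict(num_classes=18),
    # backbone=dict(frozen_stages=4),  # assumption: uncomment to freeze backbone stages as in the original fine-tuning
)

# repeat the same dataset edits for val_dataloader and test_dataloader
train_dataloader = dict(
    dataset=dict(
        data_root="data/",                            # placeholder paths
        ann_file="annotations/instances_train.json",
        data_prefix=dict(img="train/"),
        metainfo=metainfo,
    )
)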

Step 4: Start training

Simply run python tools/train.py PATH/TO/CONFIG.
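Equivalently, training can be launched from Python through MMEngine's Runner; in the sketch below the config path and work directory are placeholders.

from mmengine.config import Config
from mmengine.runner import Runner

# load the (edited) config, choose a work directory, and start training
cfg = Config.fromfile("configs/cookar_rtmdet-ins_l.py")  # placeholder path
cfg.work_dir = "./work_dir"
Runner.from_cfg(cfg).train()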

Step 5: Run on image or video

Use the provided scripts infer_img.py and infer_video.py to run inference on a single image or video.
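If you would rather call MMDetection's Python API directly than use the provided scripts, the single-image case can be handled with MMDetection 3.x's DetInferencer, as in the sketch below; the config, checkpoint, and image paths are placeholders.

from mmdet.apis import DetInferencer

# build an inferencer from the fine-tuned config and checkpoint, then run on one image
inferencer = DetInferencer(
    model="configs/cookar_rtmdet-ins_l.py",  # placeholder config path
    weights="checkpoints/cookar.pth",        # placeholder checkpoint path
    device="cuda:0",                         # or "cpu"
)
inferencer("demo/kitchen.jpg", out_dir="outputs/", pred_score_thr=0.3)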

Acknowledgements

We thank Yang Li (University of Washington), Sieun Kim (Seoul National University), and XunMei Liu (University of Washington) for their assistance with this repo.

Citation

@inproceedings{10.1145/3654777.3676449,
  author = {Lee, Jaewook and Tjahjadi, Andrew D. and Kim, Jiho and Yu, Junpu and Park, Minji and Zhang, Jiawen and Froehlich, Jon E. and Tian, Yapeng and Zhao, Yuhang},
  title = {CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision},
  year = {2024},
  isbn = {9798400706288},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3654777.3676449},
  doi = {10.1145/3654777.3676449},
  abstract = {Cooking is a central activity of daily living, supporting independence as well as mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV), we present CookAR, a head-mounted AR system with real-time object affordance augmentations to support safe and efficient interactions with kitchen tools. To design and implement CookAR, we collected and annotated the first egocentric dataset of kitchen tool affordances, fine-tuned an affordance segmentation model, and developed an AR system with a stereo camera to generate visual augmentations. To validate CookAR, we conducted a technical evaluation of our fine-tuned model as well as a qualitative lab study with 10 LV participants for suitable augmentation design. Our technical evaluation demonstrates that our model outperforms the baseline on our tool affordance dataset, while our user study indicates a preference for affordance augmentations over the traditional whole object augmentations.},
  booktitle = {Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology},
  articleno = {141},
  numpages = {16},
  keywords = {accessibility, affordance segmentation, augmented reality, visual augmentation},
  location = {Pittsburgh, PA, USA},
  series = {UIST '24}
}
