CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision
Jaewook Lee1,
Andrew D. Tjahjadi1,
Jiho Kim1,
Junpu Yu1,
Minji Park2,
Jiawen Zhang1,
Jon E. Froehlich1,
Yapeng Tian3,
Yuhang Zhao4
1University of Washington,
2Sungkyunkwan University,
3University of Texas at Dallas,
4University of Wisconsin-Madison
CookAR is a computer vision-powered prototype AR system that renders real-time object affordance augmentations to support safe and efficient interactions with kitchen tools for people with low vision (LV). In this repo, we present the fine-tuned instance segmentation model used for affordance augmentations, along with the first egocentric dataset of kitchen tool affordances collected and annotated by the research team.
To use CookAR, we recommend using Conda. CookAR also depends on the MMDetection toolbox and PyTorch. If your GPU supports CUDA, please install it first.
```bash
conda create --name=CookAR python=3.8
conda activate CookAR
# Change the index URL below to match your CUDA version.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .
```
It is recommended that you install PyTorch first and then MMDetection; otherwise, MMDetection might not be correctly compiled against CUDA.
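After the installation, a quick sanity check like the following (our own sketch, not a repo script) can confirm that PyTorch sees your GPU and that MMCV and MMDetection import cleanly:

```python
# Sanity check after installation: verify versions and CUDA visibility.
import torch
import mmcv
import mmdet

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MMCV:", mmcv.__version__)
print("MMDetection:", mmdet.__version__)
```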
- Once everything is installed, first create three folders inside the `mmdetection` directory, namely `./data`, `./checkpoints`, and `./work_dir`, either manually or with `mkdir` (see the optional sketch after this list).
- Download the pre-trained config and weights files from MMDetection by running `mim download mmdet --config rtmdet-ins_l_8xb32-300e_coco --dest ./checkpoints`, then run `python test_install.py` to check that things are working correctly. You should see an image with segmentation masks pop up.
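If you prefer to set up the folders from Python, the following sketch (run from inside `mmdetection`; not a repo script) creates the three directories and lists whatever the `mim download` command placed in `./checkpoints`:

```python
# Create the working folders and confirm the downloaded RTMDet-Ins files are present.
from pathlib import Path

for name in ("data", "checkpoints", "work_dir"):
    Path(name).mkdir(exist_ok=True)

downloaded = sorted(Path("checkpoints").glob("rtmdet-ins_l_8xb32-300e_coco*"))
print("Found in ./checkpoints:", [p.name for p in downloaded])  # expect a .py config and a .pth weights file
```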
Along with CookAR, we present the first kitchen tool affordance image dataset, which contains 10,152 images (8,346 for training, 1,193 for validation, and 596 for testing) covering the 18 categories listed below. Raw images were extracted from the EPIC-KITCHENS video dataset. A quick way to inspect the annotations is sketched after the table.
|                |                |
|----------------|----------------|
| Carafe Base    | Carafe Handle  |
| Cup Base       | Cup Handle     |
| Fork Tines     | Fork Handle    |
| Knife Blade    | Knife Handle   |
| Ladle Bowl     | Ladle Handle   |
| Pan Base       | Pan Handle     |
| Scissor Blade  | Scissor Handle |
| Spatula Head   | Spatula Handle |
| Spoon Bowl     | Spoon Handle   |
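To verify the splits and categories after downloading the dataset, a small sketch like this can help (assuming standard COCO-style JSON annotations; the path below is a placeholder, not the actual file name in the release):

```python
# Inspect a COCO-style annotation file: image count and per-category instance counts.
from pycocotools.coco import COCO

coco = COCO("./data/CookAR/annotations/train.json")  # placeholder path
print("images:", len(coco.imgs), "| annotations:", len(coco.anns))
for cat in coco.loadCats(coco.getCatIds()):
    n = len(coco.getAnnIds(catIds=[cat["id"]]))
    print(f'{cat["id"]:2d}  {cat["name"]:<16s} {n} instances')
```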
In this section, we provide a brief guide on how to fine-tune the CookAR models on your customized dataset and how to run them on an image or video of your choice. Specifically, we break this section into five parts:
- Download the checkpoints
- Download and check the dataset
- Download and edit configuration file
- Start training
- Run on image or video
CookAR was initially fine-tuned from RTMDet-Ins-L with frozen backbone stages; the base model can be found at the official repo. You can find a more detailed tutorial on fine-tuning RTMDet-related models here.
- Vanilla CookAR: Use this link to download our fine-tuned weights. You can use them directly for your tasks (jump to step 3) or build upon them with your own data.
- CookAR Dataset: Use this link to download our self-built dataset in COCO-MMDetection format. If you are fine-tuning with your own dataset, make sure it is also in COCO-MMDetection format (a minimal example of the expected annotation structure is sketched after this list); it is recommended to run `coco_classcheck.py` in the fine-tuning folder to check the classes it contains.
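For reference, here is a minimal illustration (our own sketch, with placeholder values) of the COCO-style instance segmentation structure that MMDetection expects your annotation JSON to follow:

```python
# Skeleton of a COCO-style annotation file for instance segmentation (placeholder values).
annotation_file = {
    "images": [
        {"id": 1, "file_name": "frame_000001.jpg", "width": 1920, "height": 1080},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,                                              # index into "categories"
            "segmentation": [[310.5, 220.0, 355.0, 231.5, 340.0, 280.0]],  # polygon(s)
            "bbox": [310.5, 220.0, 44.5, 60.0],                            # [x, y, width, height]
            "area": 1335.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "Carafe Base"},
        {"id": 2, "name": "Carafe Handle"},
        # ... remaining 16 categories
    ],
}
```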
In this repo, we also provide the config file used in our fine-tuning process, which can be found in the configs folder. To use the model on your tasks directly, no modification is required; jump to step 5.
Before starting your own training, check and run `config_setup.py` in the fine-tuning folder to edit the config file. Make sure that the number of classes is updated to reflect your dataset and that all classes are listed in the same order shown by `coco_classcheck.py`.
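For orientation, the fragment below shows, in the standard MMDetection 3.x config style, the kinds of fields that typically have to match your dataset (class list, `num_classes`, annotation paths). It is only an illustrative sketch with placeholder paths; the exact keys edited by `config_setup.py` and the repo's provided config file are authoritative:

```python
# Illustrative MMDetection-style overrides for a custom 18-class dataset (placeholder paths).
_base_ = "./checkpoints/rtmdet-ins_l_8xb32-300e_coco.py"  # base config downloaded earlier

# Classes must appear in the same order reported by coco_classcheck.py.
metainfo = dict(classes=("Carafe Base", "Carafe Handle", "Cup Base", "Cup Handle"))  # ... 18 in total

model = dict(bbox_head=dict(num_classes=18))

train_dataloader = dict(
    dataset=dict(
        data_root="./data/CookAR/",         # placeholder
        ann_file="annotations/train.json",  # placeholder
        data_prefix=dict(img="train/"),     # placeholder
        metainfo=metainfo))
val_dataloader = dict(
    dataset=dict(
        data_root="./data/CookAR/",
        ann_file="annotations/val.json",
        data_prefix=dict(img="val/"),
        metainfo=metainfo))
val_evaluator = dict(ann_file="./data/CookAR/annotations/val.json")
```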
Simply run `python tools/train.py PATH/TO/CONFIG`.
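Equivalently, the same training run can be launched from Python via MMEngine, which is what `tools/train.py` wraps (the config path below is a placeholder):

```python
# Launch training programmatically with MMEngine's Runner.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile("PATH/TO/CONFIG")
cfg.work_dir = "./work_dir"  # where logs and checkpoints are written
runner = Runner.from_cfg(cfg)
runner.train()
```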
Use the provided scripts `infer_img.py` and `infer_video.py` to run inference on a single image or video.
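If you want a quick test outside those scripts, MMDetection's `DetInferencer` provides a similar single-image workflow (the paths below are placeholders for the config and the downloaded CookAR weights):

```python
# Run single-image inference with MMDetection's high-level inferencer.
from mmdet.apis import DetInferencer

inferencer = DetInferencer(model="PATH/TO/CONFIG", weights="PATH/TO/CookAR_weights.pth")
inferencer("PATH/TO/IMAGE.jpg", out_dir="./work_dir/vis")  # writes visualizations and predictions
```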
We thank Yang Li (University of Washington), Sieun Kim (Seoul National University), and XunMei Liu (University of Washington) for their assistance with this repo.
@inproceedings{10.1145/3654777.3676449,
author = {Lee, Jaewook and Tjahjadi, Andrew D. and Kim, Jiho and Yu, Junpu and Park, Minji and Zhang, Jiawen and Froehlich, Jon E. and Tian, Yapeng and Zhao, Yuhang},
title = {CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision},
year = {2024},
isbn = {9798400706288},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3654777.3676449},
doi = {10.1145/3654777.3676449},
abstract = {Cooking is a central activity of daily living, supporting independence as well as mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV), we present CookAR, a head-mounted AR system with real-time object affordance augmentations to support safe and efficient interactions with kitchen tools. To design and implement CookAR, we collected and annotated the first egocentric dataset of kitchen tool affordances, fine-tuned an affordance segmentation model, and developed an AR system with a stereo camera to generate visual augmentations. To validate CookAR, we conducted a technical evaluation of our fine-tuned model as well as a qualitative lab study with 10 LV participants for suitable augmentation design. Our technical evaluation demonstrates that our model outperforms the baseline on our tool affordance dataset, while our user study indicates a preference for affordance augmentations over the traditional whole object augmentations.},
booktitle = {Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology},
articleno = {141},
numpages = {16},
keywords = {accessibility, affordance segmentation, augmented reality, visual augmentation},
location = {Pittsburgh, PA, USA},
series = {UIST '24}
}