
nyunAI/PatchGD


🔥 Accepted in TMLR (08/2024) | OpenReview

Abstract

Traditional deep learning models are trained and tested on relatively low-resolution images ($<300$ px) and cannot directly operate on large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an effective learning strategy that allows training existing CNN and transformer architectures (hereby referred to as deep learning models) on large-scale images in an end-to-end manner. PatchGD is based on the hypothesis that, instead of performing gradient-based updates on an entire image at once, a good solution can be achieved by performing model updates on only small parts of the image at a time, ensuring that the majority of it is covered over the course of iterations. PatchGD thus enjoys substantially better memory and compute efficiency when training models on large-scale images. PatchGD is thoroughly evaluated on the PANDA, UltraMNIST, TCGA, and ImageNet datasets with ResNet50, MobileNetV2, ConvNeXtV2, and DeiT models under different memory constraints. Our evaluation clearly shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images, especially when compute memory is limited.
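
To make the core idea concrete, below is a minimal, simplified sketch of patch-wise training in PyTorch. It only illustrates the principle stated in the abstract (updating the model on a few patches at a time so the whole image never has to fit in GPU memory at once); it is not the full PatchGD algorithm implemented in the patch_gd directory, and names such as extract_patches, patch_size, patches_per_step, and inner_iterations are illustrative assumptions.

# Simplified illustration of patch-wise updates; NOT the full PatchGD
# algorithm from patch_gd/. All names below (extract_patches, patch_size,
# patches_per_step, inner_iterations) are illustrative assumptions.
import torch
import torch.nn.functional as F

def extract_patches(images, patch_size):
    # Split a batch (B, C, H, W) into non-overlapping patches (B, N, C, p, p).
    b, c, _, _ = images.shape
    p = patch_size
    patches = images.unfold(2, p, p).unfold(3, p, p)          # (B, C, H/p, W/p, p, p)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c, p, p)

def patchwise_train_step(model, optimizer, images, labels,
                         patch_size=128, patches_per_step=4, inner_iterations=8):
    # Update the model on small random subsets of patches instead of the whole
    # image, so only a few patches occupy GPU memory at any time.
    patches = extract_patches(images, patch_size)             # (B, N, C, p, p)
    num_patches = patches.shape[1]
    for _ in range(inner_iterations):
        idx = torch.randperm(num_patches)[:patches_per_step]  # random patch subset
        optimizer.zero_grad()
        loss = 0.0
        for i in idx:
            logits = model(patches[:, i])                     # forward pass on one patch
            loss = loss + F.cross_entropy(logits, labels)
        (loss / patches_per_step).backward()                  # average over the subset
        optimizer.step()

See the patch_gd directory for the complete algorithm used in the paper.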

Code usage details

Set up the environment

Create a conda environment:

conda create -n pgd python=3.12
conda activate pgd

Install requirements using the following command:

pip install -r requirements.txt
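
As a quick sanity check after installation (assuming requirements.txt installs PyTorch, which the GPU experiments rely on; this is an assumption about its contents), verify that a GPU is visible:

python -c "import torch; print(torch.__version__, 'CUDA available:', torch.cuda.is_available())"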

Data

Experiments mentioned in the paper use the following datasets:

  1. Prostate cANcer graDe Assessment (PANDA)
  2. UltraMNIST
  3. ImageNet
  4. TCGA

Dataset processing scripts for PANDA and UltraMNIST are included in the utility_codes directory; they can be used to generate the cross-validation folds for PANDA and the full UltraMNIST dataset (a minimal fold-splitting sketch is given at the end of this section).

ImageNet can be downloaded from Kaggle, and TCGA (LUAD & LUSC) can be obtained following the setup instructions listed here; the splits can then be created from the downloaded data.
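
For reference, the following is a minimal sketch of how 5-fold splits for PANDA could be generated with scikit-learn. It is illustrative only: the repository's own fold-generation script lives in utility_codes, and the column names image_id and isup_grade follow the public PANDA train.csv (adjust paths and columns to your local copy).

# Illustrative only; the repository's fold-generation script is in utility_codes/.
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv("train.csv")  # PANDA metadata file (default RangeIndex assumed)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
df["fold"] = -1
for fold, (_, val_idx) in enumerate(skf.split(df, df["isup_grade"])):
    df.loc[val_idx, "fold"] = fold  # mark each row's validation fold
df.to_csv("train_folds.csv", index=False)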

File structure

  • baselines directory contains the code to run baseline experiments mentioned in the paper for PANDA and UltraMNIST
  • HAR_1d_example directory contains code to run experiments on the Human Activity Recognition dataset (1-d generalization of PatchGD)
  • patch_gd directory contains the code to run experiments using PatchGD algorithm for PANDA and UltraMNIST
  • utility_codes directory contains utility scripts for PANDA and UltraMNIST, including dataset and fold creation, statistics calculation, and running multiple experiments across multiple GPUs

Citation

If you find this work useful, please cite:

@article{gupta2023patch,
  title={Patch gradient descent: Training neural networks on very large images},
  author={Gupta, Deepak K and Mago, Gowreesh and Chavan, Arnav and Prasad, Dilip K},
  journal={arXiv preprint arXiv:2301.13817},
  year={2023}
}
