speedup of the pytorch GNN-LSH model #245

Merged
merged 6 commits into from
Oct 25, 2023


jpata (Owner) commented Oct 20, 2023

Test command on one RTX2060S:

singularity exec --nv ~/HEP-KBFI/singularity/pytorch.simg python3 mlpf/pyg_pipeline.py --dataset cms --gpus 0 --data_dir ~/tensorflow_datasets/ --train --conv-type gnn-lsh --num-epochs 10 --ntrain 100 --ntest 100 --gpu-batch-multiplier 1

Result before this PR:

INFO:mlpf:Rank 0: epoch=10 / 10 train_loss=42.3726 valid_loss=40.2103 stale=0 time=2.05m eta=0.0m
INFO:mlpf:Done with training. Total training time on device 0 is 20.484min

Result after this PR:

INFO:mlpf:Rank 0: epoch=10 / 10 train_loss=42.5311 valid_loss=33.8545 stale=0 time=1.67m eta=0.0m
INFO:mlpf:Done with training. Total training time on device 0 is 16.876min
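
For reference, the speedup implied by the two log lines above can be worked out with a few lines of Python. This is only a sketch: the helper names `total_minutes` and `summarize` are made up for illustration and are not part of the repository, and the numbers come from a single run on one GPU, so they are indicative rather than a benchmark.

```python
import re

# Final "Done with training" lines copied from the two runs quoted above.
BEFORE = "INFO:mlpf:Done with training. Total training time on device 0 is 20.484min"
AFTER = "INFO:mlpf:Done with training. Total training time on device 0 is 16.876min"


def total_minutes(line: str) -> float:
    """Extract the total training time in minutes from a log line (hypothetical helper)."""
    match = re.search(r"is ([0-9.]+)min", line)
    if match is None:
        raise ValueError(f"no training time found in: {line!r}")
    return float(match.group(1))


def summarize(before_line: str, after_line: str) -> dict:
    """Compare two runs: minutes saved, relative reduction, and speedup factor."""
    before = total_minutes(before_line)
    after = total_minutes(after_line)
    return {
        "saved_min": before - after,
        "reduction_pct": 100.0 * (before - after) / before,
        "speedup": before / after,
    }


if __name__ == "__main__":
    stats = summarize(BEFORE, AFTER)
    print(f"saved {stats['saved_min']:.3f} min "
          f"({stats['reduction_pct']:.1f}% less wall time, "
          f"{stats['speedup']:.2f}x speedup)")
```

With the values above this comes to roughly 3.6 minutes saved over 10 epochs, i.e. about 18% less wall time or a speedup of about 1.21x.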

jpata changed the title from "WIP profiling work and speedup of the pytorch GNN-LSH model" to "speedup of the pytorch GNN-LSH model" on Oct 23, 2023

jpata (Owner, Author) commented Oct 23, 2023

Getting some intermittent errors here in pyg-unittests like:

/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16> at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>::blend(const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&, const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&)’:
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:131:34: warning: index value is out of bound [-Warray-bounds]
  131 |       tmp_values[0] = b.values[31];
      |                                  ^

It is related to wrapping reverse_lsh in torch.compile, but so far I have not been able to reproduce it locally.
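
One possible way to narrow this down in CI (not what this PR does, just a sketch) would be to make the compile step switchable, so the same test can be run with and without torch.compile. Everything named here is an assumption for illustration: `maybe_compile`, the `MLPF_TORCH_COMPILE` environment variable, and `reverse_lsh_stub`, which only stands in for the real reverse_lsh and does not reproduce its logic. The only real API used is `torch.compile(fn)`, available in PyTorch 2.x.

```python
import os


def maybe_compile(fn, enabled=None):
    """Return torch.compile(fn) when enabled, otherwise return fn unchanged.

    MLPF_TORCH_COMPILE is a hypothetical environment variable, not an existing
    option of the pipeline: "1" (the default) compiles, anything else stays eager.
    """
    if enabled is None:
        enabled = os.environ.get("MLPF_TORCH_COMPILE", "1") == "1"
    if not enabled:
        # Eager path: the function is returned as-is and torch is never imported.
        return fn
    import torch  # imported lazily so the eager path does not depend on it

    return torch.compile(fn)


def reverse_lsh_stub(x):
    # Placeholder for the real reverse_lsh; it just returns its input.
    return x


# Eager variant, e.g. for a CPU-only CI job where the compile step misbehaves.
reverse_lsh_eager = maybe_compile(reverse_lsh_stub, enabled=False)
```

Running the unit tests once with the switch off and once with it on would at least confirm whether the intermittent failures come from the compiled path.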

jpata merged commit fc2083b into main on Oct 25, 2023
10 checks passed
jpata deleted the pytorch_gnnlsh_speedup branch on October 25, 2023 11:56
farakiko pushed a commit to farakiko/particleflow that referenced this pull request Oct 25, 2023
farakiko pushed a commit to farakiko/particleflow that referenced this pull request Jan 23, 2024