speedup of the pytorch GNN-LSH model #245

jpata · 2023-10-20T12:52:46Z

Test command on one RTX2060S:

singularity exec --nv ~/HEP-KBFI/singularity/pytorch.simg python3 mlpf/pyg_pipeline.py --dataset cms --gpus 0 --data_dir ~/tensorflow_datasets/ --train --conv-type gnn-lsh --num-epochs 10 --ntrain 100 --ntest 100 --gpu-batch-multiplier 1

Result before this PR:

INFO:mlpf:Rank 0: epoch=10 / 10 train_loss=42.3726 valid_loss=40.2103 stale=0 time=2.05m eta=0.0m
INFO:mlpf:Done with training. Total training time on device 0 is 20.484min

Result after this PR:

INFO:mlpf:Rank 0: epoch=10 / 10 train_loss=42.5311 valid_loss=33.8545 stale=0 time=1.67m eta=0.0m
INFO:mlpf:Done with training. Total training time on device 0 is 16.876min
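For a quick sense of scale, the two logs can be turned into a speedup figure. This is a minimal sketch that only reuses the numbers printed above. It assumes that the `time=` field in the epoch line is the wall-clock time of that single epoch, which is consistent with 10 epochs × 2.05 min ≈ 20.5 min total.

```python
# Timings copied from the two training logs above (10 epochs, one RTX2060S).
before_total_min = 20.484
after_total_min = 16.876
before_epoch_min = 2.05   # "time=" of epoch 10 before the PR (assumed per-epoch time)
after_epoch_min = 1.67    # "time=" of epoch 10 after the PR (assumed per-epoch time)


def speedup(before: float, after: float) -> float:
    """Ratio of old to new wall-clock time (>1 means faster)."""
    return before / after


def time_saved_fraction(before: float, after: float) -> float:
    """Fraction of the original wall-clock time that was saved."""
    return (before - after) / before


total_speedup = speedup(before_total_min, after_total_min)
total_saved = time_saved_fraction(before_total_min, after_total_min)
epoch_speedup = speedup(before_epoch_min, after_epoch_min)

print(f"total:     {total_speedup:.2f}x faster, {total_saved:.1%} less time")
print(f"per epoch: {epoch_speedup:.2f}x faster")
# prints:
# total:     1.21x faster, 17.6% less time
# per epoch: 1.23x faster
```

This is a single 10-epoch run on 100 training samples, so the figure is indicative rather than a careful benchmark.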

jpata · 2023-10-23T11:02:00Z

I'm getting some intermittent errors here in the pyg-unittests, like:

/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16> at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>::blend(const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&, const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&)’:
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:131:34: warning: index value is out of bound [-Warray-bounds]
  131 |       tmp_values[0] = b.values[31];
      |                                  ^

It has to do with wrapping reverse_lsh in torch.compile, but so far I haven't been able to reproduce it locally.
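The header path in the warning (`torch/include/ATen/cpu/vec/...`) suggests it is emitted while C++ kernels are being compiled, which is likely TorchInductor (the default `torch.compile` backend) building CPU code for the compiled function on the CI runner. The sketch below is a minimal illustration under stated assumptions, not the mlpf code. `reverse_lsh` here is a hypothetical stand-in that undoes an LSH sort permutation. `backend="eager"` makes TorchDynamo capture the graph but run it eagerly, with no C++ code generation, which is one way to check whether a failure comes from tracing or from the compile step.

```python
import torch


# Hypothetical stand-in for reverse_lsh; the real mlpf function is not shown in this PR.
# Assumption: LSH binning reordered N elements with a permutation `perm`
# (x_sorted = x[perm]), and "reversing" restores the original element order.
def reverse_lsh(x_sorted: torch.Tensor, perm: torch.Tensor) -> torch.Tensor:
    inverse_perm = torch.argsort(perm)  # inverse_perm[k] = index i with perm[i] == k
    return x_sorted[inverse_perm]


# The default backend ("inductor") generates and compiles C++ kernels on CPU.
# backend="eager" only traces with TorchDynamo and runs the captured graph eagerly.
reverse_lsh_compiled = torch.compile(reverse_lsh, backend="eager")

torch.manual_seed(0)
x = torch.randn(8, 4)
perm = torch.randperm(8)
x_sorted = x[perm]

restored = reverse_lsh_compiled(x_sorted, perm)
print(torch.equal(restored, x))  # True
```

If the eager-backend variant passes reliably in CI while the default backend fails intermittently, that would point at the C++ compilation step rather than at the function itself.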

* Merge
* assert

jpata added 5 commits October 20, 2023 15:28

Merge · 85a3e48
up · a2d3c1d
Merge remote-tracking branch 'origin/main' into pytorch_gnnlsh_speedup · f4a6a0b
format · d247a09
up · cb52be1

jpata changed the title from "WIP profiling work and speedup of the pytorch GNN-LSH model" to "speedup of the pytorch GNN-LSH model" on Oct 23, 2023

assert · 53e1dc7

jpata merged commit fc2083b into main Oct 25, 2023
10 checks passed

jpata deleted the pytorch_gnnlsh_speedup branch October 25, 2023 11:56

farakiko pushed a commit to farakiko/particleflow that referenced this pull request Oct 25, 2023

speedup of the pytorch GNN-LSH model (jpata#245) · a3e3ef8

* Merge
* assert

farakiko pushed a commit to farakiko/particleflow that referenced this pull request Jan 23, 2024

speedup of the pytorch GNN-LSH model (jpata#245) · 80c0c86

* Merge
* assert
