Small fix painn #298

RylieWeaver · 2024-10-29T01:46:57Z

Small fix here: without it, a model with only one convolutional layer would hit hanging-gradient errors.

* wip: init profile
* wip: add profile routine
* wip: add profile routine
* format fixed
* Create profiler class
* minor fix
* Minor changes for merging
* Move profile block to an upper level
Co-authored-by: Massimiliano Lupo Pasini <7ml@ornl.gov>
Co-authored-by: Massimiliano Lupo Pasini <massimiliano.lupo.pasini@gmail.com>

…NL#264)
* init commit for enabling deepspeed
* black formatting
* optional deepspeed availability
* if-else for model intialization
* init commit
* enable enable_deepspeed_ci ci
* deepspeed required
* mark pytest stages
* try tests.xxx import
* fixed: deepspeed_test
* try test_examples only
* try test_deepspeed only
* clean up printing and ready to merge
* disable deepspeed stage 3, maybe incompatible with CI machine
* test network occupation
* mark mpi for deepspeed
* CI ready to deploy
* fix dependency
* minor format fix
* double check merge
* deepspeed out of optional
* flush CI cache without deepspeed
* remove auto CI for enable_deepspeed_ci branch
* fix hash error & more elegant deepspeed-zero unit test
---------
Co-authored-by: Zhifan Ye <zhifanye@mail.ustc.edu.cn>

* add energy linear regression
* remove pdb
* remove var_conf
* fix for energy per atom
* remove debug
* fix energy per atom
* fix for new energy calc
* add npz output
* save energy mean and linear regresison term
* black
* fix adios write
* remove emean

* utils renamed and black formatting applied
* bug fixed solved for tests
* black formatting fixed
* examples corrected
* test_model_loadpred.py fixed
* black formatting fixed
* test_loss_and_activation_functions.py fixed
* black formatting fixed
* reverting inadvertent automated refactoring of dataset forlder into datasets
* reverting inadvertent automated refactoring of dataset forlder into datasets
* reverting inadvertent automated refactoring of dataset forlder into datasets
* reverting inadvertent automated refactoring of dataset forlder into datasets
* reverting inadvertent automated refactoring of hydragnn into hhydragnn package
* reverting inadvertent automated refactoring of dataset forlder into datasets
* reverting inadvertent automated refactoring of dataset forlder into datasets
* reverting inadvertent automated refactoring of dataset forlder into datasets
* git formatting fixed
* Adagrad converted to Adamax
* Additional changes to fix bugs and suggestions from erdem
* imports fixed for LennardJones example
* formatting fixed
* imports in LJ_data.py fixed
* import of graph utils fixed in LJ_data.py
* import of setup.ddp() fixed in LennardJones
* setup_log call fixed
* get_summary_writer call fixed
* additional calls fixed
* black formatting fixedf

allaffa and others added 30 commits December 5, 2021 10:46

import os was missing (ORNL#55)

7fa8308

model.load_state_dict update

c8aabb5

add [xx]_scaled_num_nodes option for feature [xx]

32028d4

rebase and rename variables/functions

fec022a

Merge pull request ORNL#58 from pzhanggit/model_load

dffb15c

update load_existing_model

remove unscale_* in run prediction and train_validate_test

3585703

add unscale_features_by_num_nodes_config

Merge pull request ORNL#45 from pzhanggit/scaled_energy

bf7dfa8

Add scaled by num_nodes option for feature prediction in variable graph size
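The num_nodes scaling option named in this commit can be pictured with a small sketch. It assumes the idea is to divide a graph-level target (for example a total energy) by the graph's node count before training and to multiply it back afterwards, so that graphs of different sizes share a comparable range. The function names `scale_by_num_nodes` and `unscale_by_num_nodes` are hypothetical and are not taken from the HydraGNN code.

```python
def scale_by_num_nodes(value: float, num_nodes: int) -> float:
    """Convert a whole-graph quantity into a per-node quantity (hypothetical helper)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return value / num_nodes


def unscale_by_num_nodes(scaled: float, num_nodes: int) -> float:
    """Undo scale_by_num_nodes to recover the whole-graph quantity."""
    return scaled * num_nodes


# Two graphs of different size: (total value, number of nodes).
graphs = [(-12.0, 4), (-30.0, 10)]

# After scaling, both graphs have the same per-node value.
scaled = [scale_by_num_nodes(value, n) for value, n in graphs]

# Unscaling with the same node counts restores the original totals.
restored = [unscale_by_num_nodes(s, n) for s, (_, n) in zip(scaled, graphs)]
```

The unscale step has to use the same node count as the scale step for each graph, which is presumably why the option is tracked per feature in the configuration.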

Clean up error naming and list issues (ORNL#59)

2b6b065

* fix error name bugs
* rm tracking error of node sum
* reserved length for error lists

Fix assertion warning (ORNL#60)

4b54527

* Treat warnings as errors; Ignore deprecation warnings from tpl
* fixup unscale assert

Move data to the device once (ORNL#61)

553b966

* Split device and device_name functions
* Move data to device in loading rather than training (does NOT affect performance)

add missing license header (ORNL#64)

98de5dc

frequency relaxed (ORNL#67)

4cacf22

* frequency relaxed
* frequency relaxed
Co-authored-by: Massimiliano Lupo Pasini <7ml@ornl.gov>

mapping of PNA degree tensor to GPU (ORNL#69)

56a581b

* mapping of degree tensor to GPU
* formatting fixed
Co-authored-by: Massimiliano Lupo Pasini <7ml@ornl.gov>

data.batch is remapped to the same device as data.x

cae3816

Merge pull request ORNL#71 from allaffa/remap_all_fields_of_data_to_gpus

9e8cae6

data.batch is remapped to the same device as data.x

added dashed red diagonal on scatterplot (ORNL#63)

85dadea

* added dashed red diagonal and use of empty dots in scatterplot
* formatting fixed
Co-authored-by: Massimiliano Lupo Pasini <7ml@ornl.gov>

Set nonzero verbosity (ORNL#78)

79612bb

save train/val/test pk files when raw total provided (ORNL#66)

de36d26

* save train/val/test to pkls when total provided and reorganize data loading
* remove raw in config

Add precommit (ORNL#76)

08df317

* Add pre-commit
* Add pre-commit to dev requirements
* Freeze black required version for now
* Add instructions for using pre-commit
Co-authored-by: Sam Reeve <6740307+streeve@users.noreply.github.com>

Fixing bug in normalization (ORNL#82)

84e70f4

* fix an indexing bug in denormalization
* add min/max loading from pkl
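As a rough illustration of the denormalization step mentioned in this commit, the sketch below assumes min-max normalization to [0, 1] with one stored (min, max) pair per feature. The commit does not say what the indexing bug was; the sketch only shows why each feature column must be paired with its own min/max entry. The name `denormalize` and the data layout are assumptions, not the project's API.

```python
def denormalize(scaled_rows, minmax):
    """Map each column back from [0, 1] using that column's own (min, max) pair.

    scaled_rows: list of rows, one normalized value per feature.
    minmax: list of (min, max) tuples, indexed by feature position.
    """
    restored = []
    for row in scaled_rows:
        if len(row) != len(minmax):
            raise ValueError("each row needs one (min, max) pair per feature")
        # zip pairs feature i with minmax[i]; mixing up this index gives wrong units.
        restored.append([s * (hi - lo) + lo for s, (lo, hi) in zip(row, minmax)])
    return restored


# Feature 0 spans [0, 10]; feature 1 spans [-1, 1].
minmax = [(0.0, 10.0), (-1.0, 1.0)]
scaled_rows = [[0.5, 0.0], [1.0, 1.0]]
restored = denormalize(scaled_rows, minmax)
```

In practice the `minmax` list would be read back from the same file that stored the normalized data, so that training and prediction agree on the ranges.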

fixup: output name (ORNL#77)

35f9669

optimize train_validate_test with tensor operations

1ed9eb7

consistent device of y_loc with y

c5fc846

citation file added (ORNL#84)

89ce96b

Co-authored-by: User <user@localadmins-Air.homenet.telecomitalia.it>

Merge pull request ORNL#80 from pzhanggit/dev_train_validate_test

c651f90

optimize get_head_indices with tensor operations

fixup profiler

e2e96ff

Remove num_nodes_list

c467a73

Only profile cuda when using cuda

c7803c2

Merge pull request ORNL#86 from streeve/remove_num_nodes_list

d9b0e37

Remove num_nodes_list

jychoi-hpc and others added 28 commits July 1, 2024 15:17

Update deephyper node list util (ORNL#261)

65c4f2e

Adding HYDRAGNN_MASTER_PORT (ORNL#263)

d0b0bdc

Adding HYDRAGNN_MASTER_ADDR env to set custom DDP port.
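A minimal sketch of an environment-variable override for the DDP rendezvous port, under stated assumptions: the variable name `HYDRAGNN_MASTER_PORT` comes from the commit title above, while the helper name `resolve_master_port` and the fallback value 8889 are hypothetical and chosen only for illustration.

```python
import os


def resolve_master_port(default: int = 8889, environ=os.environ) -> int:
    """Return the port from HYDRAGNN_MASTER_PORT if set, else a default (assumed value)."""
    raw = environ.get("HYDRAGNN_MASTER_PORT")
    if raw is None:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# Passing a dict instead of os.environ keeps the example deterministic.
port_default = resolve_master_port(environ={})
port_custom = resolve_master_port(environ={"HYDRAGNN_MASTER_PORT": "29500"})
```

An override like this lets two jobs on the same node avoid colliding on one hard-coded port.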

enable preprocessing and training with ogb dataset (ORNL#262)

38b10da

Co-authored-by: Zhifan Ye <zhifanye@mail.ustc.edu.cn>

Cleaner way to fail if ADIOS is not installed. ORNL#205 (ORNL#266)

f1e2162

* Cleaner way to fail if ADIOS is not installed. ORNL#205
* formatting fix ORNL#205
---------
Co-authored-by: Kshitij V. Mehta <kshitij-v-mehta@github.com>

Add PNAPlus Model

c42e7fc

change back manual see

5e563ab

No need to change port in PR

3a4018b

Enhancement: Deepspeed Launch Template & Now Compatible with GPU Binding (ORNL#268)

5c1e131

* init commit, tested work on frontier
* update black formatting
* amend template
---------
Co-authored-by: Zhifan Ye <zye327@login07.frontier.olcf.ornl.gov>

add GaussianNLLLoss (ORNL#270)

7e14ffb
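For context on the loss named in this commit, here is a plain-Python sketch of the per-sample Gaussian negative log-likelihood, 0.5 * (log(var) + (target - mean)^2 / var), with the constant term dropped. It does not reproduce the commit's implementation; the function name `gaussian_nll` and the `eps` clamp on the variance are assumptions made for the example.

```python
import math


def gaussian_nll(mean: float, var: float, target: float, eps: float = 1e-6) -> float:
    """Gaussian negative log-likelihood for one sample, without the constant term."""
    var = max(var, eps)  # clamp to keep log() and the division well defined
    return 0.5 * (math.log(var) + (target - mean) ** 2 / var)


# A perfect prediction with unit variance costs nothing.
loss_exact = gaussian_nll(mean=1.0, var=1.0, target=1.0)

# Same error, two variances: a larger predicted variance softens the penalty.
loss_confident = gaussian_nll(mean=0.0, var=1.0, target=2.0)
loss_uncertain = gaussian_nll(mean=0.0, var=4.0, target=2.0)
```

Unlike a plain squared error, this loss lets the model trade a larger predicted variance against a smaller weighted residual, which is what makes it usable for uncertainty-aware regression.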

Fix formatting with black

f8de699

fix mistaken commited files

9c62e53

fix commit to not change qm9 files

f8dcbe5

tests/inputs/ci_vectoroutput.json

5e98e6c

should not ahve used black on json file

a379306

should not ahve used black on json file

df187ea

Update deephyper runs (ORNL#274)

afb67e1

* Update deephyper runs
  Update to capture all errors with try-except block.
* Update gfm_deephyper_multi_perlmutter.py
* Update distributed.py
  Use "SLURM_STEP_NODELIST" env, which is needed for HPO.

random seed fixed at the beginning of each example python script (ORNL#277)

0ca79fe

add CheckRemainingTime option (ORNL#279)

6a4eaa7

unnecessary imports

849e506

Merge pull request ORNL#267 from RylieWeaver/main

96f6233

Adding PNAPlus Stack

Energy forces (ORNL#278)

c3e534c

Forces test (ORNL#283)

1596a74

* force tests, which required model arg, and some typo fixing in Lennard Jones
* Add PNAPlus since it uses positions as well
* formatting

Merge remote-tracking branch 'upstream/main'

de9c503

small fix

5907b51

cleanup

cd4f7b5

RylieWeaver requested a review from allaffa October 29, 2024 01:46

RylieWeaver closed this Oct 29, 2024
