Add CI pipeline #28

steinmig · 2024-12-17T18:27:20Z

Moves existing tests to a separate folder
Converts some existing tutorials into regression tests
Adds a CI pipeline that, for now, only performs the default installation and runs pytest

ajhoffman1229

I think there might be a few changes worth making before merging this branch. Namely:

reverting the use of the _ax imports (these need to be integrated with other files and removed, so we don't want any new imports from them if we can avoid it)
cleaning up tests so that they're distinct from tutorials (removing excess comments, etc.)

Some of the training can also take quite a long time to run as currently written in the tutorials... Do you think we should take precautions to ensure the CI/CD pipeline doesn't take too long to run?

nff/tests/dynamics_test.py

nff/tests/test_ase.py

ajhoffman1229 · 2024-12-18T01:59:15Z

nff/tests/test_excited_states_training.py

Is this file a modified version of the tutorial for training a potential for molecules with excited states? If so, maybe we can add a file docstring at the top that describes its origin.

Yes, this file and test_training are regression tests I copied from the tutorials. Adding a remark and removing some of the comments sounds good.

Ideally, we would simply test all of the tutorials so that we would know immediately when they go out of date, but I am not too familiar with running notebooks in a CI context.

Most of the current tutorials do not work out of the box at the moment; that is also worth putting on a list.

I will check with some of the other maintainers, but I believe that @xiaochendu and @recisic checked that all of the notebook tutorials worked when they integrated updates to the internal version of the repo. Maybe they could weigh in? If it is still a problem, we can open an issue so that someone can address it.

Not all tutorial notebooks were functional as of PR #23 (merging repos, commit 24fe0ca); this is also mentioned in the internal repo (https://github.mit.edu/MLMat/NeuralForceField/pull/117).
I’ve already sent this via DM but am sharing it here for tracking purposes.

[✓] 01_training.ipynb
[✓] 02_ase.ipynb
[✓] 03_excited.ipynb
[✓] 04_md.ipynb
[✓] 05_transfer.ipynb
[x] 06_cp3d.ipynb -- GEOM is 50GB to download, didn't try running rest of the script
[✓] 07_dimenet.ipynb
[x] 08_reactive_md.ipynb -- requires appending aRMSD dir to PYTHONPATH
[✓] 09_painn.ipynb
[✓] 11_spooky_net.ipynb
[✓] 12_spooky_painn.ipynb
[x] 13_tully_namd.ipynb -- taking quite a lot of time, maybe I messed up the environment
[x] 14_zn_namd.ipynb -- taking quite a lot of time, maybe I messed up the environment
[✓] 15_nff_md17_dataset.ipynb
[✓] 16_UmbrellaSampling_and_ScansWithHarmonicConstraints.ipynb
[✓] 17_eABF_and_WTMeABF.ipynb
[✓] 18_fine_tuning_foundation_models.ipynb

Alright, then maybe some of the tutorials also failed for me because of environment issues. Once we have clearly documented requirements and a clean CI build, such issues should also become easier to handle. As I remember it, the dependency hell I ran into was that either

our code was incompatible with the new ASE, or

our dependencies, such as pymatgen, bring in a scipy version that clashes with the old ASE.

So maybe updating our code to the new ASE, then revisiting the tutorials and integrating them all into the CI, is a good way forward?

nff/tests/test_training.py

steinmig · 2024-12-18T13:37:19Z

1. reverting the use of the `_ax` imports (these need to be integrated with other files and removed, so we don't want any new imports from them if we can avoid it)

See the thread above.

2. cleaning up tests so that they're distinct from tutorials (removing excess comments, etc.)

Yes, makes sense.

Some of the training can also take quite a long time to run as currently written in the tutorials... Do you think we should take precautions to ensure the CI/CD pipeline doesn't take too long to run?

Yes, I reduced the number of epochs relative to the tutorials and loosened the expected MAE threshold to save some time. The excited-state training still takes ~20 minutes, which is why I have disabled that test for now. The general potential training takes only a few seconds. You can see the CI execution time under Actions.
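The idea of training for fewer epochs and accepting a looser MAE bound can be sketched as a plain tolerance check. This is a minimal, stdlib-only illustration under stated assumptions: the helper names and the threshold value are hypothetical and are not taken from `nff/tests/test_training.py`.

```python
from typing import Sequence

# Hypothetical threshold for illustration only; the real value lives in the NFF tests.
# It is deliberately loose because CI trains for far fewer epochs than the tutorial.
CI_MAX_MAE = 2.0


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def check_regression(pred: Sequence[float], target: Sequence[float],
                     max_mae: float = CI_MAX_MAE) -> float:
    """Fail if the MAE exceeds the (loosened) CI threshold; otherwise return it."""
    mae = mean_absolute_error(pred, target)
    assert mae <= max_mae, f"MAE {mae:.3f} exceeds threshold {max_mae}"
    return mae


if __name__ == "__main__":
    # A short CI run only needs to beat the loose bound, not the tutorial-level accuracy.
    predicted = [1.1, 2.3, 2.9]
    reference = [1.0, 2.0, 3.0]
    print(f"MAE = {check_regression(predicted, reference):.3f}")
```

The trade-off is that a loose bound still catches outright breakage (NaNs, diverging training, wrong units) while keeping the run to seconds; it will not catch small accuracy regressions.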

ajhoffman1229 · 2025-01-22T18:47:49Z

@steinmig Sorry for just getting back to these changes now. I am not sure that we need to worry about tackling the _ax files now with this PR for the CI/CD pipeline—that can probably wait. Did you have a chance to remove some of the comments on the tests? If so, we can try merging and firing up the CI/CD right away.

You can see the CI execution time under Actions
It looks like the execution time is ~25 min. That seems a little long, but not totally unreasonable right now. I'm not super familiar with typical CI/CD execution times for code repos of this size, though. I remember ASE's taking about twice as long, but I don't know whether they have twice as much code.

steinmig · 2025-01-29T20:38:29Z

@steinmig Sorry for just getting back to these changes now. I am not sure that we need to worry about tackling the _ax files now with this PR for the CI/CD pipeline—that can probably wait. Did you have a chance to remove some of the comments on the tests? If so, we can try merging and firing up the CI/CD right away.

I just adapted the comments. Either torch 2.6 or numpy 2 caused a new issue, but it should be resolved pretty soon.

It looks like the execution time is ~25 min. That seems a little long, but not totally unreasonable right now. I'm not super familiar with typical CI/CD execution times for code repos of this size, though. I remember ASE's taking about twice as long, but I don't know whether they have twice as much code.

The 25 min should only occur with the excited-state training, which is too much for my taste for a single test that spends 99% of that time just crunching numbers. It should be disabled now, so the run should take only ~3 min.
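One way to keep a slow test such as the excited-state training in the repository while skipping it by default is to gate it on an opt-in environment variable. The sketch below assumes a stdlib-only approach with `unittest.skipUnless`; a wrapped function raises `unittest.SkipTest` when called, which pytest generally reports as a skip, although `pytest.mark.skipif(...)` is the more idiomatic choice inside a pytest suite. The variable name `NFF_RUN_SLOW_TESTS` and the helper names are hypothetical and not part of the NFF codebase.

```python
import os
import unittest
from typing import Callable, Mapping, Optional

# Hypothetical opt-in switch; the NFF repository does not define this variable.
SLOW_TESTS_ENV_VAR = "NFF_RUN_SLOW_TESTS"


def slow_tests_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the opt-in variable is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get(SLOW_TESTS_ENV_VAR, "").strip().lower() in {"1", "true", "yes"}


def slow_test(func: Callable) -> Callable:
    """Skip the decorated test unless slow tests were explicitly requested.

    The condition is evaluated once, when the test module is imported.
    """
    reason = f"slow test; set {SLOW_TESTS_ENV_VAR}=1 to run it"
    return unittest.skipUnless(slow_tests_enabled(), reason)(func)


@slow_test
def test_excited_states_training() -> None:
    # Hypothetical placeholder for the ~20 minute excited-state training test.
    pass
```

With the variable unset, the default CI run stays short; a separate, manually triggered job could set it to exercise the long test without deleting it.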

ajhoffman1229

The work you did on this has been awesome, @steinmig! Thanks so much for tackling this issue.

steinmig added 8 commits December 16, 2024 14:59

getting existent tests to work (with cpu) · 607ef69
move tests to designated folder · d66dff0
some regression tests based on tutorials · ea48cb3
simple CI · 4bafcbf
avoid execution of broken code by pytest? · 9bdf505
remove print · 05452f9
disable excited states test for now · e567529
merge ignore improvements · 38bd679

steinmig requested a review from ajhoffman1229 December 17, 2024 19:11

ajhoffman1229 requested changes Dec 18, 2024


steinmig added 2 commits January 29, 2025 15:22

reduce comments in tests · 4c98f4b
limit to torch 2.5 · 2a08208
correct range syntax · 41c6c65

steinmig requested a review from ajhoffman1229 January 29, 2025 20:55

ajhoffman1229 approved these changes Jan 29, 2025


ajhoffman1229 merged commit f309fb6 into learningmatter-mit:master Jan 29, 2025


steinmig commented Dec 17, 2024

ajhoffman1229 left a comment

ajhoffman1229 Dec 18, 2024

steinmig Dec 18, 2024

steinmig Dec 18, 2024

ajhoffman1229 Jan 22, 2025

recisic Jan 22, 2025

steinmig Jan 29, 2025

steinmig commented Dec 18, 2024

ajhoffman1229 commented Jan 22, 2025

steinmig commented Jan 29, 2025

ajhoffman1229 left a comment
