Update ML AI Plugin To Support Keras 3 #1223
Conversation
@bpaul4 to enable the ML/AI tests on versions of Python other than 3.10, you can try adding items to the job matrix in FOQUS/.github/workflows/checks.yml (lines 62 to 91 at 7a04a6b):
  include:
    - os: macos-x86_64
      os-version: macos-13
    - os: macos
      os-version: macos-14
    - os: linux
      os-version: ubuntu-20.04
    - os: win64
      os-version: windows-2019
    - python-version: '3.10'  # avoid uploading coverage for full matrix
      use_coverage: true
    - python-version: '3.10'  # this is to avoid installing optional dependencies in all environments
      optional-dependencies: -r requirements-mlai.txt
+   - python-version: '3.9'
+     optional-dependencies: -r requirements-mlai.txt
+   - python-version: '3.11'
+     optional-dependencies: -r requirements-mlai.txt
@lbianchi-lbl per the log, it appears that the "savedmodel" zip file is copied over and extracted to the test directory, but is not picked up by the
@bpaul4 thanks for looking into this. When running locally, have you tried running the tests in a freshly created directory? e.g.

mkdir testing-foqus-1223
cd testing-foqus-1223
git clone https://github.com/bpaul4/FOQUS
cd FOQUS
git switch update-keras
# activate environment
# run tests

One key difference between local development and the CI environment is that the latter always starts from a "clean slate", which might be one of the factors behind the different behavior we're seeing.
@lbianchi-lbl I did some more debugging, and it looks like in the fixture below
the "savedmodel" folder is picked up when
Update: just saw your prior comment; I haven't tried that yet, but can try it next.
Great point @bpaul4. If
@lbianchi-lbl everything is green, so I think it's ready for your review as your availability allows. Thank you for your suggestions! The changes within the folder examples/other_files/ML_AI_Plugin/ are mostly files being moved around: I moved old TensorFlow files into a "deprecated" folder, regenerated all of the models with the latest packages (both TensorFlow and non-TensorFlow models), and moved the current models into "supported Keras" and "other" folders. The only big change is that the example without a custom layer is now also the mea column model, as the autothermal reformer model couldn't readily be regenerated.
@bpaul4 comparing the GHA logs for a passing test (above, Windows/Python 3.10) and a failing test (below, macOS/Python 3.10), it looks like something between
Do you have any idea why this is happening? What type of test is
Since this seems to be happening consistently for all macOS runners (both Intel and Apple Silicon), I wonder if it's possible for someone with a macOS machine to try to reproduce this locally.
@lbianchi-lbl thanks for investigating further. The
It can't be due to just importing one of the machine learning packages or executing the models, as the
However, I see that
appears on the macOS runs but not on Windows or Linux, and this may be a more useful error to look into to see what's happening.
Ah, great point; good catch, I hadn't noticed that. We're not getting more information about those failures because of the timeout, but I agree that it's worth looking into in more detail. If you're OK with it, I'll push a change to the CI workflow file so that pytest exits after 3 failures, which should let us see the stack traces for those tests.
@bpaul4 it looks like the failure in the ML/AI surrogate plugin is due to a timeout being reached by pytest-qt when waiting for the successful completion of the
Unfortunately, I'm not sure we can get much more information from CI runs, since (at least for the ML/AI surrogate) the output is redirected to the "pseudo-console" within the GUI to be accessible to the user without having to check the logs. So I think the next step would be for someone to run this locally and check visually and/or take a screenshot. Is there any generous macOS-using soul who'd be willing to check? @ksbeattie @boverhof @kbuma @franflame It should be enough to check out the code in this branch, install the ML/AI dependencies, and run the tests. I can write a complete set of commands, but it'll have to wait a bit. UPDATE: the following should work:
|
I was able to use the above commands to set up a clean environment and did some manual testing. At least locally, pytorch_nn and scikit_nn are both working fine with no issues; only keras_nn doesn't complete as expected. I traced through keras_nn's execution by printing to the GUI log, and it appears that model.fit() is where it gets stuck. |
@bpaul4 @lbianchi-lbl Here's what I got from running the automated tests locally:
Side note: from watching the GUI tests execute, the pytorch_nn and scikit_nn tests fail because the keras_nn test is still running, and a new process for pytorch_nn or scikit_nn can't be started while keras_nn isn't finished. This is comparable to what I observed during manual testing, described in my previous comment.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1223      +/-   ##
==========================================
- Coverage   38.54%   37.88%   -0.66%
==========================================
  Files         164      164
  Lines       37032    37048      +16
  Branches     6132     6153      +21
==========================================
- Hits        14274    14036     -238
- Misses      21619    21835     +216
- Partials     1139     1177      +38

☔ View full report in Codecov by Sentry.
Fixes/Addresses:
Updates syntax in ML AI Plugin examples, methods, and tests to support Keras 3, which replaces the standard H5 (.h5) file format with a new Keras (.keras) file format and deprecates the legacy SavedModel (folder) model type in favor of an inference-only TFSM loading structure. There is still a way to load SavedModel files, but they cannot have a custom layer that is detectable by FOQUS.
Summary/Motivation:
Changes proposed in this PR:
Legal Acknowledgement
By contributing to this software project, I agree to the following terms and conditions for my contribution: