Update scikit learn to 0.24 #1831

Merged
merged 10 commits into from
May 19, 2022

Conversation

joshua-cogliati-inl
Contributor

@joshua-cogliati-inl joshua-cogliati-inl commented May 18, 2022


Pull Request Description

What issue does this change request address?

#1679
(pysensors requires scikit-learn 0.24 or newer)

What are the significant changes in functionality due to this change request?

Updates scikit-learn to 0.24 in preparation for adding pysensors.


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.). See the wiki for details.
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py, the qsub tests must pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added that sets the <internalParallel> node to True in the XML input.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed/added, is the analytic documentation updated/added?
  • 9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) has been changed, the associated documentation must be reviewed to ensure the text matches the example.

@joshua-cogliati-inl joshua-cogliati-inl changed the title Update scikit learn Update scikit learn to 0.24 May 18, 2022
@dylanjm
Collaborator

dylanjm commented May 18, 2022

@joshua-cogliati-inl I think the best bet would be to merge this first, before fixing HERON, since my updated ARMA files would otherwise break with the devel version of RAVEN.

dylanjm added a commit to dylanjm/HERON that referenced this pull request May 19, 2022
@moosebuild

Job Test qsubs sawtooth on acf5594 : invalidated by @joshua-cogliati-inl

FAILED: Diff tests/cluster_tests/AdaptiveSobol/test_parallel_adaptive_sobol

@joshua-cogliati-inl
Contributor Author

FYI: GitHub is confused. The tests have passed: https://civet.inl.gov/pr/15929/

Collaborator

@wangcj05 wangcj05 left a comment

I have some minor comments for your consideration.

@@ -35,7 +35,6 @@
<labels>0,0,0,1,1,1,1,2,2,2</labels>
<n_splits>2</n_splits>
<shuffle>False</shuffle>
<random_state>10</random_state>
Collaborator

Why was this node removed?

Contributor Author

Because random_state now raises an error if shuffle is False (random_state is only used when shuffle is True).
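Checklist item 2 asks for a conversion script when an input change invalidates existing files. As a minimal, hypothetical sketch (not RAVEN's actual conversion scripts), the removal of this node can be automated with Python's standard `xml.etree.ElementTree`. The helper name `drop_unused_random_state` and the `<settings>` wrapper element in the sample are assumptions for illustration; only the child nodes mirror the diff above.

```python
import xml.etree.ElementTree as ET

def drop_unused_random_state(root):
    """Remove <random_state> from every element whose <shuffle> is False.

    Hypothetical conversion helper: scikit-learn 0.24 rejects random_state
    when shuffle is False, so the node is dropped from old inputs.
    Returns the number of nodes removed.
    """
    removed = 0
    # Snapshot the tree first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        shuffle = parent.find("shuffle")
        seed = parent.find("random_state")
        if shuffle is None or seed is None:
            continue
        if (shuffle.text or "").strip().lower() == "false":
            parent.remove(seed)
            removed += 1
    return removed

# Simplified, hypothetical input; <settings> is a made-up wrapper element.
SAMPLE = """<settings>
  <labels>0,0,0,1,1,1,1,2,2,2</labels>
  <n_splits>2</n_splits>
  <shuffle>False</shuffle>
  <random_state>10</random_state>
</settings>"""

root = ET.fromstring(SAMPLE)
print(drop_unused_random_state(root))     # 1
print(root.find("random_state") is None)  # True
```

Inputs with shuffle set to True keep their random_state, since the seed still has an effect there.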

@@ -64,6 +64,7 @@
<Print name="info">
<type>csv</type>
<source>clusterInfo</source>
<what>Output</what>
Collaborator

Any specific reason for this modification?

Contributor Author

Yes. There are three clusters, and which label is assigned to which cluster is semi-random, so if we output only the cluster centers, instead of both the centers and the labels, the test doesn't fail randomly. See: tests/framework/PostProcessors/DataMiningPostProcessor/Clustering/gold/KMeans/info.csv
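The same idea (check the cluster centers, not which label each cluster received) can be written as a label-order-invariant comparison. This is a minimal plain-Python sketch; `centers_match` and its tolerances are hypothetical and are not the RAVEN tester's implementation.

```python
import math

def centers_match(gold, result, rel_tol=1e-6, abs_tol=1e-12):
    """Return True if two sets of cluster centers agree, ignoring label order.

    Cluster labels are arbitrary, so both sets are sorted into a canonical
    (lexicographic) order before comparing them coordinate by coordinate.
    """
    if len(gold) != len(result):
        return False
    for g, r in zip(sorted(gold), sorted(result)):
        if len(g) != len(r):
            return False
        if not all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
                   for a, b in zip(g, r)):
            return False
    return True

gold = [[1.0, 2.0], [5.0, 5.0], [9.0, 0.0]]
run = [[9.0, 0.0], [1.0, 2.0], [5.0, 5.0]]  # same centers, different labels
print(centers_match(gold, run))  # True
```

Sorting assumes the centers are well separated; two centers whose leading coordinates nearly tie could be ordered differently in the two runs.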

@@ -1,151 +1,151 @@
x2,x3,labels,x1,x4,Output,component1,component2
Collaborator

Could you provide some information about the test regoldings in this pull request? A brief description of what caused each regold would help.

Contributor Author

Hm, updating to sklearn 0.24 resulted in the following diffs (each followed by the sklearn version whose release notes mention the model change, or ? if unclear):

Diff tests/framework/ROM/SKLearn/linearElasticNetCV 0.23
Diff tests/framework/ROM/SKLearn/linearLARSCV 0.23
Diff tests/framework/ROM/SKLearn/linearLassoCV 0.23
Diff tests/framework/ROM/SKLearn/linearLassoLARSCV 0.23
Diff tests/framework/ROM/SKLearn/linearOMPCV 0.23
Diff tests/framework/ROM/TimeSeries/SyntheticHistory/ARMA ?
Diff tests/framework/PostProcessors/TemporalDataMiningPostProcessor/DimensionalityReduction/SparsePCA 0.22
Diff tests/framework/PostProcessors/TSACharacterizer/Basic ?
Diff tests/framework/PostProcessors/CrossValidations/stratifiedKFold 0.22
Diff tests/framework/PostProcessors/DataMiningPostProcessor/DimensionalityReduction/SparsePCA 0.22
Diff tests/framework/PostProcessors/TemporalDataMiningPostProcessor/DimensionalityReduction/MiniBatchSparsePCA 0.22

Release notes (note that all of them have a Changed Models section):

  1. https://scikit-learn.org/stable/whats_new/v0.22.html
  2. https://scikit-learn.org/stable/whats_new/v0.23.html
  3. https://scikit-learn.org/stable/whats_new/v0.24.html
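Each "Diff" above means a numeric value no longer matched its gold file within the test's tolerance. As a rough, hypothetical illustration (not the RAVEN tester's code; the zero-handling rule and the `zero_threshold` value are assumptions), a relative-error check looks like the sketch below. When a model change genuinely alters results, regolding is the fix; widening the tolerance, as in the "Doubling relative error" commit, is better suited to small numerical noise.

```python
def within_rel_err(value, gold, rel_err, zero_threshold=1e-15):
    """Hypothetical gold-file check: is value within rel_err of gold?

    Uses the relative difference, falling back to an absolute comparison
    when the gold value is (near) zero, where relative error is undefined.
    """
    diff = abs(value - gold)
    if abs(gold) < zero_threshold:
        return diff < zero_threshold
    return diff / abs(gold) <= rel_err

gold = 1.0
new = 1.0 + 1.5e-10  # a small shift in a computed value
print(within_rel_err(new, gold, rel_err=1e-10))  # False -> reported as a Diff
print(within_rel_err(new, gold, rel_err=2e-10))  # True once the tolerance is doubled
```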

Comment on lines +85 to +89
specs.addSub(InputData.parameterInputFactory("alpha_init", contentType=InputTypes.FloatType,
    descr=r"""Initial value for alpha (precision of the noise).
    If not set, alpha_init is $1/Var(y)$.""", default=None))
specs.addSub(InputData.parameterInputFactory("lambda_init", contentType=InputTypes.FloatType,
    descr=r"""Initial value for lambda (precision of the weights).""", default=1.0))
Collaborator

I think you need to regenerate the ROM document to reflect these changes.

Contributor Author

Done.
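For reference, the default described for the new `alpha_init` parameter (`1/Var(y)` when unset) is easy to reproduce. This plain-Python sketch assumes the population variance (what NumPy's `np.var` computes by default) and adds a machine-epsilon guard against constant targets, which is an assumption here. The helper name is hypothetical; scikit-learn's `BayesianRidge` computes this default itself, so inputs can simply leave `alpha_init` unset.

```python
import statistics
import sys

LAMBDA_INIT_DEFAULT = 1.0  # documented default precision of the weights

def default_alpha_init(y):
    """Hypothetical helper: initial noise precision alpha_init = 1/Var(y).

    Uses the population variance (ddof=0); the epsilon keeps the result
    finite when all targets are identical.
    """
    return 1.0 / (statistics.pvariance(y) + sys.float_info.epsilon)

targets = [1.0, 2.0, 3.0, 4.0]  # Var(y) = 1.25
print(round(default_alpha_init(targets), 6))  # 0.8
```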

@moosebuild

Job Mingw Test on 7a7f0f2 : invalidated by @joshua-cogliati-inl

changed civet

@joshua-cogliati-inl
Contributor Author

Suggested email:
Scikit-learn has been updated to 0.24 by pull request #1831.
This has two effects. First, because some scikit-learn models changed, results may differ for RAVEN inputs that use scikit-learn models.
Second, after updating RAVEN, the conda environment must be re-established to switch to the new scikit-learn version:
./scripts/establish_conda_env.sh --install
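To confirm the environment actually picked up the new version after running the script, the installed scikit-learn version can be compared with the 0.24 minimum. This is a minimal standard-library sketch with hypothetical function names, not part of RAVEN's scripts; it reads the version through `importlib.metadata` (Python 3.8+) without importing scikit-learn itself.

```python
import re
from importlib import metadata

MINIMUM_SKLEARN = (0, 24)  # minimum required by this pull request

def parse_major_minor(version):
    """Return (major, minor) from strings such as '0.24.2' or '1.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

def sklearn_is_current(minimum=MINIMUM_SKLEARN):
    """True if the installed scikit-learn meets the minimum version."""
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return False  # not installed in this environment at all
    return parse_major_minor(installed) >= minimum

if __name__ == "__main__":
    if not sklearn_is_current():
        print("scikit-learn is older than 0.24; rerun "
              "./scripts/establish_conda_env.sh --install")
```

Comparing (major, minor) tuples avoids the string-comparison trap where "0.3" would sort after "0.24".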

Collaborator

@wangcj05 wangcj05 left a comment

Changes are good.

@moosebuild

Job Test qsubs sawtooth on 6995de8 : invalidated by @joshua-cogliati-inl

FAILED: Diff tests/cluster_tests/AdaptiveSobol/test_parallel_adaptive_sobol

@wangcj05
Collaborator

Checklist is satisfied, and PR has been reviewed and approved.

@wangcj05 wangcj05 merged commit da6ee23 into idaholab:devel May 19, 2022
@joshua-cogliati-inl joshua-cogliati-inl deleted the update_scikit_learn branch May 19, 2022 18:49
joshua-cogliati-inl added a commit to idaholab/HERON that referenced this pull request May 24, 2022
dgarrett622 pushed a commit to dgarrett622/raven that referenced this pull request Jun 1, 2022
* Remove old pre-0.17 compatibility code.

* Properly handling 'None' and 'most_frequent' outlier_label

Checking if outlier_label exists before looking at it.

* Fixing default to be float.

* changing sklearn to 0.24

* Regolding and test modification for sklearn 0.24

* Doubling relative error.

* Labels are somewhat random, so don't print them.

Just check that all the means are the same.

* Updating generated documents.

* Removing check for version for StackingRegressor since 0.24 is now minimum.

* Adding StackingRegressor
PaulTalbot-INL added a commit that referenced this pull request Jun 15, 2022
* idiot-proof getEntity method

* splitting and cleaning XMLread

* fix missing f-statement

* update Simulation to allow subsequent run() calls

* formatting updates

* create flushOutputDataObject in DataObject and DataSet to reinitialize output DataObjects for successive workflow runs

* fixing assembler issue

* removing getMessageHandler from builtins

* removing duplicate import statement

* fixing flushOutputDataObject for when 'prefix' is not a part of the object

* restart time each time a simulation is run

* ensure time resets when simulation reinitialized

* plots generate when RAVEN workflow rerun

* fixing DeprecationWarning message

* fixing printing issue for HistorySet

* reset warning messages if workflow ran previously

* fixing colorMap specified does not exist error

* fixing 'Tried to add new data to cNDarray' for PostProcessor tests

* fixing 'Tried to add new data to cNDarray' for ARMA tests

* fixing h5py 'Unable to create group' issue

* fix issue with controlFunction in LogicalModel

* HybridModel can be re-run in workflow

* fixing GeneralPlot.py for re-running OutStreams tests

* fix ['prefix'] not in index error

* fixing issue seen with GeneticAlgorithm

* reset Steps, SolutionExport, and GradientDescent Optimizer for rerunning workflows

* finish up flushing GradientDescent Optimizer

* flushing GeneticAlgorithm Optimizer

* finish flushing SimulatedAnnealing

* flushing AdaptiveMonteCarlo and other Samplers

* improving flushDataObject

* flushing Sobol Sampler

* flushing AdaptiveSparseGrid

* fixing NetCDF issue where values were appended instead of overwritten

* adding flushSampler to Stratified

* updating _inputMetaVars for HistorySet

* flush Databases using (mostly) existing initializeDatabase

* flushing additional Samplers

* fixing re-initialization of HDF5 databases for re-running workflows

* simplifying Sampler inheritance

* formatting touched files

* flushing metadataKeys for Samplers and Optimizers

* adjust spacing for Run complete! message

* updating tests to check for re-running RAVEN workflow

* fixing f-string issue

* fix new test for re-running RAVEN workflow

* vargroups in rrr dataobjects (#1823)

* Farm submodule update (#1826)

* update FARM submodule version

* Parallel improvements (#1825)

* If ray is instantiated outside, use it.

Basically, before, ray was only used if nodes was setup,
such as with MPI mode.

* No longer automatically adds mpi mode to inner ravens.

* Make it a bigger test, and switch to internalParallel

* Adding useful debugging information.

* Wait for servers to finish starting.

Otherwise they become zombies.

* Only print out changed part.

* Adding debugging info, and force status to disk.

* Setting port to 0 so ray chooses an available port.

This adds port as a parameter to the starting ray function,
and sends in 0.  This tells ray to choose an available port,
instead of erroring if port 6379 is not available.

* Find correct ray start

`ray start` sometimes appears, so add a space before it.

* Don't start JobHandler if running remotely.

If we are going to be running remotely, then we should
not start the job handler. Otherwise, we have to start ray or threads, only
to shut them down seconds later.

* Update scikit learn to 0.24 (#1831)

* Remove old pre-0.17 compatibility code.

* Properly handling 'None' and 'most_frequent' outlier_label

Checking if outlier_label exists before looking at it.

* Fixing default to be float.

* changing sklearn to 0.24

* Regolding and test modification for sklearn 0.24

* Doubling relative error.

* Labels are somewhat random, so don't print them.

Just check that all the means are the same.

* Updating generated documents.

* Removing check for version for StackingRegressor since 0.24 is now minimum.

* Adding StackingRegressor

* Setup changes (#1748)

* Converting to ravenframework

* Fix pluginhandler.

* Making pyDOE use relative imports.

* Adding __init__.py so lazy is found.

* pyDOE now really in contrib.

* Make checking libraries optional.

* Updating for raven package.

* Fixing library_report.

* There was both a CodeInterfaces directory and a CodeInterfaces.py file. This caused problems.

* Support python 2.0 for utils.

* Skip library check if library handler not found.

* Adds ability to generate setup.cfg requirements section.

Usage:
python3 ./scripts/library_handler.py pip --action=setup.cfg > setup.cfg

* Updating things from review.

* Increasing rel_err due to test failures. (#1836)

* update LOGOS submodule (#1835)

* HERON Submodule Update (#1837)

* reverting PythonRaven to address issues caused by recent changes

* reverting PythonRaven again due to circular imports

* reverting getMessageHandler

* fixing formatting in DataSet.py

* adding resetHybridModel method

* formatting GradientDescent

* restoring commented code block and fixing variable name

* formatting comments

* updating to common 'flush' method

* adding resetSimulation functionality

* formatting f-string in MultiRun

Co-authored-by: Paul Talbot <paul.talbot@inl.gov>
Co-authored-by: Haoyu Wang <63424217+wanghy-anl@users.noreply.github.com>
Co-authored-by: Joshua J. Cogliati <joshua-cogliati-inl@users.noreply.github.com>
Co-authored-by: Congjian Wang - INL <congjian.wang@inl.gov>
Co-authored-by: Dylan McDowell <dylanjm@users.noreply.github.com>
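One of the squashed commits above passes port 0 so that ray chooses an available port instead of erroring when 6379 is taken. Port 0 is the general operating-system convention for "assign any free port"; this standard-library sketch shows that convention with a plain socket. It does not use ray, and `pick_free_port` is a hypothetical helper.

```python
import socket

def pick_free_port(host="127.0.0.1"):
    """Bind to port 0 so the OS assigns an unused TCP port, and return it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # 0 means "any free port"
        return sock.getsockname()[1]

port = pick_free_port()
print(0 < port < 65536)  # True
```

The port is released when the socket closes, so another process could claim it before it is reused; letting the server itself bind to port 0, as the commit does by passing 0 to ray, avoids that race.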
@wangcj05 wangcj05 added the RAVENv2.2 for RAVENv2.2 Release label Jun 16, 2022
Labels
RAVENv2.2 for RAVENv2.2 Release Ready To Review
4 participants