Update scikit learn to 0.24 #1831

joshua-cogliati-inl · 2022-05-18T20:35:30Z

Pull Request Description

What issue does this change request address?

#1679
(pysensors requires scikit-learn 0.24 or newer)

What are the significant changes in functionality due to this change request?

Updates scikit learn to 0.24 in preparation for adding pysensors.

For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

1. Review all computer code.
2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
3. Make sure the Python code and commenting standards are respected (camelBack, etc.). See the wiki for details.
4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added that sets the XML node <internalParallel> to True.
6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
8. If an analytic test is changed/added, is the analytic documentation updated/added?
9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) has been changed, the associated documentation must be reviewed to ensure the text matches the example.


dylanjm · 2022-05-18T23:31:00Z

@joshua-cogliati-inl I think the best bet would be to merge this first, before fixing HERON, since once I update the ARMA files they will break with the devel version of RAVEN.

moosebuild · 2022-05-19T13:30:52Z

Job Test qsubs sawtooth on acf5594 : invalidated by @joshua-cogliati-inl

FAILED: Diff tests/cluster_tests/AdaptiveSobol/test_parallel_adaptive_sobol

joshua-cogliati-inl · 2022-05-19T14:55:10Z

FYI: GitHub is confused. The tests have passed: https://civet.inl.gov/pr/15929/

wangcj05

I have some minor comments for your consideration.

wangcj05 · 2022-05-19T15:04:56Z

tests/framework/PostProcessors/CrossValidations/test_stratifiedKFold.xml

@@ -35,7 +35,6 @@
        <labels>0,0,0,1,1,1,1,2,2,2</labels>
        <n_splits>2</n_splits>
        <shuffle>False</shuffle>
-        <random_state>10</random_state>


Why was this node removed?

Because setting random_state is now an error when shuffle is False (random_state is only used when shuffle is True).
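As a minimal illustration of the rule described above, the hypothetical helper `checkKFoldSettings` below rejects the same combination the old test input used (`shuffle` False together with a `random_state`). It is a sketch of the check only, written without scikit-learn; it is not RAVEN or scikit-learn code.

```python
def checkKFoldSettings(shuffle, randomState=None):
  """
    Hypothetical helper mirroring the rule discussed above: a seed only
    matters when samples are shuffled, so a seed with shuffle=False is rejected.
    @ In, shuffle, bool, whether samples are shuffled before splitting
    @ In, randomState, int or None, seed that is only used when shuffle is True
    @ Out, None
  """
  if not shuffle and randomState is not None:
    raise ValueError("random_state has no effect when shuffle is False; "
                     "remove it or set shuffle to True")

# The old test settings (shuffle False, random_state 10) are rejected ...
try:
  checkKFoldSettings(shuffle=False, randomState=10)
  rejected = False
except ValueError:
  rejected = True

# ... while the modified settings (random_state node removed) are accepted.
checkKFoldSettings(shuffle=False)
```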

wangcj05 · 2022-05-19T15:06:20Z

tests/framework/PostProcessors/DataMiningPostProcessor/Clustering/test_dataMiningKMeans.xml

@@ -64,6 +64,7 @@
    <Print name="info">
      <type>csv</type>
      <source>clusterInfo</source>
+      <what>Output</what>


Any specific reason for this modification?

Yes. There are three clusters, and which label is assigned to which cluster is semi-random. Printing only the cluster centers, rather than the centers together with their labels, keeps the test from failing randomly. See: tests/framework/PostProcessors/DataMiningPostProcessor/Clustering/gold/KMeans/info.csv
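The idea behind this change can be sketched as follows: cluster labels may be permuted between runs, so a robust comparison matches the set of centers rather than label-to-center pairs. `sameCenters` is a hypothetical helper, not part of RAVEN's tester, and it assumes the centers are well separated so that sorting them yields the same order in both runs.

```python
def sameCenters(goldCenters, testCenters, relErr=1e-6):
  """
    Compare two sets of cluster centers while ignoring which label each got.
    @ In, goldCenters, list of list of float, expected centers
    @ In, testCenters, list of list of float, computed centers
    @ In, relErr, float, allowed relative difference per coordinate
    @ Out, same, bool, True if the centers match up to a relabeling
  """
  if len(goldCenters) != len(testCenters):
    return False
  # sorting removes the dependence on label order (assumes well-separated centers)
  gold = sorted(tuple(c) for c in goldCenters)
  test = sorted(tuple(c) for c in testCenters)
  for g, t in zip(gold, test):
    for a, b in zip(g, t):
      if abs(a - b) > relErr * max(abs(a), abs(b), 1.0):
        return False
  return True

gold = [[0.0, 1.0], [5.0, 5.0], [10.0, 0.0]]
# same centers, but assigned to different labels in this run
test = [[10.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
```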

wangcj05 · 2022-05-19T15:08:20Z

...work/PostProcessors/DataMiningPostProcessor/DimensionalityReduction/gold/SparsePCA/dummy.csv

@@ -1,151 +1,151 @@
-x2,x3,labels,x1,x4,Output,component1,component2


Could you provide some information about the tests regolded in this pull request? A brief description of the causes of the regolding would help.

Updating to sklearn 0.24 resulted in the following diffs (each listed with the sklearn version whose release notes mention the change to that model; "?" means no version was identified):

- Diff tests/framework/ROM/SKLearn/linearElasticNetCV (0.23)
- Diff tests/framework/ROM/SKLearn/linearLARSCV (0.23)
- Diff tests/framework/ROM/SKLearn/linearLassoCV (0.23)
- Diff tests/framework/ROM/SKLearn/linearLassoLARSCV (0.23)
- Diff tests/framework/ROM/SKLearn/linearOMPCV (0.23)
- Diff tests/framework/ROM/TimeSeries/SyntheticHistory/ARMA (?)
- Diff tests/framework/PostProcessors/TemporalDataMiningPostProcessor/DimensionalityReduction/SparsePCA (0.22)
- Diff tests/framework/PostProcessors/TSACharacterizer/Basic (?)
- Diff tests/framework/PostProcessors/CrossValidations/stratifiedKFold (0.22)
- Diff tests/framework/PostProcessors/DataMiningPostProcessor/DimensionalityReduction/SparsePCA (0.22)
- Diff tests/framework/PostProcessors/TemporalDataMiningPostProcessor/DimensionalityReduction/MiniBatchSparsePCA (0.22)

Release notes (note that all of them have a Changed Models section):

https://scikit-learn.org/stable/whats_new/v0.22.html

https://scikit-learn.org/stable/whats_new/v0.23.html

https://scikit-learn.org/stable/whats_new/v0.24.html
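Not every diff requires a regold: when a model change shifts results only slightly, loosening the comparison tolerance is an alternative (one commit in this PR is titled "Doubling relative error."). The sketch below shows a generic relative-error check on one CSV row; the function names and the floor used for zero gold values are assumptions for illustration, not RAVEN's actual test harness.

```python
def relativeDiff(gold, test, floor=1e-10):
  """
    Relative difference of a test value from its gold value.
    The floor (an assumed value) avoids dividing by zero when gold is 0.
  """
  return abs(test - gold) / max(abs(gold), floor)

def rowMatches(goldRow, testRow, relErr):
  """
    True when every test value is within relErr of its gold value.
  """
  if len(goldRow) != len(testRow):
    return False
  return all(relativeDiff(g, t) <= relErr for g, t in zip(goldRow, testRow))

goldRow = [1.0, 2.5, -3.0]
testRow = [1.0000015, 2.5, -3.0]  # first entry shifted by about 1.5e-6 relative
strictResult = rowMatches(goldRow, testRow, relErr=1e-6)  # False
looserResult = rowMatches(goldRow, testRow, relErr=2e-6)  # True
```

Doubling the tolerance here turns a failing comparison into a passing one without touching the gold file, which is only appropriate when the shift is understood to come from the library update.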

wangcj05 · 2022-05-19T15:09:41Z

ravenframework/SupervisedLearning/ScikitLearn/LinearModel/BayesianRidge.py

+    specs.addSub(InputData.parameterInputFactory("alpha_init", contentType=InputTypes.FloatType,
+                                                 descr=r"""Initial value for alpha (precision of the noise).
+                                                  If not set, alpha_init is $1/Var(y)$.""", default=None))
+    specs.addSub(InputData.parameterInputFactory("lambda_init", contentType=InputTypes.FloatType,
+                                                 descr=r"""Initial value for lambda (precision of the weights).""", default=1.0))


I think you need to regenerate the ROM document to reflect these changes.
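To make the new defaults concrete, the sketch below computes the starting precisions described for `alpha_init` and `lambda_init`. `initialPrecisions` is a hypothetical helper written only with the standard library; scikit-learn performs this step internally in `BayesianRidge.fit`, and the small epsilon term is assumed to mirror its implementation rather than anything stated in this PR.

```python
import statistics
import sys

def initialPrecisions(y, alphaInit=None, lambdaInit=1.0):
  """
    Starting values implied by the input descriptions above:
    alpha (noise precision) defaults to 1/Var(y), lambda (weight precision) to 1.0.
    @ In, y, list of float, training targets
    @ In, alphaInit, float or None, user-supplied initial alpha
    @ In, lambdaInit, float, user-supplied initial lambda
    @ Out, (alpha, lambda), tuple of float, initial precisions
  """
  if alphaInit is None:
    # population variance (ddof=0), like numpy.var; the machine epsilon
    # keeps the division finite when y is constant
    alphaInit = 1.0 / (statistics.pvariance(y) + sys.float_info.epsilon)
  return alphaInit, lambdaInit

alpha, lam = initialPrecisions([1.0, 2.0, 3.0, 4.0])  # Var(y) = 1.25, so alpha is about 0.8
```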

moosebuild · 2022-05-19T15:34:53Z

Job Mingw Test on 7a7f0f2 : invalidated by @joshua-cogliati-inl

changed civet

joshua-cogliati-inl · 2022-05-19T16:21:20Z

Suggested email:
Scikit-learn has been updated to 0.24 with pull request #1831.
This has two effects. First, due to some model changes, results can change for some RAVEN inputs that use scikit-learn models.
Second, after updating RAVEN, the establish conda script will need to be run to switch to the new scikit-learn version:
./scripts/establish_conda_env.sh --install
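After running the script, one way to confirm the environment picked up the new library is to compare `sklearn.__version__` against 0.24. The helpers `parseVersion` and `meetsMinimum` below are hypothetical and use only the standard library, so the comparison logic can be checked without scikit-learn installed.

```python
def parseVersion(versionString):
  """
    Turn a version string such as '0.24.2' into a tuple of ints, keeping only
    the leading digits of each component (e.g. '0rc1' -> 0).
    @ In, versionString, str, version to parse
    @ Out, version, tuple of int, numeric components
  """
  parts = []
  for piece in versionString.split('.'):
    digits = ''
    for ch in piece:
      if not ch.isdigit():
        break
      digits += ch
    if not digits:
      break
    parts.append(int(digits))
  return tuple(parts)

def meetsMinimum(versionString, minimum=(0, 24)):
  """
    True if versionString is at least the minimum version (tuple comparison).
  """
  return parseVersion(versionString) >= minimum
```

In an activated RAVEN environment this could be used as `import sklearn; meetsMinimum(sklearn.__version__)`.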


wangcj05

changes are good.

moosebuild · 2022-05-19T17:43:08Z

Job Test qsubs sawtooth on 6995de8 : invalidated by @joshua-cogliati-inl

FAILED: Diff tests/cluster_tests/AdaptiveSobol/test_parallel_adaptive_sobol

wangcj05 · 2022-05-19T18:38:03Z

Checklist is satisfied, and PR has been reviewed and approved.



joshua-cogliati-inl added 5 commits May 18, 2022 12:55

Remove old pre-0.17 compatibility code.

591331c

Properly handling 'None' and 'most_frequent' outlier_label

c6a6aa6

Checking if outlier_label exists before looking at it.

Fixing default to be float.

9fc5ad5

changing sklearn to 0.24

c0ae3cc

Regolding and test modification for sklearn 0.24

f8332fe
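Three of the commits above deal with the outlier_label parameter (handling 'None' and 'most_frequent', checking that the value exists before reading it, and making the default a float). A minimal sketch of that kind of parsing, assuming the value arrives as XML text, is shown below; `parseOutlierLabel` is hypothetical and is not the code in these commits.

```python
def parseOutlierLabel(raw):
  """
    Hypothetical conversion of the XML text given for outlier_label into a
    value that RadiusNeighborsClassifier accepts.
    @ In, raw, str or None, text of the outlier_label node (None if absent)
    @ Out, label, None, str, or float, value to pass to scikit-learn
  """
  # the node may be missing altogether, so check before reading it
  if raw is None:
    return None
  text = raw.strip()
  if text == 'None':
    # the string 'None' means "no outlier label", i.e. Python None
    return None
  if text == 'most_frequent':
    # keyword understood by scikit-learn itself
    return text
  # anything else is treated as a numeric (float) label
  return float(text)
```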

joshua-cogliati-inl changed the title ~~Update scikit learn~~ Update scikit learn to 0.24 May 18, 2022

joshua-cogliati-inl added 2 commits May 18, 2022 15:16

Doubling relative error.

057f5f4

Labels are somewhat random, so don't print them.

acf5594

Just check that all the means are the same.

dylanjm added a commit to dylanjm/HERON that referenced this pull request May 19, 2022

Retrain ARMAs for idaholab/raven#1831

a9ef38d

dylanjm mentioned this pull request May 19, 2022

Retrain ARMAs for idaholab/raven#1831 idaholab/HERON#165

Merged


joshua-cogliati-inl added the Ready To Review label May 19, 2022

wangcj05 self-requested a review May 19, 2022 14:53

wangcj05 requested changes May 19, 2022


joshua-cogliati-inl added 2 commits May 19, 2022 10:32

Updating generated documents.

0f3c9fb

Removing check for version for StackingRegressor since 0.24 is now minimum.

d9991df

joshua-cogliati-inl force-pushed the update_scikit_learn branch from 7a7f0f2 to d9991df Compare May 19, 2022 16:35

Adding StackingRegressor

6995de8

wangcj05 approved these changes May 19, 2022


wangcj05 merged commit da6ee23 into idaholab:devel May 19, 2022

joshua-cogliati-inl deleted the update_scikit_learn branch May 19, 2022 18:49

joshua-cogliati-inl added a commit to idaholab/HERON that referenced this pull request May 24, 2022

Merge pull request #165 from dylanjm/update-arma-for-scikit

295a24a

Retrain ARMAs for idaholab/raven#1831

wangcj05 added the RAVENv2.2 for RAVENv2.2 Release label Jun 16, 2022
