Pull request for CQML extension (2nd trial) #115

Open · zaspel wants to merge 89 commits into develop

Conversation

@zaspel commented Jul 29, 2019

This pull request adds the multi-fidelity learning approach called CQML, which is documented in the paper "Boosting quantum machine learning models with multi-level combination technique: Pople diagrams revisited":

* The core of the new code is in the new files qml/models/cqml2d.py and qml/models/cqml.py, which contain a 2d CQML implementation and a 2d/3d/nd CQML implementation, respectively. The 2d implementation follows the simplified idea presented in Section 3.4 of the paper, while the 2d/3d/nd CQML implementation is documented in Section 3.5 (an illustrative sketch of the underlying idea follows this list).

* examples/cqml2d_CI9.py is the code that was used to create the 2d CQML results on the CI9 data set in the paper.

* examples/cqml_QM7b.py and examples/cqml_in_2d_*.py are the codes that were used to create the 2d/3d results for the QM7b data set in the paper.

* The CI9 and QM7b data sets are included in the examples directory.
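
The real entry points for this functionality are the new modules and the examples/cqml*.py scripts listed above. For readers unfamiliar with the idea, the snippet below is only a minimal NumPy sketch of the simplest two-level case, roughly in the spirit of the simplified 2d setting of Section 3.4: a kernel ridge regression model trained on many cheap, low-level labels, plus a second model trained on a few expensive labels that learns the difference between the two levels. All names in the snippet (gaussian_kernel, krr_fit, e_low, e_high, ...) are hypothetical illustrations and do not reflect the API of qml/models/cqml2d.py or qml/models/cqml.py.

```python
# Illustrative sketch only: a minimal two-level multi-fidelity model in the
# spirit of CQML, written with plain NumPy.  The names used here are
# hypothetical and do NOT correspond to the API of qml/models/cqml.py;
# see the examples/ scripts in this PR for the real usage.
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the row vectors of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))


def krr_fit(X, y, sigma=1.0, llambda=1e-8):
    """Kernel ridge regression: return the regression coefficients alpha."""
    K = gaussian_kernel(X, X, sigma)
    K[np.diag_indices_from(K)] += llambda
    return np.linalg.solve(K, y)


def krr_predict(X_train, alpha, X_test, sigma=1.0):
    return gaussian_kernel(X_test, X_train, sigma) @ alpha


rng = np.random.default_rng(0)

# Synthetic "cheap" (low-fidelity) and "expensive" (high-fidelity) labels.
def e_low(x):   # stand-in for a low-level method
    return np.sin(3 * x).ravel()

def e_high(x):  # stand-in for a high-level method = low level + smooth correction
    return e_low(x) + 0.3 * x.ravel()**2

X_low = rng.uniform(-2, 2, size=(200, 1))   # many cheap training points
X_high = X_low[:20]                          # few expensive training points

# Level 1: train on the cheap data.
alpha_low = krr_fit(X_low, e_low(X_low))

# Level 2: train a correction on the (expensive - cheap) difference.
delta = e_high(X_high) - krr_predict(X_low, alpha_low, X_high)
alpha_delta = krr_fit(X_high, delta)

# Combined multi-fidelity prediction = cheap model + learned correction.
X_test = np.linspace(-2, 2, 50).reshape(-1, 1)
pred = (krr_predict(X_low, alpha_low, X_test)
        + krr_predict(X_high, alpha_delta, X_test))
print("MAE vs. high-fidelity reference:",
      np.abs(pred - e_high(X_test)).mean())
```

The actual CQML combination technique generalizes this two-level correction to several fidelity dimensions at once (e.g. the axes of a Pople diagram), which is what the 2d/3d/nd implementation in qml/models/cqml.py addresses.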

andersx and others added 30 commits March 1, 2018 15:40
* Added ARAS->FCHL code

* updated fchl kernels code

* Cleaned up FCHL code, parallelized weight ksi functions

* Factor 4 speed in three-body term

* Updated alchemy and speed

* Added global FCHL kernel

* Fixed initialization bug in FCHL global kernel

* Fixed clearing of self-dot products, and removed excessive OMP memory use

* More parallelization issues fixed in FCHL

* Fixed 3-body parallelization, added atomic kernels, added option for no alchemy

* Fixed parallelization memory, added force kernels to FCHL

* Added two- and three-body exponents as parameters

* Added alchemy module, added custom alchemy vectors.

* Updated parallelization etc.

* Fchl module (qmlcode#34)

* Updated to module and added screening function

* Fixed bug in cut-off function.

* Removed debug output from cut-off function.

* Added FCHL to develop branch
* option to calculate only subset of atoms in local representations (qmlcode#26)

* option to calculate only subset of atoms in local representations
* added reference calculation of representations
* finalized reference testing

* fixed error in test

* Atomic kernels memory (qmlcode#27)

* Added local kernels and tests for local kernels

* Slatm (qmlcode#28)

* Updated SLATM, removed ASE, updated docs, updated SLATM test case

* Added F90 version of get_sbot in SLATM

* Added F2py implementation of get_sbop in slatm

* Fixed remaining issues with f90-slatm, works for global and local now.

* Added testcases for local SLATM

* Removed dead code from slatm.py

* Linear kernel (qmlcode#29)

* Added linear kernel and test

* Updated documentation for kernels and removed compiler warning (from frepresentations)

* Added global ARAD kernel and test

* Linalg (qmlcode#30)

* Added MKL discovery

* Updated cho_solve and cho_invert, so input is conserved

* Added Bunch-Kaufman factorization solver and inversion (DSYTRS)

* Linear kernel (qmlcode#31)

* Added linear kernel and test

* Updated documentation for kernels and removed compiler warning (from frepresentations)

* Added global ARAD and linear global kernel

* Corrected ARAD global kernel to L2 distance

* Bob bug (qmlcode#33)

* Fixed ordering in Bob, added integration test

* Tightened threshold in Bob integration test to 2.8 kcal/mol

* Fchl main (qmlcode#37)

* Added ARAS->FCHL code

* updated fchl kernels code

* Cleaned up FCHL code, parallelized weight ksi functions

* Factor 4 speed in three-body term

* Updated alchemy and speed

* Added global FCHL kernel

* Fixed initialization bug in FCHL global kernel

* Fixed clearing of self-dot products, and removed excessive OMP memory use

* More parallelization issues fixed in FCHL

* Fixed 3-body parallelization, added atomic kernels, added option for no alchemy

* Fixed parallelization memory, added force kernels to FCHL

* Added two- and three-body exponents as parameters

* Added alchemy module, added custom alchemy vectors.

* Updated parallelization etc.

* Fchl module (qmlcode#34)

* Updated to module and added screening function

* Fixed bug in cut-off function.

* Removed debug output from cut-off function.

* Added FCHL to develop branch

* Updated clang->gcc in macOS installation instructions, hat tip to Geoff Hutchison

* Updated autodeployment to GH pages and PyPI
* option to calculate only subset of atoms in local representations (qmlcode#26)

* option to calculate only subset of atoms in local representations
* added reference calculation of representations
* finalized reference testing

* fixed error in test

* Atomic kernels memory (qmlcode#27)

* Added local kernels and tests for local kernels

* Slatm (qmlcode#28)

* Updated SLATM, removed ASE, updated docs, updated SLATM test case

* Added F90 version of get_sbot in SLATM

* Added F2py implementation of get_sbop in slatm

* Fixed remaining issues with f90-slatm, works for global and local now.

* Added testcases for local SLATM

* Removed dead code from slatm.py

* Linear kernel (qmlcode#29)

* Added linear kernel and test

* Updated documentation for kernels and removed compiler warning (from frepresentations)

* Added global ARAD kernel and test

* Linalg (qmlcode#30)

* Added MKL discovery

* Updated cho_solve and cho_invert, so input is conserved

* Added Bunch-Kaufman factorization solver and inversion (DSYTRS)

* Linear kernel (qmlcode#31)

* Added linear kernel and test

* Updated documentation for kernels and removed compiler warning (from frepresentations)

* Added global ARAD and linear global kernel

* Corrected ARAD global kernel to L2 distance

* Bob bug (qmlcode#33)

* Fixed ordering in Bob, added integration test

* Tightened threshold in Bob integration test to 2.8 kcal/mol

* Fchl main (qmlcode#37)

* Added ARAS->FCHL code

* updated fchl kernels code

* Cleaned up FCHL code, parallelized weight ksi functions

* Factor 4 speed in three-body term

* Updated alchemy and speed

* Added global FCHL kernel

* Fixed initialization bug in FCHL global kernel

* Fixed clearing of self-dot products, and removed excessive OMP memory use

* More parallelization issues fixed in FCHL

* Fixed 3-body parallelization, added atomic kernels, added option for no alchemy

* Fixed parallelization memory, added force kernels to FCHL

* Added two- and three-body exponents as parameters

* Added alchemy module, added custom alchemy vectors.

* Updated parallelization etc.

* Fchl module (qmlcode#34)

* Updated to module and added screening function

* Fixed bug in cut-off function.

* Removed debug output from cut-off function.

* Added FCHL to develop branch

* Updated clang->gcc in macOS installation instructions, hat tip to Geoff Hutchison

* Updated autodeployment to GH pages and PyPI

* Updated version number
* Updated FCHL documentation
* Updated FCHL documentation

* Fixed arad-> fchl in docs

* removed deprecated arad test from test_wrappers

* Updated travis yml to publish docs properly

* Updated increment version
* Updated travis yml to publish docs properly
* Fixed atomtype parsing bug
larsbratholm fixed Bob bug, updated FORTRAN implementations
andersx and others added 30 commits August 15, 2018 11:31
* Reverts to old directory structure
* Fix authors in several __init__ files that are updated
* Commit of all 'secret' FCHL operator code

* Also adding the actual FCHL operator files

* Updated init for alchemy.py

* Updated import of alchemy module

* Removed old file

* Updated manual for FCHL functionality - still needs tutorial/examples pages
* Added symmetric kernels

* Moved driver away from legacy ml directory
* Made base representations

* started CM and data class

* Working on generate routine

* Working basic example

* Mostly hacked the searchcv routines to work

* Implementing atomic gaussian kernel

* working atomic krr

* Restructure and started global slatm

* Slatm

* Started acsf

* stash before merging acsf bugfix

* acsf bugfix cherrypick

* sigma='auto' option added to kernels

* Started fchl

* Working fchl

* Started preprocessing

* Mostly working atom scaler

* Made several attributes private

* Restructured how the data object is passed, to avoid possible memory issues

* Started alchemy in kernels

* Minor change to kernel alchemy

* Working feature trick in kernels

* Cleaned up code

* daily

* Finished examples
* Corrected small bug in predict function

* Started updating so that the model can be trained after it's been reloaded

* Minor modifications

* Updated model so one can predict from xyz and disabled shuffling in training because it leads to a problem with predictions

* Fix for the problem of shuffling

* Added some tests to make sure the predictions work

* Fixed a tensorboard problem

* The saving of the model doesn't cause an error if the directory already exists

* Fixed a bug that made a test fail

* Modified the name of a parameter

* Made modifications to make the symmetry functions more numerically stable

* Added a hack that makes ARMP work with fortran ACSF when there are padded representations. Currently works *ONLY* when there is one molecule for the whole data set.

* corrected bug in score function for padded molecules

* Changes that make the model work quickly even when there is padding.

* Fixed discrepancies between fortran and TF acsf

* Corrected bug in setting of ACSF parameters

* Attempt at fixing issue qmlcode#10

* another attempt at fixing qmlcode#10

* Removed a pointless line

* set-up

* Added the graceful killer

* Modifications which prevent installation from breaking on BC4

* Modification to add neural networks to qmlearn

* Fix for issue qmlcode#8

* Random comment

* Started including the atomic model

* Made the atomic neural network work

* Fixed a bug with the indices

* Now training and predictions don't use the default graph, to avoid problems

* uncommented examples

* Removed unique_elements in data class

This can be stored in the NN class, but I might reverse the change later

* Made tensorflow an optional dependency

The reason for this approach is that pip would just auto install tensorflow and you might want the gpu version or your own compiled one.

* Made is_numeric non-private and removed legacy code

* Added 1d array util function

* Removed QML check and moved functions from utils to tf_utils

* Support for linear models (no hidden layers)

* fixed import bug in tf_utils

* Added text to explain that you are scoring on training set

* Restructure.

But elements are still not working
Sorted elements

* Moved documentation from init to class

* Constant features will now be removed at fit/predict time

* Moved get_batch_size back into utils, since it doesn't depend on tf

* Made the NeuralNetwork class compliant with sklearn

Cannot be any transforms of the input data

* Fixed tests that didn't pass

* Fixed mistake in checks of set_classes() in ARMP

* started fixing ARMP bugs for QM7

* Fixed bug in padding and added examples that give low errors

* Attempted fix to make representations single precision

* Hot fix for AtomScaler

* Minor bug fixes

* More bug fixes to make sure tests run

* Fixed some tests that had failures

* Reverted the fchl tests to original

* Fixed path in acsf test

* Readded changes to tests

* Modifications after code review
	Adds the multi-fidelity learning approach called CQML,
	which is documented in the paper
	"Boosting quantum machine learning models with multi-level
	 combination technique: Pople diagrams revisited":

	* The core part of the new code is in the new files qml/models/cqml2d.py
	  and qml/models/cqml.py, which contain a 2d CQML implementation
	  and a 2d/3d/nd CQML implementation. The 2d implementation follows the
	  simplified idea presented in Section 3.4 of the paper, while the 2d/3d/nd
	  CQML implementation is documented in Section 3.5.

	* examples/cqml2d_CI9.py is the code that was used to create the
	  2d CQML results on the CI9 data set in the paper.

	* examples/cqml_QM7b.py and examples/cqml_in_2d_*.py are the codes
	  that were used to create the 2d/3d results for the QM7b data set
	  in the paper.

	* The CI9 and QM7b data sets are included in the examples directory.
* Minor formatting changes. Removed omp from a loop that didn't seem safe.

* Fixed omp issue that resulted in inconsistent sorting

* Revert changes to test.

Local builds pass but Travis fails; I assume the changes made should be more robust.
This was done to help fitting of underdetermined cases, such as when the chemical composition never or rarely changes (MD snapshots etc.).
Also a slight restructure was done to avoid repeated code.
* Fixed bug in gaussian symmetric vector kernels, where I had called the non-symmetric kernel by accident

* Removed legacy code. Closes qmlcode#86

* Fixed a minor bug/feature in examples

* Removed call to non-existing function in fchl electric field test

* Fixed an error in how fchl kernels was called. Updated kernels to reuse more code from other submodules

* Removed print used for debugging

* Fixed minor bug in FCHL when sigma='auto'

* Changed scaling/power to be positive instead of negative
* Corrected small bug in predict function

* Started updating so that the model can be trained after it's been reloaded

* Minor modifications

* Updated model so one can predict from xyz and disabled shuffling in training because it leads to a problem with predictions

* Fix for the problem of shuffling

* Added some tests to make sure the predictions work

* Fixed a tensorboard problem

* The saving of the model doesn't cause an error if the directory already exists

* Fixed a bug that made a test fail

* Modified the name of a parameter

* Made modifications to make the symmetry functions more numerically stable

* Added a hack that makes ARMP work with fortran ACSF when there are padded representations. Currently works *ONLY* when there is one molecule for the whole data set.

* corrected bug in score function for padded molecules

* Changes that make the model work quickly even when there is padding.

* Fixed discrepancies between fortran and TF acsf

* Corrected bug in setting of ACSF parameters

* Attempt at fixing issue qmlcode#10

* another attempt at fixing qmlcode#10

* Removed a pointless line

* set-up

* Added the graceful killer

* Modifications which prevent installation from breaking on BC4

* Modification to add neural networks to qmlearn

* Fix for issue qmlcode#8

* Random comment

* Started including the atomic model

* Made the atomic neural network work

* Fixed a bug with the indices

* Now training and predictions don't use the default graph, to avoid problems

* uncommented examples

* Removed unique_elements in data class

This can be stored in the NN class, but I might reverse the change later

* Made tensorflow an optional dependency

The reason for this approach is that pip would just auto install tensorflow and you might want the gpu version or your own compiled one.

* Made is_numeric non-private and removed legacy code

* Added 1d array util function

* Removed QML check and moved functions from utils to tf_utils

* Support for linear models (no hidden layers)

* fixed import bug in tf_utils

* Added text to explain that you are scoring on training set

* Restructure.

But elements are still not working
Sorted elements

* Moved documentation from init to class

* Constant features will now be removed at fit/predict time

* Moved get_batch_size back into utils, since it doesn't depend on tf

* Made the NeuralNetwork class compliant with sklearn

Cannot be any transforms of the input data

* Fixed tests that didn't pass

* Fixed mistake in checks of set_classes() in ARMP

* started fixing ARMP bugs for QM7

* Fixed bug in padding and added examples that give low errors

* Attempted fix to make representations single precision

* Hot fix for AtomScaler

* Minor bug fixes

* More bug fixes to make sure tests run

* Fixed some tests that had failures

* Reverted the fchl tests to original

* Fixed path in acsf test

* Readded changes to tests

* Modifications after code review

* Version with the ACSF basis functions starting at 0.8 A

* Updated ACSF representations so that the minimum distance at which to start the binning can be set by the user

* Modified the name of the new parameter (minimum distance of the binning in ACSF)
Fixed error and warnings caused by latest numpy 1.16.0 and character error
* Added Kernel PCA and test for KPCA
* Made Compound able to read file-like objects

I added a six dependency to make life easier

* Give Compound a name only if there is a filename
* Corrected small bug in predict function

* Started updating so that the model can be trained after it's been reloaded

* Minor modifications

* Updated model so one can predict from xyz and disabled shuffling in training because it leads to a problem with predictions

* Fix for the problem of shuffling

* Added some tests to make sure the predictions work

* Fixed a tensorboard problem

* The saving of the model doesn't cause an error if the directory already exists

* Fixed a bug that made a test fail

* Modified the name of a parameter

* Made modifications to make the symmetry functions more numerically stable

* Added a hack that makes ARMP work with fortran ACSF when there are padded representations. Currently works *ONLY* when there is one molecule for the whole data set.

* corrected bug in score function for padded molecules

* Changes that make the model work quickly even when there is padding.

* Fixed discrepancies between fortran and TF acsf

* Corrected bug in setting of ACSF parameters

* Attempt at fixing issue qmlcode#10

* another attempt at fixing qmlcode#10

* Removed a pointless line

* set-up

* Added the graceful killer

* Modifications which prevent installation from breaking on BC4

* Modification to add neural networks to qmlearn

* Fix for issue qmlcode#8

* Random comment

* Started including the atomic model

* Made the atomic neural network work

* Fixed a bug with the indices

* Now training and predictions don't use the default graph, to avoid problems

* uncommented examples

* Removed unique_elements in data class

This can be stored in the NN class, but I might reverse the change later

* Made tensorflow an optional dependency

The reason for this approach is that pip would just auto install tensorflow and you might want the gpu version or your own compiled one.

* Made is_numeric non-private and removed legacy code

* Added 1d array util function

* Removed QML check and moved functions from utils to tf_utils

* Support for linear models (no hidden layers)

* fixed import bug in tf_utils

* Added text to explain that you are scoring on training set

* Restructure.

But elements are still not working
Sorted elements

* Moved documentation from init to class

* Constant features will now be removed at fit/predict time

* Moved get_batch_size back into utils, since it doesn't depend on tf

* Made the NeuralNetwork class compliant with sklearn

Cannot be any transforms of the input data

* Fixed tests that didn't pass

* Fixed mistake in checks of set_classes() in ARMP

* started fixing ARMP bugs for QM7

* Fixed bug in padding and added examples that give low errors

* Attempted fix to make representations single precision

* Hot fix for AtomScaler

* Minor bug fixes

* More bug fixes to make sure tests run

* Fixed some tests that had failures

* Reverted the fchl tests to original

* Fixed path in acsf test

* Readded changes to tests

* Modifications after code review

* Version with the ACSF basis functions starting at 0.8 A

* Updated ACSF representations so that the minimum distance at which to start the binning can be set by the user

* Modified the name of the new parameter (minimum distance of the binning in ACSF)

* Added a function to the atomscaler that enables reverting back

* Relaxed tolerance in tests
* removed old dataprovider and ASE requirements

* Updated test for relocated compound
* Corrected small bug in predict function

* Started updating so that the model can be trained after it's been reloaded

* Minor modifications

* Updated model so one can predict from xyz and disabled shuffling in training because it leads to a problem with predictions

* Fix for the problem of shuffling

* Added some tests to make sure the predictions work

* Fixed a tensorboard problem

* The saving of the model doesn't cause an error if the directory already exists

* Fixed a bug that made a test fail

* Modified the name of a parameter

* Made modifications to make the symmetry functions more numerically stable

* Added a hack that makes ARMP work with fortran ACSF when there are padded representations. Currently works *ONLY* when there is one molecule for the whole data set.

* corrected bug in score function for padded molecules

* Changes that make the model work quickly even when there is padding.

* Fixed discrepancies between fortran and TF acsf

* Corrected bug in setting of ACSF parameters

* Attempt at fixing issue qmlcode#10

* another attempt at fixing qmlcode#10

* Removed a pointless line

* set-up

* Added the graceful killer

* Modifications which prevent installation from breaking on BC4

* Modification to add neural networks to qmlearn

* Fix for issue qmlcode#8

* Random comment

* Started including the atomic model

* Made the atomic neural network work

* Fixed a bug with the indices

* Now training and predictions don't use the default graph, to avoid problems

* uncommented examples

* Removed unique_elements in data class

This can be stored in the NN class, but I might reverse the change later

* Made tensorflow an optional dependency

The reason for this approach is that pip would just auto install tensorflow and you might want the gpu version or your own compiled one.

* Made is_numeric non-private and removed legacy code

* Added 1d array util function

* Removed QML check and moved functions from utils to tf_utils

* Support for linear models (no hidden layers)

* fixed import bug in tf_utils

* Added text to explain that you are scoring on training set

* Restructure.

But elements are still not working
Sorted elements

* Moved documentation from init to class

* Constant features will now be removed at fit/predict time

* Moved get_batch_size back into utils, since it doesn't depend on tf

* Made the NeuralNetwork class compliant with sklearn

Cannot be any transforms of the input data

* Fixed tests that didn't pass

* Fixed mistake in checks of set_classes() in ARMP

* started fixing ARMP bugs for QM7

* Fixed bug in padding and added examples that give low errors

* Attempted fix to make representations single precision

* Hot fix for AtomScaler

* Minor bug fixes

* More bug fixes to make sure tests run

* Fixed some tests that had failures

* Reverted the fchl tests to original

* Fixed path in acsf test

* Readded changes to tests

* Modifications after code review

* Version with the ACSF basis functions starting at 0.8 A

* Updated ACSF representations so that the minimum distance at which to start the binning can be set by the user

* Modified the name of the new parameter (minimum distance of the binning in ACSF)

* Added a function to the atomscaler that enables reverting back

* Relaxed tolerance in tests

* Fixed bug in the padding of the representation in the ARMP network used in the pipeline

* Made a modification to how the Fortran ACSF are generated that helps with how much memory is used. Currently only float32 ACSF are available

* Added a check to make sure there are no NaNs in the representations.

* Small mistake corrected in aglaia

* Fixed extra space before -lpthread flag

* Removed what I added

* Implemented MRMP representations from xyz

* Generate atomic slatm from data

* Fixed typo

* Fixed problem with slatm and ARMP

* Fixed bug for MRMP tensorboard logger

* Actually fixed the tensorboard bug for MRMP and added tests to catch future errors

* Fixed another tensorboard bug

* Changed the behaviour of logging to tensorboard in MRMP
  cqml development branch
Merge remote-tracking branch 'upstream/develop' into dev_cqml
  (reflecting final version of the CQML paper)
Merged latest version of the official develop branch into the CQML branch
of the CQML development fork.