Pull request for CQML extension (2nd trial) #115

Open · zaspel wants to merge 89 commits into develop

Conversation

@zaspel commented Jul 29, 2019

This pull request adds the multi-fidelity learning approach called CQML, which is documented in the paper "Boosting quantum machine learning models with multi-level combination technique: Pople diagrams revisited":

* The core of the new code is in the new files qml/models/cqml2d.py and qml/models/cqml.py, which contain a 2d CQML implementation and a 2d/3d/nd CQML implementation, respectively. The 2d implementation follows the simplified idea presented in Section 3.4 of the paper, while the 2d/3d/nd CQML implementation is documented in Section 3.5 (an illustrative sketch of the underlying idea follows this list).

* examples/cqml2d_CI9.py is the code that was used to create the 2d CQML results on the CI9 data set in the paper.

* examples/cqml_QM7b.py and examples/cqml_in_2d_*.py are the codes that were used to create the 2d/3d results for the QM7b data set in the paper.

* The CI9 and QM7b data sets are included in the examples directory.
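
The real entry points for this functionality are the new modules and the examples/cqml*.py scripts listed above. For readers unfamiliar with the idea, the snippet below is only a minimal NumPy sketch of the simplest two-level case, roughly in the spirit of the simplified 2d setting of Section 3.4: a kernel ridge regression model trained on many cheap, low-level labels, plus a second model trained on a few expensive labels that learns the difference between the two levels. All names in the snippet (gaussian_kernel, krr_fit, e_low, e_high, ...) are hypothetical illustrations and do not reflect the API of qml/models/cqml2d.py or qml/models/cqml.py.

```python
# Illustrative sketch only: a minimal two-level multi-fidelity model in the
# spirit of CQML, written with plain NumPy.  The names used here are
# hypothetical and do NOT correspond to the API of qml/models/cqml.py;
# see the examples/ scripts in this PR for the real usage.
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the row vectors of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))


def krr_fit(X, y, sigma=1.0, llambda=1e-8):
    """Kernel ridge regression: return the regression coefficients alpha."""
    K = gaussian_kernel(X, X, sigma)
    K[np.diag_indices_from(K)] += llambda
    return np.linalg.solve(K, y)


def krr_predict(X_train, alpha, X_test, sigma=1.0):
    return gaussian_kernel(X_test, X_train, sigma) @ alpha


rng = np.random.default_rng(0)

# Synthetic "cheap" (low-fidelity) and "expensive" (high-fidelity) labels.
def e_low(x):   # stand-in for a low-level method
    return np.sin(3 * x).ravel()

def e_high(x):  # stand-in for a high-level method = low level + smooth correction
    return e_low(x) + 0.3 * x.ravel()**2

X_low = rng.uniform(-2, 2, size=(200, 1))   # many cheap training points
X_high = X_low[:20]                          # few expensive training points

# Level 1: train on the cheap data.
alpha_low = krr_fit(X_low, e_low(X_low))

# Level 2: train a correction on the (expensive - cheap) difference.
delta = e_high(X_high) - krr_predict(X_low, alpha_low, X_high)
alpha_delta = krr_fit(X_high, delta)

# Combined multi-fidelity prediction = cheap model + learned correction.
X_test = np.linspace(-2, 2, 50).reshape(-1, 1)
pred = (krr_predict(X_low, alpha_low, X_test)
        + krr_predict(X_high, alpha_delta, X_test))
print("MAE vs. high-fidelity reference:",
      np.abs(pred - e_high(X_test)).mean())
```

The actual CQML combination technique generalizes this two-level correction to several fidelity dimensions at once (e.g. the axes of a Pople diagram), which is what the 2d/3d/nd implementation in qml/models/cqml.py addresses.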

andersx and others added 30 commits March 1, 2018 15:40
* Added ARAS->FCHL code

* updated fchl kernels code

* Cleaned up FCHL code, parallelized weight ksi functions

* Factor 4 speed in three-body term

* Updated alchemy and speed

* Added global FCHL kernel

* Fixed initialization bug in FCHL global kernel

* Fixed clearing of self-dot products, and removed excessive OMP memory use

* More parallelization issues fixed in FCHL

* Fixed 3-body parallelization, added atomic kernels, added option for no alchemy

* Fixed parallelization memory, added force kernels to FCHL

* Added two- and three-body exponents as parameters

* Added alchemy module, added custom alchemy vectors.

* Updated parallelization etc.

* Fchl module (qmlcode#34)

* Updated to module and added screening function

* Fixed bug in cut-off function.

* Removed debug output from cut-off function.

* Added FCHL to develop branch
* option to calculate only subset of atoms in local representations (qmlcode#26)

* option to calculate only subset of atoms in local representations
* added reference calculation of representations
* finalized reference testing

* fixed error in test

* Atomic kernels memory (qmlcode#27)

* Added local kernels and tests for local kernels

* Slatm (qmlcode#28)

* Updated SLATM, removed ASE, updated docs, updated SLATM test case

* Added F90 version of get_sbot in SLATM

* Added F2py implementation of get_sbop in slatm

* Fixed remaining issues with f90-slatm, works for global and local now.

* Added testcases for local SLATM

* Removed dead code from slatm.py

* Linear kernel (qmlcode#29)

* Added linear kernel and test

* Updated documentation for kernels and removed compiler warning (from frepresentations)

* Added global ARAD kernel and test

* Linalg (qmlcode#30)

* Added MKL discovery

* Updated cho_solve and cho_invert, so input is conserved

* Added Bunch-Kaufman factorization solver and inversion (DSYTRS)

* Linear kernel (qmlcode#31)

* Added linear kernel and test

* Updated documentation for kernels and removed compiler warning (from frepresentations)

* Added global ARAD and linear global kernel

* Corrected ARAD global kernel to L2 distance

* Bob bug (qmlcode#33)

* Fixed ordering in Bob, added integration test

* Tightened threshold in Bob integration test to 2.8 kcal/mol

* Fchl main (qmlcode#37)

* Added ARAS->FCHL code

* updated fchl kernels code

* Cleaned up FCHL code, parallelized weight ksi functions

* Factor 4 speed in three-body term

* Updated alchemy and speed

* Added global FCHL kernel

* Fixed initialization bug in FCHL global kernel

* Fixed clearing of self-dot products, and removed excessive OMP memory use

* More parallelization issues fixed in FCHL

* Fixed 3-body parallelization, added atomic kernels, added option for no alchemy

* Fixed parallelization memory, added force kernels to FCHL

* Added two- and three-body exponents as parameters

* Added alchemy module, added custom alchemy vectors.

* Updated parallelization etc.

* Fchl module (qmlcode#34)

* Updated to module and added screening function

* Fixed bug in cut-off function.

* Removed debug output from cut-off function.

* Added FCHL to develop branch

* Updated clang->gcc in macOS installation instructions, hat tip to Geoff Hutchison

* Updated autodeployment to GH pages and PyPI
* option to calculate only subset of atoms in local representations (qmlcode#26)

* option to calculate only subset of atoms in local representations
* added reference calculation of representations
* finalized reference testing

* fixed error in test

* Atomic kernels memory (qmlcode#27)

* Added local kernels and tests for local kernels

* Slatm (qmlcode#28)

* Updated SLATM, removed ASE, updated docs, updated SLATM test case

* Added F90 version of get_sbot in SLATM

* Added F2py implementation of get_sbop in slatm

* Fixed remaining issues with f90-slatm, works for global and local now.

* Added testcases for local SLATM

* Removed dead code from slatm.py

* Linear kernel (qmlcode#29)

* Added linear kernel and test

* Updated documentation for kernels and removed compiler warning (from frepresentations)

* Added global ARAD kernel and test

* Linalg (qmlcode#30)

* Added MKL discovery

* Updated cho_solve and cho_invert, so input is conserved

* Added Bunch-Kaufman factorization solver and inversion (DSYTRS)

* Linear kernel (qmlcode#31)

* Added linear kernel and test

* Updated documentation for kernels and removed compiler warning (from frepresentations)

* Added global ARAD and linear global kernel

* Corrected ARAD global kernel to L2 distance

* Bob bug (qmlcode#33)

* Fixed ordering in Bob, added integration test

* Tightened threshold in Bob integration test to 2.8 kcal/mol

* Fchl main (qmlcode#37)

* Added ARAS->FCHL code

* updated fchl kernels code

* Cleaned up FCHL code, parallelized weight ksi functions

* Factor 4 speed in three-body term

* Updated alchemy and speed

* Added global FCHL kernel

* Fixed initialization bug in FCHL global kernel

* Fixed clearing of self-dot products, and removed excessive OMP memory use

* More parallelization issues fixed in FCHL

* Fixed 3-body parallelization, added atomic kernels, added option for no alchemy

* Fixed parallelization memory, added force kernels to FCHL

* Added two- and three-body exponents as parameters

* Added alchemy module, added custom alchemy vectors.

* Updated parallelization etc.

* Fchl module (qmlcode#34)

* Updated to module and added screening function

* Fixed bug in cut-off function.

* Removed debug output from cut-off function.

* Added FCHL to develop branch

* Updated clang->gcc in macOS installation instructions, hat tip to Geoff Hutchison

* Updated autodeployment to GH pages and PyPI

* Updated version number
* Updated FCHL documentation
* Updated FCHL documentation

* Fixed arad-> fchl in docs

* removed deprecated arad test from test_wrappers

* Updated travis yml to publish docs properly

* Updated increment version
* Updated travis yml to publish docs properly
* Fixed atomtype parsing bug
larsbratholm fixed Bob bug, updated FORTRAN implementations
andersx and others added 30 commits August 15, 2018 11:31
* Reverts to old directory structure
* Fix authors in several __init__ files that are updated
* Commit of all 'secret' FCHL operator code

* Also adding the actual FCHL operator files

* Updated init for alchemy.py

* Updated import of alchemy module

* Removed old file

* Updated manual for FCHL functionality - still needs tutorial/examples pages
* Added symmetric kernels

* Moved driver away from legacy ml directory
* Made base representations

* started CM and data class

* Working on generate routine

* Working basic example

* Mostly hacked the searchcv routines to work

* Implementing atomic gaussian kernel

* working atomic krr

* Restructure and started global slatm

* Slatm

* Started acsf

* stash before merging acsf bugfix

* acsf bugfix cherrypick

* sigma='auto' option added to kernels

* Started fchl

* Working fchl

* Started preprocessing

* Mostly working atom scaler

* Made several attributes private

* Restructured how the data object is passed, to avoid possible memory issues

* Started alchemy in kernels

* Minor change to kernel alchemy

* Working feature trick in kernels

* Cleaned up code

* daily

* Finished examples
* Corrected small bug in predict function

* Started updating so that the model can be trained after it's been reloaded

* Minor modifications

* Updated model so one can predict from xyz and disabled shuffling in training because it leads to a problem with predictions

* Fix for the problem of shuffling

* Added some tests to make sure the predictions work

* Fixed a tensorboard problem

* The saving of the model doesn't cause an error if the directory already exists

* Fixed a bug that made a test fail

* Modified the name of a parameter

* Made modifications to make the symmetry functions more numerically stable

* Added a hack that makes ARMP work with fortran ACSF when there are padded representations. Currently works *ONLY* when there is one molecule for the whole data set.

* corrected bug in score function for padded molecules

* Changes that make the model work quickly even when there is padding.

* Fixed discrepancies between fortran and TF acsf

* Corrected bug in setting of ACSF parameters

* Attempt at fixing issue qmlcode#10

* another attempt at fixing qmlcode#10

* Removed a pointless line

* set-up

* Added the graceful killer

* Modifications which prevent installation from breaking on BC4

* Modification to add neural networks to qmlearn

* Fix for issue qmlcode#8

* Random comment

* Started including the atomic model

* Made the atomic neural network work

* Fixed a bug with the indices

* Now training and predictions don't use the default graph, to avoid problems

* uncommented examples

* Removed unique_elements in data class

This can be stored in the NN class, but I might reverse the change later

* Made tensorflow an optional dependency

The reason for this approach is that pip would just auto install tensorflow and you might want the gpu version or your own compiled one.

* Made is_numeric non-private and removed legacy code

* Added 1d array util function

* Removed QML check and moved functions from utils to tf_utils

* Support for linear models (no hidden layers)

* fixed import bug in tf_utils

* Added text to explain that you are scoring on training set

* Restructure.

But elements are still not working
Sorted elements

* Moved documentation from init to class

* Constant features will now be removed at fit/predict time

* Moved get_batch_size back into utils, since it doesn't depend on tf

* Made the NeuralNetwork class compliant with sklearn

Cannot be any transforms of the input data

* Fixed tests that didn't pass

* Fixed mistake in checks of set_classes() in ARMP

* started fixing ARMP bugs for QM7

* Fixed bug in padding and added examples that give low errors

* Attempted fix to make representations single precision

* Hot fix for AtomScaler

* Minor bug fixes

* More bug fixes to make sure tests run

* Fixed some tests that had failures

* Reverted the fchl tests to original

* Fixed path in acsf test

* Readded changes to tests

* Modifications after code review
	Adds the multi-fidelity learning approach called CQML,
	which is documented in the paper
	"Boosting quantum machine learning models with multi-level
	 combination technique: Pople diagrams revisited":

	* The core part of the new code is in the new files qml/models/cqml2d.py
	  and qml/models/cqml.py, which contain a 2d CQML implementation
	  and a 2d/3d/nd CQML implementation. The 2d implementation follows the
	  simplified idea presented in Section 3.4 of the paper, while the 2d/3d/nd
	  CQML implementation is documented in Section 3.5.

	* examples/cqml2d_CI9.py is the code that was used to create the
	  2d CQML results on the CI9 data set in the paper.

	* examples/cqml_QM7b.py and examples/cqml_in_2d_*.py are the codes
	  that were used to create the 2d/3d results for the QM7b data set
	  in the paper.

	* The CI9 and QM7b data sets are included in the examples directory.
* Minor formatting changes. Removed omp from a loop that didn't seem safe.

* Fixed omp issue that resulted in inconsistent sorting

* Revert changes to test.

Local builds pass but Travis fails; I assume the changes made should be more robust.
This was done to help fitting of underdetermined cases, such as when the chemical composition never or rarely changes (MD snapshots etc.).
Also a slight restructure was done to avoid repeated code.
* Fixed bug in gaussian symmetric vector kernels, where I had called the non-symmetric kernel by accident

* Removed legacy code. Closes qmlcode#86

* Fixed a minor bug/feature in examples

* Removed call to non-existing function in fchl electric field test

* Fixed an error in how fchl kernels was called. Updated kernels to reuse more code from other submodules

* Removed print used for debugging

* Fixed minor bug in FCHL when sigma='auto'

* Changed scaling/power to be positive instead of negative
* Corrected small bug in predict function

* Started updating so that the model can be trained after it's been reloaded

* Minor modifications

* Updated model so one can predict from xyz and disabled shuffling in training because it leads to a problem with predictions

* Fix for the problem of shuffling

* Added some tests to make sure the predictions work

* Fixed a tensorboard problem

* The saving of the model doesn't cause an error if the directory already exists

* Fixed a bug that made a test fail

* Modified the name of a parameter

* Made modifications to make the symmetry functions more numerically stable

* Added a hack that makes ARMP work with fortran ACSF when there are padded representations. Currently works *ONLY* when there is one molecule for the whole data set.

* corrected bug in score function for padded molecules

* Changes that make the model work quickly even when there is padding.

* Fixed discrepancies between fortran and TF acsf

* Corrected bug in setting of ACSF parameters

* Attempt at fixing issue qmlcode#10

* another attempt at fixing qmlcode#10

* Removed a pointless line

* set-up

* Added the graceful killer

* Modifications which prevent installation from breaking on BC4

* Modification to add neural networks to qmlearn

* Fix for issue qmlcode#8

* Random comment

* Started including the atomic model

* Made the atomic neural network work

* Fixed a bug with the indices

* Now training and predictions don't use the default graph, to avoid problems

* uncommented examples

* Removed unique_elements in data class

This can be stored in the NN class, but I might reverse the change later

* Made tensorflow an optional dependency

The reason for this approach is that pip would just auto install tensorflow and you might want the gpu version or your own compiled one.

* Made is_numeric non-private and removed legacy code

* Added 1d array util function

* Removed QML check and moved functions from utils to tf_utils

* Support for linear models (no hidden layers)

* fixed import bug in tf_utils

* Added text to explain that you are scoring on training set

* Restructure.

But elements are still not working
Sorted elements

* Moved documentation from init to class

* Constant features will now be removed at fit/predict time

* Moved get_batch_size back into utils, since it doesn't depend on tf

* Made the NeuralNetwork class compliant with sklearn

Cannot be any transforms of the input data

* Fixed tests that didn't pass

* Fixed mistake in checks of set_classes() in ARMP

* started fixing ARMP bugs for QM7

* Fixed bug in padding and added examples that give low errors

* Attempted fix to make representations single precision

* Hot fix for AtomScaler

* Minor bug fixes

* More bug fixes to make sure tests run

* Fixed some tests that had failures

* Reverted the fchl tests to original

* Fixed path in acsf test

* Readded changes to tests

* Modifications after code review

* Version with the ACSF basis functions starting at 0.8 A

* Updated ACSF representations so that the minimum distance at which to start the binning can be set by the user

* Modified the name of the new parameter (minimum distance of the binning in ACSF)
Fixed error and warnings caused by latest numpy 1.16.0 and character error
* Added Kernel PCA and test for KPCA
* Made Compound able to read file-like objects

I added a six dependency to make life easier

* Give Compound a name only if there is a filename
* Corrected small bug in predict function

* Started updating so that the model can be trained after it's been reloaded

* Minor modifications

* Updated model so one can predict from xyz and disabled shuffling in training because it leads to a problem with predictions

* Fix for the problem of shuffling

* Added some tests to make sure the predictions work

* Fixed a tensorboard problem

* The saving of the model doesn't cause an error if the directory already exists

* Fixed a bug that made a test fail

* Modified the name of a parameter

* Made modifications to make the symmetry functions more numerically stable

* Added a hack that makes ARMP work with fortran ACSF when there are padded representations. Currently works *ONLY* when there is one molecule for the whole data set.

* corrected bug in score function for padded molecules

* Changes that make the model work quickly even when there is padding.

* Fixed discrepancies between fortran and TF acsf

* Corrected bug in setting of ACSF parameters

* Attempt at fixing issue qmlcode#10

* another attempt at fixing qmlcode#10

* Removed a pointless line

* set-up

* Added the graceful killer

* Modifications which prevent installation from breaking on BC4

* Modification to add neural networks to qmlearn

* Fix for issue qmlcode#8

* Random comment

* Started including the atomic model

* Made the atomic neural network work

* Fixed a bug with the indices

* Now training and predictions don't use the default graph, to avoid problems

* uncommented examples

* Removed unique_elements in data class

This can be stored in the NN class, but I might reverse the change later

* Made tensorflow an optional dependency

The reason for this approach is that pip would just auto install tensorflow and you might want the gpu version or your own compiled one.

* Made is_numeric non-private and removed legacy code

* Added 1d array util function

* Removed QML check and moved functions from utils to tf_utils

* Support for linear models (no hidden layers)

* fixed import bug in tf_utils

* Added text to explain that you are scoring on training set

* Restructure.

But elements are still not working
Sorted elements

* Moved documentation from init to class

* Constant features will now be removed at fit/predict time

* Moved get_batch_size back into utils, since it doesn't depend on tf

* Made the NeuralNetwork class compliant with sklearn

Cannot be any transforms of the input data

* Fixed tests that didn't pass

* Fixed mistake in checks of set_classes() in ARMP

* started fixing ARMP bugs for QM7

* Fixed bug in padding and added examples that give low errors

* Attempted fix to make representations single precision

* Hot fix for AtomScaler

* Minor bug fixes

* More bug fixes to make sure tests run

* Fixed some tests that had failures

* Reverted the fchl tests to original

* Fixed path in acsf test

* Readded changes to tests

* Modifications after code review

* Version with the ACSF basis functions starting at 0.8 A

* Updated ACSF representations so that the minimum distance at which to start the binning can be set by the user

* Modified the name of the new parameter (minimum distance of the binning in ACSF)

* Added a function to the atomscaler that enables reverting back

* Relaxed tolerance in tests
* removed old dataprovider and ASE requirements

* Updated test for relocated compound
* Corrected small bug in predict function

* Started updating so that the model can be trained after it's been reloaded

* Minor modifications

* Updated model so one can predict from xyz and disabled shuffling in training because it leads to a problem with predictions

* Fix for the problem of shuffling

* Added some tests to make sure the predictions work

* Fixed a tensorboard problem

* The saving of the model doesn't cause an error if the directory already exists

* Fixed a bug that made a test fail

* Modified the name of a parameter

* Made modifications to make the symmetry functions more numerically stable

* Added a hack that makes ARMP work with fortran ACSF when there are padded representations. Currently works *ONLY* when there is one molecule for the whole data set.

* corrected bug in score function for padded molecules

* Changes that make the model work quickly even when there is padding.

* Fixed discrepancies between fortran and TF acsf

* Corrected bug in setting of ACSF parameters

* Attempt at fixing issue qmlcode#10

* another attempt at fixing qmlcode#10

* Removed a pointless line

* set-up

* Added the graceful killer

* Modifications which prevent installation from breaking on BC4

* Modification to add neural networks to qmlearn

* Fix for issue qmlcode#8

* Random comment

* Started including the atomic model

* Made the atomic neural network work

* Fixed a bug with the indices

* Now training and predictions don't use the default graph, to avoid problems

* uncommented examples

* Removed unique_elements in data class

This can be stored in the NN class, but I might reverse the change later

* Made tensorflow an optional dependency

The reason for this approach is that pip would just auto install tensorflow and you might want the gpu version or your own compiled one.

* Made is_numeric non-private and removed legacy code

* Added 1d array util function

* Removed QML check and moved functions from utils to tf_utils

* Support for linear models (no hidden layers)

* fixed import bug in tf_utils

* Added text to explain that you are scoring on training set

* Restructure.

But elements are still not working
Sorted elements

* Moved documentation from init to class

* Constant features will now be removed at fit/predict time

* Moved get_batch_size back into utils, since it doesn't depend on tf

* Made the NeuralNetwork class compliant with sklearn

Cannot be any transforms of the input data

* Fixed tests that didn't pass

* Fixed mistake in checks of set_classes() in ARMP

* started fixing ARMP bugs for QM7

* Fixed bug in padding and added examples that give low errors

* Attempted fix to make representations single precision

* Hot fix for AtomScaler

* Minor bug fixes

* More bug fixes to make sure tests run

* Fixed some tests that had failures

* Reverted the fchl tests to original

* Fixed path in acsf test

* Readded changes to tests

* Modifications after code review

* Version with the ACSF basis functions starting at 0.8 A

* Updated ACSF representations so that the minimum distance at which to start the binning can be set by the user

* Modified the name of the new parameter (minimum distance of the binning in ACSF)

* Added a function to the atomscaler that enables reverting back

* Relaxed tolerance in tests

* Fixed bug in the padding of the representation in the ARMP network used in the pipeline

* Made a modification to how the Fortran ACSF are generated that helps with how much memory is used. Currently only float32 ACSF are available

* Added a check to make sure there are no NaNs in the representations.

* Small mistake corrected in aglaia

* Fixed extra space before -lpthread flag

* Removed what I added

* Implemented MRMP representations from xyz

* Generate atomic slatm from data

* Fixed typo

* Fixed problem with slatm and ARMP

* Fixed bug for MRMP tensorboard logger

* Actually fixed the tensorboard bug for MRMP and added tests to catch future errors

* Fixed another tensorboard bug

* Changed the behaviour of logging to tensorboard in MRMP
  cqml development branch
Merge remote-tracking branch 'upstream/develop' into dev_cqml
  (reflecting final version of the CQML paper)
Merged latest version of the official develop branch into the CQML branch
of the CQML development fork.