Kernel Benchmarks #62

Closed · dmclark17 opened this issue Sep 22, 2019 · 4 comments
Labels: enhancement (New feature or request)
@dmclark17 (Contributor) commented Sep 22, 2019

Before trying to make any kernel optimizations, I thought it would be good to put together a suite of benchmarks so we can easily and consistently measure any performance boosts.

I am thinking a two-phase benchmark would work well:

  • Setup script which constructs kernel inputs (atomic environments and the required parameters) and writes them to disk in a language-agnostic form. This data can then be used by multiple implementations of the kernel.
  • Python/numba implementation test which reads in the data and times the computation of the kernels (a rough sketch of both phases is below).
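
Something like the following could work, as a rough sketch only: the file names and the toy `two_body_kernel` are placeholders standing in for the real flare kernels and data layout.

```python
import json
import time

import numpy as np


def setup(prefix="kernel_benchmark"):
    """Phase 1: construct placeholder kernel inputs and write them to disk."""
    bond_array = np.random.rand(100, 4)  # stand-in for a real atomic environment
    params = {"signal_variance": 1.0, "length_scale": 0.5, "cutoff": 5.0}
    np.save(f"{prefix}_bonds.npy", bond_array)
    with open(f"{prefix}_params.json", "w") as f:
        json.dump(params, f)


def run_benchmark(prefix="kernel_benchmark", n_repeats=100):
    """Phase 2: read the inputs back and time the kernel evaluation."""
    bond_array = np.load(f"{prefix}_bonds.npy")
    with open(f"{prefix}_params.json") as f:
        params = json.load(f)

    def two_body_kernel(bonds, sig, ls):
        # Toy squared-exponential over the first bond coordinate;
        # swap in the real flare kernel here.
        diffs = bonds[:, None, 0] - bonds[None, :, 0]
        return sig ** 2 * np.exp(-diffs ** 2 / (2 * ls ** 2))

    start = time.perf_counter()
    for _ in range(n_repeats):
        two_body_kernel(bond_array, params["signal_variance"], params["length_scale"])
    elapsed = time.perf_counter() - start
    print(f"{elapsed / n_repeats * 1e3:.3f} ms per kernel call")


if __name__ == "__main__":
    setup()
    run_benchmark()
```

Keeping the two phases separate means the same dumped inputs can later be fed to a C++ or Fortran kernel without touching the Python side.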
@dmclark17 self-assigned this on Sep 22, 2019
@dmclark17 added the enhancement label on Sep 22, 2019
@dmclark17 (Contributor, Author) commented Sep 23, 2019

It might be good to use the serialization methods mentioned in #16 and #64.

I'm not sure whether it would be better to store structured data or just the bare minimum in the form of arrays.

I'm thinking it may be easier for cross-language compatibility to use JSON for metadata and scalar parameters, and hdf5 as the data format for bond arrays, cross-bond distances, etc.
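
Roughly, the split could look like this; a sketch only, assuming h5py on the Python side, with placeholder file and dataset names:

```python
import json

import h5py
import numpy as np

# Scalar parameters and metadata: human-readable JSON.
params = {"kernel": "two_body", "signal_variance": 1.0, "length_scale": 0.5, "cutoff": 5.0}
with open("kernel_params.json", "w") as f:
    json.dump(params, f, indent=2)

# Large numeric arrays (bond arrays, cross-bond distances, ...): binary HDF5.
with h5py.File("kernel_inputs.h5", "w") as f:
    f.create_dataset("bond_array", data=np.random.rand(100, 4))
    f.create_dataset("cross_bond_dists", data=np.random.rand(100, 100))

# Any language with an HDF5 binding (C++, Fortran, Julia, ...) can read the same file.
with h5py.File("kernel_inputs.h5", "r") as f:
    bond_array = f["bond_array"][:]
```

That way the JSON file stays small and diffable, while the heavy numerical data lives in a format every HPC language can read.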

@stevetorr (Contributor) commented:

I endorse the use of the serialization methods! There's a PR up now that allows for GPs to be serialized as JSON files so we could just pass the models around: #64

The hdf5 format could be useful for efficiency with large training data sets, but I think we can cross that bridge once we get there (for instance, if sparse GPs make huge training sets feasible to work with). JSON isn't, strictly speaking, the most efficient option for storing big arrays, but it is implemented right now and allows for storing everything you need in one .json file (whether for a collection of structures representing a trajectory, a set of environments processed from a structure, or an entire GP).
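
For illustration, storing arrays directly in JSON just means round-tripping them as nested lists, something like the following (keys are placeholders, not the actual flare schema):

```python
import json

import numpy as np

# Everything (hyperparameters, training arrays) can live in a single JSON file,
# with arrays stored as verbose nested lists.
model = {
    "hyperparameters": [1.0, 0.5, 0.001],
    "training_positions": np.random.rand(50, 3).tolist(),
    "training_forces": np.random.rand(50, 3).tolist(),
}
with open("gp_model.json", "w") as f:
    json.dump(model, f)

# Round-trip: arrays come back as lists and need converting to numpy again.
with open("gp_model.json") as f:
    loaded = json.load(f)
positions = np.array(loaded["training_positions"])
```

Fine as a starting point; if file size or load time ever becomes a problem, the array-heavy keys are the natural candidates to move into hdf5.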

@jonpvandermause (Collaborator) commented:

No reason we can't have both! hdf5 will be useful for the kernel acceleration that David is looking into now.

@stevetorr (Contributor) commented Sep 24, 2019

Sounds good! hdf5 isn't mission-critical for me right now, so @dmclark17, feel free to assign yourself to #16, or we can open a new issue for hdf5 serialization and you can format it in a way that makes the most sense with your benchmarks :)

@jonpvandermause closed this as not planned on Sep 16, 2024