Use diskcache for caching ProtocolDAGResults in the Alchemiscale client #271

Open
wants to merge 10 commits into
base: main

Conversation

ianmkenney
Member

Addresses, in part, #58. Using the diskcache library, we can cache ProtocolDAGResults client-side and reduce the number of API calls for calculating free energy differences.

* The default Disk used by diskcache pickles Python objects when
  storing them. Instead, we now store byte arrays. Depending on its
  size, a byte array is stored either in the SQLite3 DB or as a
  separate file if it is too large (>32 KB by default).
* A test has been added that checks the hits and misses when pulling
  PDRs using the get_transformation_results method. The in-memory
  LRU cache is cleared manually for accurate stats.
New objects supported:
* Transformations
* AlchemicalNetworks
* ChemicalSystems
* Generally anything that can be a KeyedChain
* Test added: with known cached results, corrupt the values and check
  that the user is warned about the deserialization failure and that
  a new result is downloaded.
* Lowered the cache size limit for tests to avoid running out of space
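
The warn-and-redownload behavior can be sketched roughly as follows. This is an illustration under assumptions, not the client's implementation: `fetch_result`, `fake_download`, the JSON encoding, and the plain `dict` standing in for the disk cache are all hypothetical.

```python
import json
import warnings


def fetch_result(key, cache, download):
    """Return the deserialized result for `key`, re-downloading if the cached bytes are corrupt."""
    raw = cache.get(key)
    if raw is not None:
        try:
            return json.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            # Cached bytes are unusable: tell the user, then fall through to a fresh download.
            warnings.warn(
                f"Could not deserialize cached entry for {key!r}; downloading a new copy."
            )
    raw = download(key)
    cache[key] = raw  # overwrite the missing or corrupt entry
    return json.loads(raw.decode("utf-8"))


def fake_download(key):
    """Hypothetical stand-in for the API call that returns serialized bytes."""
    return b'{"estimate": -1.23}'


# Simulate a corrupted cache entry (invalid UTF-8, so decoding fails).
cache = {"pdr-1": b"\xff not json"}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fetch_result("pdr-1", cache, fake_download)
```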
@ianmkenney ianmkenney linked an issue Apr 30, 2024 that may be closed by this pull request
* Removed unused imports
The AlchemiscaleBaseClient now determines the cache directory itself
when one is not specified (i.e. when None is passed to the
AlchemiscaleBaseClient constructor). When a path to this directory is
provided, it must be a string or pathlib.Path object. The logic for
this operation lives in the `AlchemiscaleBaseClient._determine_cache_dir`
method, which raises a TypeError on invalid input.
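
For illustration, the directory resolution might look like the sketch below. Only the use of `XDG_CACHE_HOME`, the accepted types, and the TypeError come from the description above. The `~/.cache` fallback, the `alchemiscale` subdirectory name, and the error message are assumptions, and `determine_cache_dir` is a hypothetical free-function stand-in for the method.

```python
import os
from pathlib import Path


def determine_cache_dir(cache_dir=None):
    """Resolve the cache directory without creating it on disk."""
    if cache_dir is None:
        # Prefer XDG_CACHE_HOME when set; the ~/.cache fallback is an assumption.
        xdg_cache_home = os.environ.get("XDG_CACHE_HOME")
        root = Path(xdg_cache_home) if xdg_cache_home else Path.home() / ".cache"
        return root / "alchemiscale"  # assumed subdirectory name
    if isinstance(cache_dir, (str, Path)):
        return Path(cache_dir)
    raise TypeError(
        "`cache_dir` must be a `str`, a `pathlib.Path`, or `None`, "
        f"not {type(cache_dir).__name__}."
    )
```

Because this only computes a path and never touches the filesystem, it can be checked in isolation without creating or modifying a real cache directory.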

The constructor now verifies that `cache_size_limit` is >= 0 and
raises a ValueError if it is not.
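
A sketch of that check, assuming a hypothetical stand-in class; the class name, default limit, and message text are illustrative and not taken from the client:

```python
class CachingClientSketch:
    """Hypothetical stand-in showing only the size-limit validation."""

    def __init__(self, cache_size_limit=2**30):  # default value is an assumption
        if cache_size_limit < 0:
            raise ValueError(
                "`cache_size_limit` must be greater than or equal to zero, "
                f"got {cache_size_limit}."
            )
        self.cache_size_limit = cache_size_limit
```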

New tests have been added for the above changes:

* Negative cache_size_limit: checks for constructor-raised ValueError
  with a meaningful message.

* cache_directory is None: checks output of the underlying
  _determine_cache_dir method with and without the XDG_CACHE_HOME
  environment variable. If we test it with the client constructor, the
  directory is made automatically, which we don't want in the tests as
  it may touch real data.

* cache_directory is not None, str, or Path: Check that the constructor
  raises a TypeError with a meaningful message.
@ianmkenney
Member Author

@dotsdl new error I haven't seen before, but it doesn't seem to touch anything I changed. I'll keep an eye out for it, but I'm going to assume it's a race condition of some sort for now.

This should be ready for review now!

@ianmkenney ianmkenney marked this pull request as ready for review April 30, 2024 21:15
@ianmkenney ianmkenney changed the title [WIP] Use diskcache for caching ProtocolDAGResults in the Alchemiscale client Use diskcache for caching ProtocolDAGResults in the Alchemiscale client Apr 30, 2024
@ianmkenney ianmkenney requested a review from dotsdl April 30, 2024 21:15
Development

Successfully merging this pull request may close these issues.

Add ProtocolDAGResult caching to user-facing client
2 participants