Database caching #345
-
I'm in the middle of learning a bunch about this! Looks like in the world of cheminformatics, companies take one of two strategies:
The first is close to what you're suggesting, and I think it would be a great short-term solution, although it is more expensive and more error-prone (because it adds another running service). I agree it's worth trying out and reading into, though. The second is something I need to learn more about, but I think it will end up being the best long-term solution for materials fingerprints. Right now in Simmate, the bottleneck is transferring fingerprint data to each worker process. Meanwhile, comparing fingerprints (cosine/Euclidean distances) is super cheap and fast. So for molecular systems, where the scale is tens of millions of molecules and fingerprints, they do the following:
This eliminates the transfer of fingerprints to other computers and instead has the database run the fingerprint comparisons itself. This takes ~100 ms for >30 million fingerprints, so it would work great for our application. The challenge here is just learning steps 3 and 4! It'd be a super short script (<200 lines), but it would probably be really challenging for us because we don't know C++! If you ever run into a potential collaborator who codes in that language, it might be worth looping them in.
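For reference, the comparison step itself is small. Here is a minimal sketch in plain Python, assuming fingerprints are equal-length numeric vectors. The function names and the dict-of-lists layout are hypothetical, not Simmate's API, and a database-side version would look different:

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line distance between two equal-length vectors."""
    return math.dist(a, b)


def closest_match(query, fingerprints):
    """Return (key, distance) for the stored fingerprint nearest to `query`.

    `fingerprints` is assumed to be a dict mapping an id to a vector.
    Cosine distance is used here; swapping in euclidean_distance also works.
    """
    scored = ((key, cosine_distance(query, fp)) for key, fp in fingerprints.items())
    return min(scored, key=lambda pair: pair[1])
```

Calling `closest_match(query, fingerprints)` on a worker needs every stored fingerprint to be in that worker's memory first, which is the transfer cost described above. Running the same comparison inside the database avoids that transfer.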
-
Very interesting. I hadn't contemplated running it on the cloud server itself. It makes sense that this would be robust and fast. Can you envision other calculation tasks (beyond fingerprint comparison) where something like Redis would be preferred? I guess the primary disadvantages of a registered function are: None of those are terribly significant, and Redis wouldn't address most of these issues. I could certainly look into hiring a C++ programmer if this approach looks best. It does sound promising.
-
@jacksund Are you familiar with the process of finding a developer and proposing the project to them? If you want to take this on, I’ll pay for the developer.
-
I've been reading about memory caching. Redis looks complicated but memcached should be easy to set up and is invisible to a general user. It is naturally multithreaded. This might be worth thinking about when we get back to evolutionary algorithms and we are concerned about throughput on fingerprints.
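As a rough illustration of how a memory cache could sit in front of fingerprint generation, here is a minimal cache-aside sketch. Everything in it is an assumption for illustration: `get_fingerprint`, the `fingerprint:<id>` key scheme, and the `compute` callback are hypothetical, and `DictCache` is only an in-process stand-in that mimics the basic `get`/`set` shape of a memcached client, so no server is needed to run it.

```python
import json


class DictCache:
    """In-process stand-in with the get/set shape of a memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss, like a memcached client would.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def get_fingerprint(structure_id, cache, compute):
    """Return a fingerprint, computing it only when it is not cached.

    `compute` is a hypothetical callable that builds the fingerprint
    (a list of floats) for a structure id.
    """
    key = f"fingerprint:{structure_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    fingerprint = compute(structure_id)
    # Store as bytes, since memcached holds raw byte values.
    cache.set(key, json.dumps(fingerprint).encode("utf-8"))
    return fingerprint
```

With a real memcached server, the stand-in would be replaced by an actual client object. The first call for a given id pays the compute cost, and later calls from any worker that shares the cache would read the stored value instead.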