REST API: make the profile configurable as request parameter #5054
@giovannipizzi @ltalirz this is a quick mockup of how we could make the REST API capable of dynamically loading and changing profiles. I have tested this manually (a bit) and it seems to work. I tried switching from django to django, sqla to sqla, and from django to sqla (and vice versa). Before trying to write unit tests (and perhaps cleaning up the implementation) I would be curious to see if this would work for some of the Materials Cloud deployments. Would it be possible to try and change an MC dev environment that is now using one REST API instance per profile, to simply have a single one serve all profiles and provide the |
Codecov Report
@@ Coverage Diff @@
## main #5054 +/- ##
==========================================
+ Coverage 79.72% 80.22% +0.50%
==========================================
Files 532 515 -17
Lines 37860 36782 -1078
==========================================
- Hits 30181 29504 -677
+ Misses 7679 7278 -401
Thanks @sphuber, seems super useful. Pinging @elsapassaro - is it easy to try this on some dev server of Materials Cloud, with 2 profiles? @sphuber - if I understand correctly, you now specify profile=... in the URL, right? I'm wondering if this is how we would use it in Materials Cloud (maybe yes? @elsapassaro , @ltalirz , feedback welcome). |
@giovannipizzi Indeed, the profile can now be defined as a query parameter in the URL. I don't really see another way to do it, if the goal is for clients to specify a particular profile. I also think that having an option to restrict the profiles that can be exposed when launching the REST API is a good thing. I am not too familiar with it, but probably the best would be to do this through the config, which can then potentially be exposed through the CLI to be dynamically changed. However, before implementing those additional details I wanted to first check whether the current approach satisfies the use case and whether it works robustly. I don't have an idea yet of the overhead of switching profiles under load, nor of whether we need to (or even can) optimize things here. |
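For illustration, a client could select the profile per request roughly as follows. This is a minimal sketch: only the `profile` query parameter name comes from this PR, while the helper name `with_profile`, the host, the port and the endpoint path are assumptions made for the example.

```python
from urllib.parse import quote


def with_profile(url: str, profile: str) -> str:
    """Return ``url`` with a ``profile`` query parameter appended.

    Hypothetical helper: it only appends the parameter and does not
    check whether ``profile`` is already present in the query string.
    """
    separator = '&' if '?' in url else '?'
    return f'{url}{separator}profile={quote(profile, safe="")}'


# Assumed example URLs; adapt host, port and path to the actual deployment.
print(with_profile('http://localhost:5000/api/v4/nodes', 'dev'))
print(with_profile('http://localhost:5000/api/v4/nodes?limit=5', 'dev'))
```

The existing query string is left untouched on purpose, so that any filters already present in the URL are passed through unchanged.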
Thanks a lot @sphuber for this!
Somehow I was still under the impression that switching from django to sqla was not safe... but in this case it is? On materials cloud, we currently run the wsgi daemon processes with several threads, see e.g. #4494 (comment) - what happens in that case? I agree that access to profiles should be configured via While I agree that enabling profile switching on a request level is the right way forward and that this would strongly reduce e.g. continuous RAM use on Materials Cloud, I imagine that profile switching can incur a significant wait time on the request level. |
@@ -483,6 +483,7 @@ def build_translator_parameters(self, field_list):
extras = None
extras_filter = None
full_type = None
profile = None
`load_profile(None)` defaults to loading the default profile. I think the correct default here would be the profile that was used in `verdi -p profile restapi`
I have kept this as is, because this is just the parsing of the query parameters and I don't think that should be bothered with determining the default profile. This is now instead handled in `BaseResource.load_profile`, which knows what default to use if not specified in the query parameters.
It could still make sense to merge this, although I will refactor it first to use |
I have rebased the PR and now use the |
One question: have you considered what happens when multi-threading, and thread safety? |
Good question, I hadn't really thought about it. I take it the |
Indeed, I would say it is not. aiida-core/aiida/restapi/common/utils.py Lines 822 to 835 in ffedc8b
This is basically the only part of AiiDA still "hard-coded" to a backend. Note, sqlalchemy scoped sessions, simply use a since after #5330, there is now only aiida-core/aiida/manage/manager.py Lines 33 to 38 in ffedc8b
Although this would be slightly different from having a session as thread local, since you would also have an |
But for the current REST API it probably is not a problem, since none of its endpoints allow mutating state. They are |
Hi, I tested this a little bit on materials cloud. Two parts: 1. general usage Currently, we set up a separate WSGI for each aiida profile with a script containing
which are then served at corresponding URL path of apache (e.g. With this PR, this 2. benchmarking I used a simple script using I tested on profiles I accessed 50 uuids for each of these profiles (e.g. In all cases, the average single access time is the same for all the profiles. With the current (old) implementation, the average access time for one uuid is around 0.30 seconds. With the implementation of this PR, the average access time for one uuid
I guess this is something that needs consideration, as the worst case performance is 1.7 times slower. Regarding memory usage, currently the wsgi daemons are taking about 40 MB each (6th column), which maybe is not too bad:
Although maybe I'm missing something about this. Happy to run any other further tests, if needed. |
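A measurement of this kind can be sketched as below. This is not the script that was actually used; the function name `average_access_time` and the injectable `fetch` callable are assumptions for illustration, and in a real run `fetch` would issue the HTTP request to the REST API.

```python
import time
from statistics import mean


def average_access_time(fetch, urls):
    """Return the mean wall-clock time in seconds of calling ``fetch`` on each URL.

    ``fetch`` is any callable taking a URL; it is a hypothetical stand-in
    for the HTTP client call used in an actual benchmark.
    """
    timings = []
    for url in urls:
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Dry run with a no-op fetch and made-up URLs, just to show the call pattern.
urls = [f'http://localhost:5000/api/v4/nodes/{index}?profile=dev' for index in range(50)]
print(average_access_time(lambda url: None, urls))
```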
Hi @sphuber, I have to say that the small performance penalty I reported before is not really a statistical error; it clearly shows a 10-20% penalty. However, I guess this small slowdown is acceptable. This PR: Old implementation: Otherwise, everything seems to work well. I also tested that the switching works with |
@eimrek I am pretty sure I have found the source of the performance regression. The parsing of the query parameters was happening twice. I have refactored it and tested locally and at least for me the performance is similar to |
hi @sphuber, ran the scripts again, and indeed the performance seems the same now! (Although, for some reason, both benchmarks now give roughly 0.34-0.35 sec per request.) Good to merge from my side. |
To make this possible, after parsing the query string but before performing the request, the desired profile needs to be loaded. A new method `load_profile` is added to the `BaseResource` class. All methods that access the storage, such as the `get` methods, need to invoke this method before handling the request. The method in turn calls the `load_profile` function with `allow_switch` set to True, in order to allow changing the profile if another had already been loaded. The profile that is loaded is determined from the `profile` query parameter specified in the request. If not specified, the profile that was passed in the `kwargs` of the resource's constructor is used. Note that the parsing of the request path and query parameters had to be refactored a bit to prevent the parsing from being performed twice, which would result in a performance regression. When the REST API is invoked through the `verdi` CLI, the profile specified by the `-p` option, or the default profile if not specified, is passed to the API, which in turn passes it to the resource constructors. This guarantees that if `profile` is not specified in the query parameters, the profile with which `verdi restapi` was invoked will be loaded.
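The flow described above can be sketched as follows. This is a simplified illustration and not the actual implementation: `BaseResourceSketch` and `fake_load_profile` are hypothetical stand-ins so that the example runs without an AiiDA installation; only the `profile` query parameter, the `load_profile` method name and the `allow_switch` argument come from the description.

```python
def fake_load_profile(profile=None, allow_switch=False):
    """Hypothetical stand-in for the real profile loader; it only echoes its input."""
    return {'name': profile, 'allow_switch': allow_switch}


class BaseResourceSketch:
    """Simplified illustration of a resource that loads a profile per request."""

    def __init__(self, **kwargs):
        # Profile passed by the caller, e.g. the one `verdi restapi` was invoked with.
        self.default_profile = kwargs.get('profile')

    def load_profile(self, query_profile=None):
        # The query parameter takes precedence; otherwise use the constructor default.
        name = query_profile if query_profile is not None else self.default_profile
        return fake_load_profile(name, allow_switch=True)

    def get(self, query_params):
        # Load the profile after parsing the query parameters, before touching storage.
        loaded = self.load_profile(query_params.get('profile'))
        return {'served_from': loaded['name']}


resource = BaseResourceSketch(profile='default_profile')
print(resource.get({}))                    # falls back to the constructor default
print(resource.get({'profile': 'other'}))  # switches to the requested profile
```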
This global config option is set to `False` by default. When set to `True`, the REST API will allow requests to specify the profile and it will switch the loaded profile when necessary. If a request specifies the profile query parameter and profile switching is turned off, a 400 Bad Request response is returned.
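The described behaviour can be sketched as below, assuming the switch is available as a plain boolean. The names `resolve_profile`, `ProfileSwitchingDisabled` and `switching_enabled` are hypothetical and chosen for this example only; the actual option name and error class are not shown here.

```python
from http import HTTPStatus


class ProfileSwitchingDisabled(Exception):
    """Hypothetical error that a REST layer would translate into a 400 response."""

    status_code = HTTPStatus.BAD_REQUEST


def resolve_profile(query_profile, default_profile, switching_enabled=False):
    """Return the profile to load, rejecting explicit requests when switching is off."""
    if query_profile is None:
        # No profile requested: always fall back to the default.
        return default_profile
    if not switching_enabled:
        # Any explicit `profile` parameter is refused while switching is disabled.
        raise ProfileSwitchingDisabled('profile switching is not enabled')
    return query_profile


print(resolve_profile(None, 'default_profile'))
print(resolve_profile('other', 'default_profile', switching_enabled=True))
```

Note that, following the description, a request is rejected whenever it specifies the parameter while switching is off, even if the requested profile equals the default.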
Fixes #5052
To make this possible, after parsing the query string but before
performing the request, the desired profile needs to be loaded. If any
other profile is already loaded, that first needs to be properly
unloaded, which includes unloading the database backend and the manager
with all its resources.