REST API: make the profile configurable as request parameter #5054

sphuber · 2021-08-08T20:43:14Z

To make this possible, after parsing the query string but before
performing the request, the desired profile needs to be loaded. If any
other profile is already loaded, that first needs to be properly
unloaded, properly unloading the database backend and the manager with
all it resources.

sphuber · 2021-08-08T20:45:48Z

@giovannipizzi @ltalirz this is a quick mockup of have we could make the REST API capable of dynamically loading and changing profiles. I have tested this manually (a bit) and it seems to work. I tried switching from django to django, sqla to sqla, and from django to sqla (and vice versa). Before trying to write unit tests (and perhaps cleaning up the implementation) I would be curious to see if this would work for some of the materials cloud deployments. Would it be possible to try and change a MC dev enviroment that now is using one REST API instance per profile, to simply have a single one serve all profiles and provide the profile query parameter in requests?

codecov · 2021-08-08T21:01:37Z

Codecov Report

Merging #5054 (9193094) into main (974ace2) will increase coverage by 0.50%.
The diff coverage is 64.29%.

❗ Current head 9193094 differs from pull request most recent head 2190f56. Consider uploading reports for the commit 2190f56 to get more accurate results

@@            Coverage Diff             @@
##             main    #5054      +/-   ##
==========================================
+ Coverage   79.72%   80.22%   +0.50%     
==========================================
  Files         532      515      -17     
  Lines       37860    36782    -1078     
==========================================
- Hits        30181    29504     -677     
+ Misses       7679     7278     -401

Flag	Coverage Δ
django	`74.71% <64.29%> (?)`
sqlalchemy	`73.61% <64.29%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
aiida/restapi/common/utils.py	`78.31% <60.00%> (-1.79%)`	⬇️
aiida/restapi/resources.py	`95.24% <66.67%> (-2.01%)`	⬇️
aiida/engine/exceptions.py	`0.00% <0.00%> (-100.00%)`	⬇️
aiida/cmdline/commands/cmd_setup.py	`50.00% <0.00%> (-44.89%)`	⬇️
aiida/orm/nodes/data/array/xy.py	`16.18% <0.00%> (-43.82%)`	⬇️
aiida/cmdline/params/options/commands/setup.py	`56.85% <0.00%> (-36.15%)`	⬇️
aiida/orm/nodes/data/array/projection.py	`14.76% <0.00%> (-33.63%)`	⬇️
aiida/cmdline/commands/cmd_database.py	`62.41% <0.00%> (-31.87%)`	⬇️
aiida/orm/implementation/computers.py	`68.43% <0.00%> (-31.57%)`	⬇️
aiida/manage/external/postgres.py	`60.53% <0.00%> (-22.80%)`	⬇️
... and 690 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 974ace2...2190f56. Read the comment docs.

giovannipizzi · 2021-08-09T15:51:16Z

Thanks @sphuber, seems super useful. Pinging @elsapassaro - is it easy to try this on some dev server of Materials Cloud, with 2 profiles?

@sphuber - if I understand correctly, you now specify profile=... in the URL, right? I'm wondering if this is how we would use it in Materials Cloud (maybe yes? @elsapassaro , @ltalirz , feedback welcome).
In any case, it would be good to have a way (for Materials Cloud but also in general) to limit which profiles are actually exposed on the REST API. E.g. either with some command line options when you start the REST API (e.g. you start only for the current one by default, and you can specify more profiles (or --all) in the command line verdi command). Or by having some flags in the config.json file (exposed_via_rest_api=true) or some combination of them (and we should pick good defaults, where user don't get unexpected behaviour - e.g. running the API and not seeing any profile is bad, but also running the API and exposing more than what they thought is equally not good IMO).
This might be useful both on Materials Cloud (e.g. to expose a subset of DBs we control) but also for users (who might want to give access to others to only one or a few of their profiles, but not to some of the others that might contain confidential data).

sphuber · 2021-08-10T08:33:19Z

@giovannipizzi Indeed, the profile can now be defined as a query parameter in the URL. I don't really see another way how to do it, if the goal is for clients to specify a particular profile. I also think that having an option to restrict the profiles that can be exposed when launching the REST API is a good thing. I am not too familiar with it, but probably the best would be to do this through the config, which can then potentially be exposed through the CLI to be dynamically changed. However, before implementing those additional details I wanted to first check if the current approach satisfies the use case and whether it works robustly. I don't have an idea yet of the overhead of switching profiles under load and if we need (if we can even) optimize things here.

ltalirz · 2021-08-10T08:43:04Z

Thanks a lot @sphuber for this!

I tried switching from django to django, sqla to sqla, and from django to sqla (and vice versa).

Somehow I was still under the impression that switching from django to sqla was not safe... but in this case it is?

On materials cloud, we currently run the wsgi daemon processes with several threads, see e.g. #4494 (comment) - what happens in that case?
Is there a specific thing @elsapassaro should test?

I agree that access to profiles should be configured via verdi config.

While I agree that enabling profile switching on a request level is the right way forward and that this would strongly reduce e.g. continuous RAM use on Materials Cloud, I imagine that profile switching can incur a significant wait time on the request level.
We should benchmark this to see whether we can live with this overhead.

ltalirz · 2021-08-10T08:28:17Z

aiida/restapi/common/utils.py

@@ -483,6 +483,7 @@ def build_translator_parameters(self, field_list):
        extras = None
        extras_filter = None
        full_type = None
+        profile = None


load_profile(None) defaults to loading the default profile.

I think the correct default here would be the profile that was used in
verdi -p profile restapi

I have kept this as is, because this is just the parsing of the query parameters and I don't think that should be bothered with determining the default profile. This is now instead handled in BaseResource.load_profile which knows what default to use if not specified in the query parameters.

sphuber · 2022-03-09T11:50:25Z

It could still make sense to merge this, although I will refactor it first to use allow_switch of load_profile that was added after the recent storage loading refactoring.

sphuber · 2022-03-09T14:47:01Z

I have rebased the PR and now use the allow_switch argument of load_profile that was recently added. I think it would be good to add this for v2.0 since it is a fully backwards-compatible change and could solve a concrete problem. @ltalirz do you have the time to give this a quick look or should I ask someone else?

chrisjsewell · 2022-03-09T14:58:48Z

One question: have you considered what happens when multi-threading, and thread safety?

sphuber · 2022-03-09T15:17:19Z

One question: have you considered what happens when multi-threading, and thread safety?

Good question, I hadn't really thought about it. I take it the allow_switch is not thread-safe (yet)? I think the REST API is being launched by default in threaded mode, but we could make the profile switching feature available for now at the cost of not using threads. This will be a tradeoff of course, but might still be interesting for certain use cases.

chrisjsewell · 2022-03-09T19:58:42Z

I take it the allow_switch is not thread-safe (yet)?

Indeed, I would say it is not.
Currently, the restapi threads are "abusing" the sqlalchemy scoped session, for thread local connections:

aiida-core/aiida/restapi/common/utils.py

Lines 822 to 835 in ffedc8b

    
           @decorator 
        
           def close_thread_connection(wrapped, _, args, kwargs): 
        
               """Close the profile's storage connection, for the current thread. 
        
               This decorator can be used for router endpoints. 
        
               It is needed due to the server running in threaded mode, i.e., creating a new thread for each incoming request, 
        
               and leaving connections unreleased. 
        
               Note, this is currently hard-coded to the `PsqlDosBackend` storage backend. 
        
               """ 
        
               try: 
        
                   return wrapped(*args, **kwargs) 
        
               finally: 
        
                   get_manager().get_profile_storage().get_session().close()

This is basically the only part of AiiDA still "hard-coded" to a backend.

Note, sqlalchemy scoped sessions, simply use a threading.local to create the session: https://github.com/sqlalchemy/sqlalchemy/blob/03989d1dce80999bb9ea1a7d36df3285e5ce4c3b/lib/sqlalchemy/util/_collections.py#L658

since after #5330, there is now only Manager and Config are global variables, I was thinking to potentially support threading via making thread local:

aiida-core/aiida/manage/manager.py

Lines 33 to 38 in ffedc8b

    
           def get_manager() -> 'Manager': 
        
               """Return the AiiDA global manager instance.""" 
        
               global MANAGER  # pylint: disable=global-statement 
        
               if MANAGER is None: 
        
                   MANAGER = Manager() 
        
               return MANAGER

Although this would be slightly different from having a session as thread local, since you would also have an sqlalchemy.Engine per thread

sphuber · 2022-03-09T22:22:00Z

But for the current REST API it probably is not a problem though since none of its end-points allow to mutate state. They are get operations only, so problems related to threading should be minimal if not existing, right?

eimrek · 2022-07-12T14:04:49Z

Hi, I tested this a little bit on materials cloud.

Two parts:

1. general usage

Currently, we set up a separate WSGI for each aiida profile with a script containing load_profile, e.g. this one:

# wsgi script for profile mc3d
from aiida.manage.configuration import load_profile
from aiida.restapi.run_api import configure_api
from aiida.restapi.api import App

load_profile('mc3d')

api = configure_api(flask_app=App)
application = api.app

which are then served at corresponding URL path of apache (e.g. /mc3d/)

With this PR, this load_profile ceases to work and always reverts to the default aiida profile. I would propose that it would be good to keep this working (i.e. by setting up the wsgi file this way, the corresponding profile is used).

2. benchmarking

I used a simple script using requests to benchmark this PR with the old implementation. (here's the notebook: mc-rest-benchmark.zip)

I tested on profiles pyrene-mofs (total of 56'832 nodes), mc3d (total of 535'005 nodes) and li-ion-conductors (total of 4'505'808 nodes).

I accessed 50 uuids for each of these profiles (e.g. /mc3d/api/v4/nodes/<uuid>) for both the current implementation and this PR.

In all cases, the average single access time is the same for all the profiles.

with the current (old) implemention, the average access time for one uuid is around 0.30 seconds.

with the implementation of this PR, the average access time for one uuid

if accessed sequentially the same profile: 0.34 seconds
if alternating between two profiles (doesn't matter which two): 0.51 seconds.

I guess this is something that needs consideration, as the worst case performance is 1.7 times slower.

Regarding memory usage, currently the wsgi daemons are taking about 40 MB each (6th column), which maybe is not too bad:

(aiida) ubuntu@dev-aiida-2:~/aiida-core$ ps aux | grep daemon-
ubuntu    221483  0.1  0.2 430120 44408 ?        Sl   15:51   0:00 daemon-mc3d       -k start
ubuntu    221484  0.1  0.2 429416 43504 ?        Sl   15:51   0:00 daemon-mc2d       -k start
ubuntu    221485  0.1  0.2 429416 43584 ?        Sl   15:51   0:00 daemon-2dtopo     -k start
ubuntu    221486  0.1  0.2 429416 43584 ?        Sl   15:51   0:00 daemon-tc-applica -k start
ubuntu    221487  0.1  0.2 429416 43588 ?        Sl   15:51   0:00 daemon-sssp       -k start
ubuntu    221488  0.1  0.2 430120 44404 ?        Sl   15:51   0:00 daemon-pyrene-mof -k start
ubuntu    221489  0.1  0.2 429416 43584 ?        Sl   15:51   0:00 daemon-curated-co -k start
ubuntu    221490  0.1  0.2 429416 43584 ?        Sl   15:51   0:00 daemon-hcofs-co2  -k start
ubuntu    221491  0.1  0.2 363880 43604 ?        Sl   15:51   0:00 daemon-stoceriait -k start
ubuntu    221492  0.1  0.2 363880 43596 ?        Sl   15:51   0:00 daemon-scdm       -k start
ubuntu    221493  0.1  0.2 429416 43600 ?        Sl   15:51   0:00 daemon-tin-antimo -k start
ubuntu    221494  0.1  0.2 429416 43596 ?        Sl   15:51   0:00 daemon-optimade-s -k start
ubuntu    221495  0.1  0.2 429416 43600 ?        Sl   15:51   0:00 daemon-li-ion-con -k start

Although maybe i'm missing something about this.

Happy to run any other further tests, if needed.

eimrek · 2022-07-14T09:52:19Z

Hi @sphuber,

I have to say that the small performance penalty I reported before is not really a statistical error, it clearly shows a 10-20% penalty. However, I guess this small slowdown is acceptable.

This PR:

Old implementation:

Otherwise, everything seems to work well. I also tested that the switching works with verdi config set rest_api.profile_switching True and so it's easy to enable to this whenever needed.

sphuber · 2022-07-14T18:39:36Z

@eimrek I am pretty sure I think I have found the source of the performance regression. The parsing of the query parameters was happening twice. I have refactored it and tested locally and at least for me the performance is similar to main now. Would you like to perform one final benchmark?

eimrek · 2022-07-15T09:05:10Z

hi @sphuber, ran the scripts again, and indeed the performance seems the same now! (Although, for some reason, both benchmarks now give roughly 0.34-0.35 sec per request.)

Good to merge from my side.

To make this possible, after parsing the query string but before performing the request, the desired profile needs to be loaded. A new method `load_profile` is added to the `BaseResource` class. All methods that access the storage, such as the `get` methods, need to invoke this method before handing the request. The `load_profile` method will call `load_profile` with `allow_switch` set to True, in order to allow changing the profile if another had already been loaded. The profile that is loaded is determined from the `profile` query parameter specified in the request. If not specified, the profile will be taken that was specified in the `kwargs` of the resources constructor. Note that the parsing of the request path and query parameters had to be refactored a bit to prevent the parsing having to be performed twice, which would result in a performance regression. When the REST API is invoked through the `verdi` CLI, the profile specified by the `-p` option, or the default profile if not specified, is passed to the API, which will be passed to the resource constructors. This guarantees that if `profile` is not specified in the query parameters the profile with which `verdi restapi` was invoked will be loaded.

This global config option is set to `False` by default. When set to `True`, the REST API will allow requests to specify the profile and it will switch the loaded profile when necessary. If a request specifies the profile query parameter and profile switching is turned off, a 400 Bad Request response is returned.

ltalirz reviewed Aug 10, 2021

View reviewed changes

sphuber force-pushed the feature/5052/restapi-profile-parameter branch from 9193094 to 22162cc Compare September 2, 2021 12:32

sphuber mentioned this pull request Sep 3, 2021

close django connection in REST API #4741

Closed

sphuber force-pushed the feature/5052/restapi-profile-parameter branch from 22162cc to 457f93d Compare September 14, 2021 07:13

sphuber force-pushed the feature/5052/restapi-profile-parameter branch 3 times, most recently from 8d71fe1 to ceab183 Compare March 9, 2022 14:43

sphuber requested a review from ltalirz March 9, 2022 14:45

sphuber force-pushed the feature/5052/restapi-profile-parameter branch from ceab183 to d9da4b9 Compare March 9, 2022 15:15

sphuber force-pushed the feature/5052/restapi-profile-parameter branch 2 times, most recently from a6ee4f3 to 7e1a787 Compare May 18, 2022 08:49

sphuber force-pushed the feature/5052/restapi-profile-parameter branch from 7e1a787 to 2f0ba2b Compare July 13, 2022 12:49

sphuber requested a review from eimrek July 13, 2022 15:21

eimrek previously approved these changes Jul 14, 2022

View reviewed changes

sphuber dismissed eimrek’s stale review via e43c0e8 July 14, 2022 16:25

sphuber force-pushed the feature/5052/restapi-profile-parameter branch from 2190f56 to e43c0e8 Compare July 14, 2022 16:25

sphuber added 2 commits July 15, 2022 11:19

sphuber force-pushed the feature/5052/restapi-profile-parameter branch from e43c0e8 to 9e95849 Compare July 15, 2022 09:45

sphuber mentioned this pull request Jul 15, 2022

REST API: Add profile black listing for profile switching enabled #5603

Open

sphuber merged commit c0fdf38 into aiidateam:main Jul 15, 2022

sphuber deleted the feature/5052/restapi-profile-parameter branch July 15, 2022 11:22

ltalirz mentioned this pull request Jul 15, 2022

REST API deployment: reconsider use of Process / Application groups #4494

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REST API: make the profile configurable as request parameter #5054

REST API: make the profile configurable as request parameter #5054

sphuber commented Aug 8, 2021

sphuber commented Aug 8, 2021

codecov bot commented Aug 8, 2021 •

edited

Loading

giovannipizzi commented Aug 9, 2021

sphuber commented Aug 10, 2021

ltalirz commented Aug 10, 2021 •

edited

Loading

ltalirz Aug 10, 2021

sphuber Mar 9, 2022

sphuber commented Mar 9, 2022

sphuber commented Mar 9, 2022

chrisjsewell commented Mar 9, 2022

sphuber commented Mar 9, 2022

chrisjsewell commented Mar 9, 2022

sphuber commented Mar 9, 2022

eimrek commented Jul 12, 2022

eimrek commented Jul 14, 2022

sphuber commented Jul 14, 2022

eimrek commented Jul 15, 2022

REST API: make the profile configurable as request parameter #5054

REST API: make the profile configurable as request parameter #5054

Conversation

sphuber commented Aug 8, 2021

sphuber commented Aug 8, 2021

codecov bot commented Aug 8, 2021 • edited Loading

Codecov Report

giovannipizzi commented Aug 9, 2021

sphuber commented Aug 10, 2021

ltalirz commented Aug 10, 2021 • edited Loading

ltalirz Aug 10, 2021

Choose a reason for hiding this comment

sphuber Mar 9, 2022

Choose a reason for hiding this comment

sphuber commented Mar 9, 2022

sphuber commented Mar 9, 2022

chrisjsewell commented Mar 9, 2022

sphuber commented Mar 9, 2022

chrisjsewell commented Mar 9, 2022

sphuber commented Mar 9, 2022

eimrek commented Jul 12, 2022

eimrek commented Jul 14, 2022

sphuber commented Jul 14, 2022

eimrek commented Jul 15, 2022

codecov bot commented Aug 8, 2021 •

edited

Loading

ltalirz commented Aug 10, 2021 •

edited

Loading