Add server config option to disable validation of outgoing data #1530

ml-evs · 2023-02-23T17:22:56Z

This PR adds the config option `validate_api_response` to the reference server; it is enabled by default. Disabling it short-circuits the pydantic validation of outgoing data, which can be used to let things like "X" in chemical formulae through the API. Currently no warnings are raised in this case, but an intermediate setting could be added to do this (it would still incur the performance cost of validation, but this does not seem to be sizable).
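A minimal sketch of how such a switch can short-circuit validation. It assumes a hypothetical `ServerConfig` dataclass carrying the `validate_api_response` flag, flat entry dictionaries, and a hypothetical `StructureResource` class standing in for the pydantic model; the stand-in only checks element symbols in a reduced formula against a deliberately tiny, illustrative set. This is not the reference server's actual code.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Union

# Illustrative subset only (assumption); a real validator knows every element symbol.
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "Na", "Cl", "Si"}


@dataclass
class ServerConfig:
    """Hypothetical stand-in for the server's config object."""

    validate_api_response: bool = True


class StructureResource:
    """Hypothetical stand-in for a pydantic model: validates on construction."""

    def __init__(self, **data: Any) -> None:
        formula = data.get("chemical_formula_reduced", "")
        # Split e.g. "Na3X" into ["Na", "X"] and reject unknown symbols such as "X".
        for symbol in re.findall(r"[A-Z][a-z]?", formula):
            if symbol not in ALLOWED_ELEMENTS:
                raise ValueError(f"Unknown element {symbol!r} in {formula!r}")
        self._data = dict(data)

    def dict(self) -> Dict[str, Any]:
        return dict(self._data)


def prepare_response(
    results: Iterable[Dict[str, Any]], config: ServerConfig
) -> List[Union[StructureResource, Dict[str, Any]]]:
    """Validate outgoing entries unless validation is disabled in the config."""
    if not config.validate_api_response:
        # Short-circuit: pass the raw dictionaries through untouched.
        return list(results)
    return [StructureResource(**entry) for entry in results]
```

With `ServerConfig(validate_api_response=False)` an entry with formula `"Na3X"` is returned as a plain dictionary, whereas the default config raises `ValueError` for the same entry.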

The server tests are now run in the CI in both modes, but there are currently no tests checking that disabling validation does indeed let invalid data through. It does, but setting up tests for this negative case would be more effort than I can afford at the moment.


codecov · 2023-02-26T22:41:02Z

Codecov Report

Merging #1530 (c1d9969) into master (c3ed95d) will increase coverage by 0.00%.
The diff coverage is 95.45%.

❗ Current head c1d9969 differs from pull request most recent head 6043131. Consider uploading reports for the commit 6043131 to get more accurate results

```diff
@@           Coverage Diff           @@
##           master    #1530   +/-   ##
=======================================
  Coverage   91.10%   91.10%
=======================================
  Files          74       74
  Lines        4519     4531   +12
=======================================
+ Hits         4117     4128   +11
- Misses        402      403    +1
```

| Flag | Coverage Δ |
|---|---|
| project | `91.10% <95.45%> (+<0.01%)` ⬆️ |
| validator | `90.99% <95.45%> (+<0.01%)` ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| optimade/server/mappers/entries.py | `98.21% <ø> (-0.90%)` ⬇️ |
| ...made/server/entry_collections/entry_collections.py | `97.84% <90.00%> (+0.04%)` ⬆️ |
| optimade/server/config.py | `93.61% <100.00%> (+0.06%)` ⬆️ |
| optimade/server/routers/utils.py | `96.72% <100.00%> (+0.23%)` ⬆️ |


ml-evs · 2023-02-27T11:54:01Z

@JPBergsma I know you're busy with tutorials (too), so I think I'll double-check this, merge, and then release now so that @markus1978 can try it out.

JPBergsma · 2023-02-28T08:23:25Z

optimade/server/mappers/entries.py

```diff
     def deserialize(
         cls, results: Union[dict, Iterable[dict]]
     ) -> Union[List[EntryResource], EntryResource]:
+        """Converts the raw database entries for this class into serialized models,
+        mapping the data along the way.
+
+        """
         if isinstance(results, dict):
             return cls.ENTRY_RESOURCE_CLASS(**cls.map_back(results))
```


I think these lines are no longer needed for our implementation, since we now always pass a list.

Suggested change

```diff
-    def deserialize(
-        cls, results: Union[dict, Iterable[dict]]
-    ) -> Union[List[EntryResource], EntryResource]:
-        """Converts the raw database entries for this class into serialized models,
-        mapping the data along the way.
-
-        """
-        if isinstance(results, dict):
-            return cls.ENTRY_RESOURCE_CLASS(**cls.map_back(results))
+    def deserialize(
+        cls, results: Iterable[dict]
+    ) -> List[EntryResource]:
+        """Converts the raw database entries for this class into serialized models,
+        mapping the data along the way.
+
+        """
```
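To illustrate what the list-only signature implies, here is a minimal sketch built from hypothetical stand-ins: `EntryResource` replaces the pydantic model, and `ExampleMapper`, with its `ALIASES` table and a much simplified `map_back`, only mimics the shape of the real mapper class rather than reproducing it.

```python
from typing import Any, Dict, Iterable, List


class EntryResource:
    """Hypothetical stand-in for the pydantic EntryResource model."""

    def __init__(self, **data: Any) -> None:
        if "id" not in data:
            raise ValueError("Every entry needs an 'id'")
        self.data = data


class ExampleMapper:
    """Hypothetical mapper mirroring the shape of the suggested change."""

    ENTRY_RESOURCE_CLASS = EntryResource
    # Hypothetical alias table: database field -> OPTIMADE field.
    ALIASES = {"_id": "id", "formula": "chemical_formula_reduced"}

    @classmethod
    def map_back(cls, doc: Dict[str, Any]) -> Dict[str, Any]:
        """Rename database fields to their OPTIMADE names (simplified)."""
        return {cls.ALIASES.get(key, key): value for key, value in doc.items()}

    @classmethod
    def deserialize(cls, results: Iterable[Dict[str, Any]]) -> List[EntryResource]:
        """Convert raw database entries into models; always expects an iterable."""
        return [cls.ENTRY_RESOURCE_CLASS(**cls.map_back(doc)) for doc in results]
```

Under this signature a caller holding a single document wraps it in a list, e.g. `ExampleMapper.deserialize([doc])[0]`, instead of relying on a separate `dict` branch.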

JPBergsma

I still had a few remarks about this PR. Other than that, it looks like a good change to me.

JPBergsma · 2023-02-28T09:24:18Z

optimade/server/routers/utils.py

```diff
+        try:
+            new_entry = new_entry.dict(exclude_unset=True, by_alias=True)  # type: ignore[union-attr]
+        except AttributeError:
+            pass
```


You should not use try/except here. Handling a raised exception is slow, so it should only be used when failure is rare (< 1% of calls).
When validation is turned off, however, `new_entry` is always a dictionary, so failure is not rare.
It is therefore better to do:

Suggested change

```diff
-        try:
-            new_entry = new_entry.dict(exclude_unset=True, by_alias=True)  # type: ignore[union-attr]
-        except AttributeError:
-            pass
+        if not isinstance(new_entry, dict):
+            new_entry = new_entry.dict(exclude_unset=True, by_alias=True)  # type: ignore[union-attr]
```

I'm not sure this is so clear-cut; I just made an artificial benchmark with a very simple pydantic model with exception handling and isinstance checks.

With exception handling, the `validate_api_response: true` (our default) branch is about 1% faster than with the `isinstance` check, and the `isinstance` check is about 2% faster when you are passing raw dictionaries, i.e., not much changes. Both differences are dwarfed by the difference between using dicts and pydantic models, which is a factor of 20.
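A comparison of this kind can be sketched with the standard `timeit` module. `Model` is a hypothetical stand-in for a very simple pydantic model exposing a `.dict()` method, this is not the benchmark referred to above, and the timings will vary with the machine and Python version.

```python
import timeit
from typing import Any, Dict, Union


class Model:
    """Hypothetical stand-in for a simple pydantic model with a .dict() method."""

    def __init__(self, **data: Any) -> None:
        self._data = data

    def dict(self, exclude_unset: bool = True, by_alias: bool = True) -> Dict[str, Any]:
        return dict(self._data)


def to_dict_eafp(entry: Union[Model, Dict[str, Any]]) -> Dict[str, Any]:
    """try/except: cheap when no exception is raised, costlier for raw dicts."""
    try:
        return entry.dict(exclude_unset=True, by_alias=True)  # type: ignore[union-attr]
    except AttributeError:
        # Raw dicts have no .dict() method, so they land here.
        return entry  # type: ignore[return-value]


def to_dict_lbyl(entry: Union[Model, Dict[str, Any]]) -> Dict[str, Any]:
    """isinstance check: the same small cost for models and raw dicts."""
    if not isinstance(entry, dict):
        return entry.dict(exclude_unset=True, by_alias=True)
    return entry


def benchmark(number: int = 100_000) -> Dict[str, float]:
    """Total seconds for `number` calls of each strategy on each input type."""
    model, raw = Model(id="1"), {"id": "1"}
    cases = {
        "eafp/model": lambda: to_dict_eafp(model),
        "lbyl/model": lambda: to_dict_lbyl(model),
        "eafp/dict": lambda: to_dict_eafp(raw),
        "lbyl/dict": lambda: to_dict_lbyl(raw),
    }
    return {name: timeit.timeit(fn, number=number) for name, fn in cases.items()}
```

Calling `benchmark()` and comparing the `model` rows with the `dict` rows shows the trade-off being discussed: the try/except version only pays for the exception when raw dictionaries are passed.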

I would rather avoid slowing down the "slower" (default, validated) path, so I would keep the exception handling.

If performance is important, the database will probably turn off validation to speed things up.
If performance is less important, the database will use the slower method and leave validation on.
So I would argue it is best to make the fastest method as fast as possible.

Perhaps, though I'm not convinced that disabling validation provides any meaningful performance boost; instead, it is just used to bypass some of our strict rules for databases where the effort of applying them is too great (e.g., NOMAD uses "X" in roughly 10 out of millions of chemical formulae, and querying them with validation on causes crashes).

I just did a quick try on my laptop with the test data, and it takes 25% longer to process the ~~response~~ request with validation. So it is not a huge performance increase, but definitely noticeable.

Wow, really? I tried via the validator and could only get 1-2% difference. I'll re-investigate if I get time.

I meant that the total processing time of a request increases by 25% if I do the validation, compared to not validating.

I just did some more testing, and it seems that the try/except block takes about 1.5 times longer to execute than the `if` statement. Using `if` saves about 2.25 µs per entry, which is less than I had expected. For our example server we would therefore save only 40 µs on 0.2 s, i.e., 0.02%.
So it is probably not worth continuing this discussion.
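A quick check of the arithmetic, using only the figures quoted in the comment above: a 40 µs saving at 2.25 µs per entry corresponds to roughly 18 entries, and 40 µs out of a 0.2 s request is 0.02%.

```python
saving_per_entry_s = 2.25e-6  # quoted saving of `if` over try/except, per entry
total_saving_s = 40e-6        # quoted total saving for the example server
request_time_s = 0.2          # quoted total processing time of the request

entries = total_saving_s / saving_per_entry_s  # implied number of entries
fraction = total_saving_s / request_time_s     # share of the request time saved
print(f"{entries:.1f} entries, {fraction:.2%} of the request")
# prints: 17.8 entries, 0.02% of the request
```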


ml-evs added 2 commits February 23, 2023 17:04

Add server config option to disable validation on outgoing data from server (7122de0)

Add CI tests for disabled API validation (c1d9969)

ml-evs marked this pull request as ready for review February 26, 2023 22:31

ml-evs requested review from CasperWA and JPBergsma as code owners February 26, 2023 22:31

Placate mypy (6043131)

ml-evs merged commit a22cddb into master Feb 27, 2023

ml-evs deleted the ml-evs/add_validation_shortcut branch February 27, 2023 11:56

ml-evs added the "enhancement" (New feature or request) and "server" (Issues pertaining to the example server implementation) labels Feb 27, 2023

JPBergsma reviewed Feb 28, 2023


ml-evs commented Feb 23, 2023

codecov bot commented Feb 26, 2023

ml-evs commented Feb 27, 2023

JPBergsma Feb 28, 2023

JPBergsma left a comment

JPBergsma Feb 28, 2023

ml-evs Mar 3, 2023

JPBergsma Mar 6, 2023

ml-evs Mar 6, 2023

JPBergsma Mar 7, 2023

ml-evs Mar 7, 2023

JPBergsma Mar 8, 2023
