use pydantic tagged unions to more quickly validate the result document #2307
Labels: enhancement, good first issue, help wanted, performance
In the JSON result document, we use unions in a few places, such as `freeze.features.Feature` in capa/features/freeze/features.py (line 351 at commit c409b2b).
I've learned that to validate incoming data against this union, pydantic tries each member of the union in turn and then uses the best match. Because there are many members, pydantic does a lot of work for every feature it loads.
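A minimal sketch of that behavior, assuming pydantic v2. The three models below are hypothetical stand-ins for capa's feature models, not the real classes. When the input matches no member of a plain `Union`, the resulting `ValidationError` contains errors from every member, which shows that each case was attempted.

```python
from typing import Literal, Set, Union

from pydantic import BaseModel, TypeAdapter, ValidationError


# Hypothetical stand-ins for capa's feature models (not the real classes).
class OSFeature(BaseModel):
    type: Literal["os"] = "os"
    os: str


class ArchFeature(BaseModel):
    type: Literal["arch"] = "arch"
    arch: str


class StringFeature(BaseModel):
    type: Literal["string"] = "string"
    string: str


# A plain (untagged) union, validated in pydantic's default "smart" mode.
plain_adapter = TypeAdapter(Union[OSFeature, ArchFeature, StringFeature])


def members_tried(data: dict) -> Set[str]:
    """Return the names of the union members that reported errors for `data`."""
    try:
        plain_adapter.validate_python(data)
    except ValidationError as e:
        # For union errors, the first element of each location is the
        # member that produced the error.
        return {str(err["loc"][0]) for err in e.errors()}
    return set()


# {"type": "string"} matches no member (its "string" field is missing),
# so every member is attempted and contributes its own errors.
tried = members_tried({"type": "string"})
```

With dozens of members, as in the real `Feature` union, each load pays for those extra attempts.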
We should be able to improve performance by using tagged unions, which pydantic calls discriminated unions. There's some documentation here:
https://docs.pydantic.dev/latest/concepts/unions/#discriminated-unions
and a PyCon US '24 talk here: https://www.youtube.com/watch?v=Qvj5e9xtaSE
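A minimal sketch of a tagged union, assuming pydantic v2 and assuming each member carries a `Literal` field named `type` (the models are the same hypothetical stand-ins, not capa's real classes). `Field(discriminator="type")` tells pydantic to read the tag first and validate against only the matching member.

```python
from typing import Annotated, List, Literal, Tuple, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


# Hypothetical stand-ins; each member carries a Literal "type" tag.
class OSFeature(BaseModel):
    type: Literal["os"] = "os"
    os: str


class ArchFeature(BaseModel):
    type: Literal["arch"] = "arch"
    arch: str


class StringFeature(BaseModel):
    type: Literal["string"] = "string"
    string: str


# Tag the union on the "type" field so pydantic dispatches on it directly.
TaggedFeature = Annotated[
    Union[OSFeature, ArchFeature, StringFeature],
    Field(discriminator="type"),
]
tagged_adapter = TypeAdapter(TaggedFeature)


def tagged_errors(data: dict) -> List[Tuple[tuple, str]]:
    """Return (location, error type) pairs for `data`, or [] if it is valid."""
    try:
        tagged_adapter.validate_python(data)
    except ValidationError as e:
        return [(err["loc"], err["type"]) for err in e.errors()]
    return []


# JSON input is dispatched straight to ArchFeature.
feature = tagged_adapter.validate_json('{"type": "arch", "arch": "amd64"}')

# A missing payload field is reported once, for the tagged member only.
missing_field = tagged_errors({"type": "string"})

# A missing tag fails outright; the member's default value is not used.
missing_tag = tagged_errors({"os": "windows"})
```

One design consequence worth checking: the tag must be present in every serialized feature. Documents produced by pydantic's `model_dump`/`model_dump_json` include fields with default values, so the tag would normally be written out.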
We should investigate using tagged unions and measure whether they speed up loading serialized result documents. Granted, this isn't a common operation, but the data model would be more correct regardless.
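One possible way to run that measurement is a small timing harness, sketched below under several assumptions: pydantic v2, a hypothetical list of feature kinds (capa's real union has its own members and fields), and a synthetic JSON document rather than a real result document. It builds the same union twice, plain and tagged, and times `validate_json` on each. Timings depend on the machine and pydantic version, so no expected numbers are given.

```python
import json
import timeit
from typing import Annotated, Dict, List, Literal, Union

from pydantic import Field, TypeAdapter, create_model

# Hypothetical feature kinds, used only to build a large synthetic union.
KINDS = ["os", "arch", "format", "string", "substring", "regex", "api",
         "mnemonic", "number", "offset", "characteristic", "export",
         "import", "section"]


def make_member(kind: str) -> type:
    # Each hypothetical member has a literal "type" tag and one "value" field.
    return create_model(
        f"{kind.title()}Feature",
        type=(Literal[kind], kind),
        value=(str, ...),
    )


MEMBERS = tuple(make_member(k) for k in KINDS)
AnyFeature = Union[MEMBERS]

# The same union, validated as a plain union and as a tagged union.
plain = TypeAdapter(List[AnyFeature])
tagged = TypeAdapter(List[Annotated[AnyFeature, Field(discriminator="type")]])


def make_document(n: int) -> str:
    # Cycle through the kinds so every union member appears in the payload.
    items = [{"type": KINDS[i % len(KINDS)], "value": str(i)} for i in range(n)]
    return json.dumps(items)


def benchmark(n: int = 10_000, number: int = 5) -> Dict[str, float]:
    """Return total seconds spent validating the document `number` times."""
    doc = make_document(n)
    return {
        "plain": timeit.timeit(lambda: plain.validate_json(doc), number=number),
        "tagged": timeit.timeit(lambda: tagged.validate_json(doc), number=number),
    }
```

Both adapters should produce identical objects for the same input, so comparing their outputs alongside the timings is a cheap check that switching to a tagged union doesn't change what gets loaded.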