use pydantic tagged unions to more quickly validate the result document #2307

Closed
williballenthin opened this issue Aug 20, 2024 · 1 comment · Fixed by #2439
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed performance Related to capa's performance

Comments

@williballenthin
Collaborator

In our result document JSON, we use unions in a few places, such as freeze.features.Feature:

Feature = Union[

image

I've learned that to validate incoming data against this union, pydantic will loop through each case and then use the best match. because there are lots of cases, pydantic does a lot of work.
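To make the cost visible, here is a minimal sketch of an untagged union. The three models are hypothetical, simplified stand-ins (not capa's real feature classes), and pydantic v2 is assumed. When nothing matches, the error report carries entries for every member, which suggests each one was attempted:

```python
from typing import Literal, Union

from pydantic import BaseModel, TypeAdapter, ValidationError


# Hypothetical, simplified stand-ins for the real feature models.
class OSFeature(BaseModel):
    type: Literal["os"] = "os"
    os: str


class ArchFeature(BaseModel):
    type: Literal["arch"] = "arch"
    arch: str


class BasicBlockFeature(BaseModel):
    type: Literal["basic block"] = "basic block"


# Plain (untagged) union: pydantic has no hint about which member applies.
Feature = Union[OSFeature, ArchFeature, BasicBlockFeature]
adapter = TypeAdapter(Feature)

try:
    adapter.validate_python({"type": "unknown"})
except ValidationError as e:
    # The first element of each error location names the union member
    # that produced the error.
    attempted = sorted({str(err["loc"][0]) for err in e.errors()})
    print(attempted)
```

With dozens of members, that per-member work is repeated for every feature in the document.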

we should be able to improve performance by using tagged unions. there's a bit of documentation here:
https://docs.pydantic.dev/latest/concepts/unions/#discriminated-unions

and a PyCon US '24 talk here: https://www.youtube.com/watch?v=Qvj5e9xtaSE

we should investigate using tagged unions and see if it improves performance of loading serialized result documents. granted, this isn't a common operation, but the data model would be more correct regardless.
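A rough way to check this before touching the real models could look like the sketch below. Everything here is an assumption for illustration: the 20 generated models, their `type` and `value` fields, and the sample input are made up, and the timings will vary by machine and pydantic version.

```python
import timeit
from typing import Annotated, Literal, Union

from pydantic import Field, TypeAdapter, create_model

# Hypothetical stand-ins: 20 models that differ only in their `type` tag.
models = [
    create_model(f"Feature{i}", type=(Literal[f"feature{i}"], ...), value=(int, ...))
    for i in range(20)
]

PlainUnion = Union[tuple(models)]
TaggedUnion = Annotated[PlainUnion, Field(discriminator="type")]

plain = TypeAdapter(PlainUnion)
tagged = TypeAdapter(TaggedUnion)

# The matching member is declared last in the union.
doc = {"type": "feature19", "value": 1}

# Both adapters should produce the same model instance.
assert plain.validate_python(doc) == tagged.validate_python(doc)

n = 10_000
t_plain = timeit.timeit(lambda: plain.validate_python(doc), number=n)
t_tagged = timeit.timeit(lambda: tagged.validate_python(doc), number=n)
print(f"plain union:  {t_plain:.3f}s for {n} validations")
print(f"tagged union: {t_tagged:.3f}s for {n} validations")
```

A more faithful measurement would load an actual serialized result document with both variants of the model.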

@williballenthin williballenthin added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed performance Related to capa's performance labels Aug 20, 2024
@williballenthin
Collaborator Author

I think this would do it:

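from typing import Annotated, Union

from pydantic import Field

# `type` is the field pydantic inspects to pick the matching union member
Feature = Annotated[Union[
    OSFeature,
    ArchFeature,
    ...
    BasicBlockFeature,
], Field(discriminator='type')]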
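For anyone picking this up, a self-contained sketch of the same idea follows. The models are hypothetical, simplified stand-ins rather than capa's real classes, and pydantic v2 is assumed. The point it illustrates is that each member needs a `Literal`-typed `type` field for the discriminator to work:

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


# Hypothetical, simplified stand-ins for the real feature models.
# Each member declares a Literal tag so pydantic can dispatch on it.
class OSFeature(BaseModel):
    type: Literal["os"] = "os"
    os: str


class ArchFeature(BaseModel):
    type: Literal["arch"] = "arch"
    arch: str


class BasicBlockFeature(BaseModel):
    type: Literal["basic block"] = "basic block"


Feature = Annotated[
    Union[OSFeature, ArchFeature, BasicBlockFeature],
    Field(discriminator="type"),
]

adapter = TypeAdapter(Feature)

# The tag selects the member directly instead of trying each one.
parsed = adapter.validate_python({"type": "arch", "arch": "amd64"})
print(type(parsed).__name__)

# An unknown tag is rejected up front.
try:
    adapter.validate_python({"type": "unknown"})
except ValidationError as e:
    print(e.errors()[0]["type"])
```

If the real models currently declare the tag as a plain `str`, they would need to be changed to `Literal[...]` first, since the discriminator requires literal values.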

harshit-wadhwani added a commit to harshit-wadhwani/capa that referenced this issue Oct 4, 2024
mr-tz pushed a commit to harshit-wadhwani/capa that referenced this issue Dec 3, 2024
@mr-tz mr-tz mentioned this issue Dec 5, 2024
3 tasks
@mr-tz mr-tz closed this as completed in 28c0234 Dec 5, 2024