Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make relationship parsing to be more efficient through precomputation #743

Merged
merged 4 commits into from
Aug 17, 2023

Conversation

lumjjb
Copy link
Contributor

@lumjjb lumjjb commented Aug 2, 2023

I ran into long times parsing big SPDX documents that use the hasFiles shortcut fields.

Handling of hasFiles field results in the list of all relationships being iterated through for each package. This change moves the creation of existing_relationships_without_comments to the top level in parse_all_relationships.

Ideally, it would be nice to use a set for more optimal look-up (but that requires Relationship to be hashable, and probably would like a bit more input on how'd you think that would be better implemented).

before:
python3 src/spdx_tools/spdx/clitools/pyspdxtools.py -i   617.18s user 0.53s system 99% cpu 10:17.96 total
after:
python3 spdx_tools/spdx/clitools/pyspdxtools.py -i  2> /dev/null  316.08s user 0.46s system 99% cpu 5:17.73 total

Signed-off-by: Brandon Lum <lumjjb@gmail.com>
maxhbr

This comment was marked as outdated.

@maxhbr
Copy link
Member

maxhbr commented Aug 11, 2023

some related tests are failing:

 =========================== short test summary info ============================
FAILED tests/spdx/parser/jsonlikedict/test_relationship_parser.py::test_parse_has_files_without_duplicating_relationships[has_files0-existing_relationships0-contains_relationships0] - AssertionError: assert 1 == 0
 +  where 1 = len([Relationship(spdx_element_id='SPDXRef-Package', relationship_type=<RelationshipType.CONTAINS: 6>, related_spdx_element_id='SPDXRef-File1', comment=None)])
 +  and   0 = len([])
============ 1 failed, 1008 passed, 3 skipped, 4 warnings in 41.66s ============

you should be able to run these tests locally with pytest.

maxhbr
maxhbr previously requested changes Aug 11, 2023
Copy link
Member

@maxhbr maxhbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are failing

Signed-off-by: Brandon Lum <lumjjb@gmail.com>
@maxhbr
Copy link
Member

maxhbr commented Aug 11, 2023

I triggered the CI

Signed-off-by: Brandon Lum <lumjjb@gmail.com>
@lumjjb
Copy link
Contributor Author

lumjjb commented Aug 11, 2023

Thanks! sorry for the multiple pushes, not super familiar with the tooling, just fixed the lint errors (i believe)

@maxhbr
Copy link
Member

maxhbr commented Aug 11, 2023

I am happy to re-trigger and to support you.

(Sadly, I am on my way to bed. Will trigger again tomorrow, if you made new pushes)

@maxhbr
Copy link
Member

maxhbr commented Aug 11, 2023

you can run the linting tests locally via:

$ isort src tests --check
$ black src tests --check
$ flake8 src tests

Signed-off-by: Brandon Lum <lumjjb@gmail.com>
@lumjjb
Copy link
Contributor Author

lumjjb commented Aug 11, 2023

I fixed some of the ones not caused by changes as drive-bys in this PR. Can't reproduce errors with the same tooling locally for some reason :|. So 🤞

@lumjjb
Copy link
Contributor Author

lumjjb commented Aug 14, 2023

looks like all green! PTAL Thanks!

Copy link
Collaborator

@armintaenzertng armintaenzertng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks :)

@armintaenzertng armintaenzertng dismissed maxhbr’s stale review August 17, 2023 14:24

tests have been fixed

@armintaenzertng
Copy link
Collaborator

I noticed that this already fixes #747

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants