Boost segmentation import performance #1261
Are you going to fix the unit-test errors? They do not seem to be directly related to this PR. Your code change looks good to me.
The unit tests are broken because of the check in datumaro/src/datumaro/plugins/data_formats/cityscapes.py (lines 276 to 283 at commit 76fc941).
From my investigation, this is not defined in the official Cityscapes documentation (https://www.cityscapes-dataset.com/dataset-overview/#class-definitions); it exists to support the CVAT Cityscapes format described in https://opencv.github.io/cvat/docs/manual/advanced/formats/format-cityscapes/.
How about marking this test as skip or xfail and investigating it further later?
I have fixed the Cityscapes unit tests. Please review again :)
Branch updated from 5cf5f74 to 3317081.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff            @@
##           develop    #1261   +/-   ##
========================================
  Coverage    80.60%   80.60%
========================================
  Files          270      270
  Lines        30347    30350    +3
  Branches      5904     5906    +2
========================================
+ Hits         24462    24465    +3
  Misses        4504     4504
  Partials      1381     1381

Flags with carried forward coverage won't be shown.
LGTM
Summary
When analyzing the import performance of the cityscapes and kaggle_image_mask formats, I found that the main bottleneck is np.unique, which is called to extract the unique class indices within each mask.
Analysis before PR:
Analysis after PR:

Instead of extracting the unique class indices within each mask, the importer now uses all class indices defined for the dataset.
As a result, import is 5 times faster.
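The change described above can be illustrated with a minimal sketch. It is not the PR's actual code: the function names `masks_from_unique` and `masks_from_known_classes` are hypothetical, and it assumes a 2D integer mask plus a list of class ids known for the whole dataset. The baseline calls np.unique on every mask, which by default sorts the flattened pixels; the alternative loops over the dataset's class ids and does one vectorized comparison per class, skipping classes that do not occur in the mask.

```python
import timeit

import numpy as np


def masks_from_unique(mask: np.ndarray) -> dict:
    # Hypothetical baseline: np.unique scans (and sorts) every pixel
    # to discover which class indices are present in this mask.
    return {int(c): mask == c for c in np.unique(mask)}


def masks_from_known_classes(mask: np.ndarray, class_ids: list) -> dict:
    # Hypothetical alternative: iterate over the class ids known for the
    # whole dataset and keep only those that actually occur in the mask.
    result = {}
    for c in class_ids:
        binary = mask == c
        if binary.any():
            result[c] = binary
    return result


if __name__ == "__main__":
    # Synthetic mask with 34 classes, roughly Cityscapes-sized (assumption).
    rng = np.random.default_rng(0)
    mask = rng.integers(0, 34, size=(1024, 2048), dtype=np.uint8)
    class_ids = list(range(34))

    a = masks_from_unique(mask)
    b = masks_from_known_classes(mask, class_ids)
    assert a.keys() == b.keys()
    assert all(np.array_equal(a[k], b[k]) for k in a)

    t_unique = timeit.timeit(lambda: masks_from_unique(mask), number=3)
    t_known = timeit.timeit(lambda: masks_from_known_classes(mask, class_ids), number=3)
    print(f"np.unique: {t_unique:.3f}s, known class ids: {t_known:.3f}s")
```

Both functions return the same binary masks. The second trades a per-mask sort for K comparisons (K = number of dataset classes), which tends to pay off when K is small relative to the pixel count. The timings printed by this sketch depend on the machine and are not the PR's reported measurement.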