Develop JsonSectionPageMapper in Rust API #1224
Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff:

| | develop | #1224 | +/- |
|---|---:|---:|---:|
| Coverage | 80.10% | 80.44% | +0.34% |
| Files | 269 | 269 | |
| Lines | 29915 | 29915 | |
| Branches | 5850 | 5850 | |
| Hits | 23962 | 24066 | +104 |
| Misses | 4617 | 4487 | -130 |
| Partials | 1336 | 1362 | +26 |

Flags with carried forward coverage won't be shown.
    contents = parse_json(f.read())
    if not {"categories", "items"} <= contents.keys():
    ):
    fpath = osp.join(context.root_path, annot_file)
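For context, the removed lines decode the entire annotation file only to inspect two top-level keys. A minimal pure-Python sketch of that pattern, assuming `parse_json` behaves like `json.loads` (the function name `has_datumaro_sections` is hypothetical): every item and annotation is materialized as Python objects before the key check runs, so the cost grows with the size of the file.

```python
import json


def has_datumaro_sections(path: str) -> bool:
    # Decodes the whole file (every item and annotation) into Python objects...
    with open(path, encoding="utf-8") as f:
        contents = json.loads(f.read())
    # ...only to check that both top-level sections are present.
    return isinstance(contents, dict) and {"categories", "items"} <= contents.keys()
```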
Personally, I'd like to access `context.root_path` directly here, but the documentation recommends avoiding it during detection:

`datumaro/src/datumaro/components/format_detection.py`, lines 178 to 179 in 4faaae5:

> Detectors should avoid using this property in favor of specific requirement methods.

Do you have any objection to using it during detection?
That guideline exists because the methods provided by `context` use `self._root_path` internally. For example:

`datumaro/src/datumaro/components/format_detection.py`, lines 354 to 357 in 4faaae5:

    try:
        if is_binary_file:
            with open(osp.join(self._root_path, path), "rb") as f:
                yield f

However, the `JsonSectionPageMapper` Rust API has no knowledge of `root_path`, so there is no choice but to pass it the full path, `osp.join(context.root_path, annot_file)`, to access the file.
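The constraint can be illustrated with a toy stand-in for the detection context. This is a sketch only: `ToyDetectionContext`, `open_relative`, `external_reader`, and `read_annotation` are hypothetical names, not Datumaro's API. Methods on the context resolve relative paths against the private root, while a function outside the context (standing in for the Rust mapper) only sees the string it receives, so the caller has to build the full path from `root_path`.

```python
import os.path as osp
from contextlib import contextmanager


class ToyDetectionContext:
    """Hypothetical stand-in for a format detection context."""

    def __init__(self, root_path: str):
        self._root_path = root_path

    @property
    def root_path(self) -> str:
        return self._root_path

    @contextmanager
    def open_relative(self, path: str):
        # Like the context's own helpers: the private root is joined internally.
        with open(osp.join(self._root_path, path), "rb") as f:
            yield f


def external_reader(full_path: str) -> bytes:
    # Stands in for an API outside the context (e.g. a Rust extension):
    # it can only open the exact path it is given.
    with open(full_path, "rb") as f:
        return f.read()


def read_annotation(context: ToyDetectionContext, annot_file: str) -> bytes:
    # The caller must use root_path to build a path the external API can open.
    fpath = osp.join(context.root_path, annot_file)
    return external_reader(fpath)
```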
Agreed. What about other formats implemented in pure Python? I just want to hear your opinion.
Frankly speaking, I don't want to avoid using `context.root_path`.
It looks great to me overall. I'm especially happy that it reduces the total time spent on detection. Thank you!
### Summary
- Ticket no. 127586
- Follows up on this feedback: #1224 (comment).

### How to test
```console
$ cd rust
$ cargo test
```

### Checklist
- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [ ] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License
- [x] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [x] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
…mant (#1229)

### Summary
- Ticket no. 127136

### How to test
Refer to #1224 for details on how we obtained the following results.

1. Performance
   - Before
     ```console
     Duration for detecting Datumaro data format: 25784.5ms, format=datumaro
     ```
   - After
     ```console
     Duration for detecting Datumaro data format: 5966.8ms, format=datumaro
     ```
2. Memory usage
   - Before: [screenshot]
   - After: [screenshot]

### Checklist
- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License
- [x] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [x] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
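Computed from the two #1229 durations above, the saving and speedup work out to:

$$
\Delta t = 25784.5\,\mathrm{ms} - 5966.8\,\mathrm{ms} = 19817.7\,\mathrm{ms} \approx 19.8\,\mathrm{s},
\qquad
\frac{\Delta t}{25784.5\,\mathrm{ms}} \approx 76.9\%,
\qquad
\frac{25784.5}{5966.8} \approx 4.32\times
$$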
- Ticket no. 127135 and 127136.
- Develop `JsonSectionPageMapper` to construct page maps for top-level sections in a given JSON file.
- Enhance `DatumaroImporter.detect()`'s performance by replacing JSON file parsing logic with the `JsonSectionPageMapper`.

Our existing test will validate its functionality. For the performance comparison, please see the following.

- Before
  ```python
  from datumaro.rust_api import JsonSectionPageMapper
  from time import time

  import datumaro as dm

  start = time()
  format = dm.Dataset.detect("ws_test/coco/datumaro")
  dt = 1000.0 * (time() - start)
  print(f"Duration for detecting Datumaro data format: {dt:.1f}ms, format={format}")
  ```
  ```console
  Duration for detecting Datumaro data format: 25784.5ms, format=datumaro
  ```
- After
  ```python
  from datumaro.rust_api import JsonSectionPageMapper
  from time import time

  import datumaro as dm

  start = time()
  format = dm.Dataset.detect("ws_test/coco/datumaro")
  dt = 1000.0 * (time() - start)
  print(f"Duration for detecting Datumaro data format: {dt:.1f}ms, format={format}")
  ```
  ```console
  Duration for detecting Datumaro data format: 17234.7ms, format=datumaro
  ```

It saves ~8.5 secs.

- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly
- [x] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [x] I have updated the license header for each file (see an example below).

```python
```

---------

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
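The timing script above measures a single run with `time.time()`. A sketch of a slightly more robust measurement uses `time.perf_counter()`, a monotonic high-resolution clock intended for durations, and reports the median over several runs. `median_duration_ms` is a hypothetical helper, not part of Datumaro. One caveat: the first run also warms the OS file cache, so later runs of a file-heavy call such as detection may be faster, and it can be worth reporting the first run separately.

```python
import statistics
import time
from typing import Callable


def median_duration_ms(fn: Callable[[], object], repeat: int = 5) -> float:
    """Run fn `repeat` times and return the median wall-clock duration in ms."""
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        durations.append(1000.0 * (time.perf_counter() - start))
    return statistics.median(durations)


# Usage with the detection call from the script above (requires datumaro):
# import datumaro as dm
# dt = median_duration_ms(lambda: dm.Dataset.detect("ws_test/coco/datumaro"))
# print(f"Median detection time: {dt:.1f}ms")
```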
### Summary
- Ticket no. 128951
- Apply #1224 and #1229 changes to the releases/1.5.0 branch

### How to test
Already tested in the previous PRs.

### Checklist
- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [ ] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License
- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Summary
- Develop `JsonSectionPageMapper` to construct page maps for top-level sections in a given JSON file.
- Enhance `DatumaroImporter.detect()`'s performance by replacing JSON file parsing logic with the `JsonSectionPageMapper`.
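To make "page maps for top-level sections" concrete, here is a pure-Python sketch of the idea, not the Rust implementation. It scans the raw bytes once, tracks string and nesting state, and records an `(offset, size)` byte span for each top-level key's value without decoding the values themselves. The name `build_section_page_map`, the `(offset, size)` layout, and working on an in-memory `bytes` buffer (the Rust API takes a file path) are assumptions for illustration; the input is assumed to be well-formed JSON whose top-level value is an object.

```python
import json
from typing import Dict, Optional, Tuple

WHITESPACE = b" \t\r\n"
QUOTE, BACKSLASH, COLON, COMMA = 0x22, 0x5C, 0x3A, 0x2C


def _trimmed_span(data: bytes, start: int, end: int) -> Tuple[int, int]:
    # Drop surrounding whitespace and return (offset, size) of the value.
    while start < end and data[start] in WHITESPACE:
        start += 1
    while end > start and data[end - 1] in WHITESPACE:
        end -= 1
    return start, end - start


def build_section_page_map(data: bytes) -> Dict[str, Tuple[int, int]]:
    """Hypothetical sketch: map each top-level key of a JSON object to the
    (byte offset, byte size) of its value. Only keys are decoded."""
    sections: Dict[str, Tuple[int, int]] = {}
    depth = 0
    key: Optional[str] = None
    value_start: Optional[int] = None  # set after ':' at depth 1
    i = 0
    while i < len(data):
        c = data[i]
        if c == QUOTE:
            # Skip the whole string so braces, commas, and colons inside it are ignored.
            j = i + 1
            while data[j] != QUOTE:
                j += 2 if data[j] == BACKSLASH else 1
            if depth == 1 and value_start is None:
                key = json.loads(data[i : j + 1])  # a top-level key
            i = j + 1
            continue
        if c in b"{[":
            depth += 1
        elif c in b"}]":
            depth -= 1
            if depth == 0 and value_start is not None:
                # The closing brace of the top-level object ends the last value.
                sections[key] = _trimmed_span(data, value_start, i)
                value_start = None
        elif depth == 1 and c == COLON:
            value_start = i + 1
        elif depth == 1 and c == COMMA:
            sections[key] = _trimmed_span(data, value_start, i)
            value_start = None
        i += 1
    return sections


if __name__ == "__main__":
    doc = b'{"categories": {"label": []}, "items": [1, 2, 3]}'
    page_map = build_section_page_map(doc)
    print({"categories", "items"} <= page_map.keys())  # True
    offset, size = page_map["items"]
    print(json.loads(doc[offset : offset + size]))  # [1, 2, 3]
```

With such a map, a detector can answer the `{"categories", "items"}` question from the keys alone and only read a section's byte span when it actually needs that section.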
How to test
Our existing test will validate its functionality. For the performance comparison, please see the following.
Before:
Duration for detecting Datumaro data format: 25784.5ms, format=datumaro
After:
Duration for detecting Datumaro data format: 17234.7ms, format=datumaro
It saves ~8.5 secs.
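Computed from the two durations above, the saving and speedup work out to:

$$
\Delta t = 25784.5\,\mathrm{ms} - 17234.7\,\mathrm{ms} = 8549.8\,\mathrm{ms} \approx 8.5\,\mathrm{s},
\qquad
\frac{\Delta t}{25784.5\,\mathrm{ms}} \approx 33.2\%,
\qquad
\frac{25784.5}{17234.7} \approx 1.50\times
$$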