Implement dm.Dataset index access #1247
Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>
Codecov Report
Additional details and impacted files

| Coverage Diff | develop | #1247 | +/- |
|---|---|---|---|
| Coverage | 80.53% | 80.54% | +0.01% |
| Files | 270 | 270 | |
| Lines | 30232 | 30260 | +28 |
| Branches | 5898 | 5906 | +8 |
| Hits | 24348 | 24374 | +26 |
| Misses | 4506 | 4507 | +1 |
| Partials | 1378 | 1379 | +1 |
Hi @itrushkin, could we add a test for this feature that covers scenarios where the size of the dataset increases or decreases? For example:
a) Scenario with
b) Scenario with transforms
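The kind of check requested here could be sketched roughly as follows. This is only an illustration: `check_index_access` is a hypothetical helper, and plain Python lists stand in for the dataset, so none of this is Datumaro's actual API or the PR's test code. The idea is that after every size change, index access should still agree with iteration order and reject out-of-range indices.

```python
def check_index_access(dataset):
    """Return True if indexing agrees with iteration order and bounds are enforced.

    Works on any object that provides __len__, __iter__ and __getitem__.
    """
    expected = list(dataset)
    if len(dataset) != len(expected):
        return False
    if [dataset[i] for i in range(len(dataset))] != expected:
        return False
    try:
        dataset[len(dataset)]  # one past the end must fail
    except IndexError:
        return True
    return False


# Assumption: a plain list stands in for an index-accessible dataset.
items = ["a", "b", "c"]
assert check_index_access(items)

items.append("d")  # size increases
assert check_index_access(items) and items[3] == "d"

items.remove("b")  # size decreases
assert check_index_access(items) and items[1] == "c"

# Stand-in for a transform that shrinks the dataset.
filtered = [x for x in items if x != "a"]
assert check_index_access(filtered) and len(filtered) == 2
```

In a real test the list would be replaced by the dataset under test, with the same helper called after each addition, removal, and transform.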
LGTM.
Issue No: 126681
Summary
This PR introduces random index access for datasets to allow easy conversion from `dm.Dataset` to `torch.Dataset`. The implementation requires an additional list property on the `DatasetItemStorage` class, which costs an additional O(n) memory. The list does not store the items themselves, only references to them (subset and ID).

Concern:
I wonder whether it is possible to store and manage this list for indexing, removal, addition, and iteration in a way that reduces memory consumption. I tried to combine the following changes:
- The `_traversal_order` dictionary seems unnecessary, as all data is already stored and managed in the `data` property via the `.put()` and `.remove()` methods.
- `__len__`, `__iter__`, and other methods can rely solely on `data`. Optionally, an `_order` property could enable an O(1) `__len__` implementation.
- Rather than replacing `data` values with `None` to mark removal, deleting the entire dictionary entry accurately reflects the dataset state and eliminates ambiguity.

As a result, the `test_can_create_patch*` tests started to fail. It appears that items were not actually removed from the dataset, but merely marked as such. This suggests that either the cache initialization or the element removal operation needs adjusting for this scenario. Resolving this with assistance could significantly improve `dm.Dataset` interaction.

How to test
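A minimal sketch of the storage layout discussed in the Summary and Concern, with a few checks that exercise index access. The class name `ItemStorageSketch` and its method signatures are assumptions made for illustration; this is not Datumaro's `DatasetItemStorage`. It keeps items in a `data` dictionary keyed by (ID, subset), keeps only those keys in an `_order` list, and deletes entries on removal instead of marking them with `None`.

```python
from typing import Dict, Iterator, List, Tuple

Key = Tuple[str, str]  # (item ID, subset)


class ItemStorageSketch:
    """Illustrative only; not Datumaro's DatasetItemStorage."""

    def __init__(self) -> None:
        self.data: Dict[Key, object] = {}
        # Holds references (ID, subset) only, so the extra memory is O(n) keys.
        self._order: List[Key] = []

    def put(self, item_id: str, subset: str, item: object) -> bool:
        """Add or overwrite an item; return True if the key is new."""
        key = (item_id, subset)
        is_new = key not in self.data
        if is_new:
            self._order.append(key)
        self.data[key] = item
        return is_new

    def remove(self, item_id: str, subset: str) -> bool:
        """Delete the entry outright (no None marker); return True if it existed."""
        key = (item_id, subset)
        if key not in self.data:
            return False
        del self.data[key]
        self._order.remove(key)  # linear scan of the order list
        return True

    def __len__(self) -> int:
        return len(self._order)

    def __iter__(self) -> Iterator[object]:
        return (self.data[key] for key in self._order)

    def __getitem__(self, index: int) -> object:
        return self.data[self._order[index]]


storage = ItemStorageSketch()
storage.put("img_0", "train", "item 0")
storage.put("img_1", "val", "item 1")
assert storage[1] == "item 1"

storage.remove("img_0", "train")  # size decreases, indices shift
assert len(storage) == 1 and storage[0] == "item 1"
```

In this sketch index lookup is O(1), while removal pays an O(n) scan of `_order`. Whether that trade-off is acceptable for the real storage class is a separate question that the sketch does not settle.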
Checklist
License
Feel free to contact the maintainers if that's a concern.