Implement lazy loading of raw data for local files #48
This looks really good! Not entirely related, but I was wondering if there is some public attribute that reflects the status of the data (i.e. loaded vs. not loaded). Sometimes, it would also be useful to know the total size of the Also, I assume loaded vs. not loaded applies to all signals that are part of an Finally, and this is now least related to this PR (so let me know if you would like me to open a new issue), what is the public way to access the actual data? There's no such thing as
Yes, this could be useful, I'm not sure how to best show it though, since (with this PR) lazy loading is done on a per-signal basis 🤔
Currently not, no... maybe this could be combined with your first suggestion, i.e. display the size of the loaded signals and the total size on disk? E.g.
With the implementation suggested in this PR, it could be anything in between as well.
Right, currently there's nothing like that. While this could be nice for recordings with uniform sampling frequencies (just return a 2D array), I'm not sure how best to treat those with differing ones (a list of arrays? fail? ...?). What use case are you thinking about for this? (-> a new issue would be nice here, yes)
I've created a new issue to discuss accessing the data array in #49. Regarding the other points, I think it would be nice not to overcomplicate the API. So you are saying it is currently possible to have some signals loaded in memory and some not, because each signal is treated as a separate On the other hand, when loading an EDF file, either no signals or all signals are loaded in memory, so there is no way to influence individual signals being loaded with
Exactly, with this PR the array stays memory-mapped until the data is accessed for the first time.
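To make the per-signal behaviour concrete, here is a minimal standard-library sketch of the idea, not the code from this PR. `LazySignal`, `is_loaded`, and the toy file layout are hypothetical names and assumptions chosen for illustration; the samples are written and read as 16-bit integers in native byte order, and the file is not a real EDF file.

```python
import mmap
import os
import tempfile
from array import array


class LazySignal:
    """Hypothetical wrapper: samples stay memory-mapped until first access."""

    def __init__(self, mm, offset, n_samples):
        self._mm = mm                # shared read-only memory map of the file
        self._offset = offset        # byte offset of this signal's first sample
        self._n_samples = n_samples
        self._data = None            # in-memory copy, created lazily

    @property
    def is_loaded(self):
        return self._data is not None

    @property
    def data(self):
        if self._data is None:
            # First access: copy only this signal's bytes out of the map.
            end = self._offset + 2 * self._n_samples  # 2 bytes per sample
            loaded = array("h")
            loaded.frombytes(self._mm[self._offset:end])
            self._data = loaded
        return self._data


# Toy file: two signals of three 16-bit samples each, stored back to back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(array("h", [1, 2, 3, 10, 20, 30]).tobytes())
    path = tmp.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

first = LazySignal(mm, offset=0, n_samples=3)
second = LazySignal(mm, offset=6, n_samples=3)

before_access = (first.is_loaded, second.is_loaded)  # nothing copied yet
second_values = list(second.data)                    # loads only the second signal
after_access = (first.is_loaded, second.is_loaded)

mm.close()
os.remove(path)
```

In this sketch, only the signal whose `data` is touched gets copied into memory, which is why a file can end up with some signals loaded and others not.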
Definitely! Is the suggestion for the extended repr in my above comment more or less what you're thinking about here?
With the currently suggested implementation, this would be
EDIT: slicing operations would also require loading the data (thanks @cbrnr!):
Yes, this would be useful! I'd also expose this in an attribute for convenience. Plus, each underlying
What about slicing between seconds or annotations? In any case, if the current memory consumption is available, users will always be able to find out which operation loads the data!
👍 Feel free to open a PR for that, ideally once this one is merged!
Right, thanks for pointing this out! I'll edit the above list. For non-annotation signals, this could even be done without loading the data (as long as the desired slice only contains complete data records), though that would complicate the implementation a bit.
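As a rough illustration of the "complete data records" condition, the check boils down to simple arithmetic on sample indices. The helper below is hypothetical (it is not part of this PR) and assumes a signal with a fixed number of samples per data record:

```python
def aligned_record_range(start, stop, samples_per_record):
    """Map the sample slice [start, stop) to data records, if it is aligned.

    Returns (first_record, stop_record) when the slice covers only complete
    data records, otherwise None (the slice would cut a record apart).
    """
    if samples_per_record <= 0:
        raise ValueError("samples_per_record must be positive")
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")
    if start % samples_per_record or stop % samples_per_record:
        return None
    return start // samples_per_record, stop // samples_per_record


# With 256 samples per record, samples 256..767 are exactly records 1 and 2.
aligned = aligned_record_range(256, 768, 256)
# A slice starting mid-record cannot be served from whole records alone.
unaligned = aligned_record_range(100, 768, 256)
```

Only when the helper returns a record range could a slice be expressed purely in terms of record offsets; any other slice would fall back to loading the data.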
Another question that I had was that if What I'm trying to say is that this is becoming a bit complicated already, given that the original intention of lazy loading was that people sometimes work with EDF headers only, so they don't need the data. Maybe a simpler option would be to add a separate function
@cbrnr, let's move the discussion about memory consumption information into a new issue!
Sure! I just thought that maybe the current implementation is not even necessary because it adds too much complexity...
For the original use case described in #47 you're definitely right :D However, it's a nice way to also speed up working with only a (small) subset of signals.