Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EDFIO: Alleviate EDF single handle problem #1584

Merged
merged 4 commits into from
Oct 21, 2024

Conversation

h-mayorquin
Copy link
Contributor

This should close #1557

This PR moves the logic of opening the edf reader to the functions that extract the data (_get_analogsignal_chunk).

This avoids the problem of someone initializing the io (in a notebook for example) and then having a failure to initialize it again because the handle was left dangling. Moreover, it would allow parallel processing to work better (but not perfect as it might be concurrent access crashes).

The performance costs are minimal in this case because the io is really slow (I think this can be fixed but that's a matter for another PR). For example, I have this mid-sized file 7000 MiB, 276 channels, and around 20 minutes. It takes 10.8 seconds to make a reading of the whole data in master. With the new changes it takes ... 11.0 seconds:

image

The moral is that when the data extraction itself takes most of the time, the io.open instruction cost does not count.

As mentioned in #1557 the best way would be to handle this with mne library but this should be an improvement over the current state of affair where this io does not play nice with a lot of downstream usage patterns (spike interface re-opens the io to get streams for example which precludes get the streams and then opening the reader again for the extractor).

@h-mayorquin h-mayorquin marked this pull request as ready for review October 16, 2024 16:52
Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of annoying, but I think I would prefer symmetry. Either delete the close and only have _close_reader or make an open function that calls the _open_reader internally. Does that make sense?

@h-mayorquin
Copy link
Contributor Author

I would rather erase close but that is a public method that I did not add. Would break backwards compatibility:

def test_close(self):
filename = self.get_local_path("edf/edf+C.edf")
# Open file and close it again
io1 = EDFRawIO(filename)
io1.parse_header()
io1.close()
# Check that file was closed properly and can be opened again
io2 = EDFRawIO(filename)
io2.parse_header()
io2.close()

If you are OK with that I can do it. I think that is a good thing, when we fix this the close method will stop making sense. For that reason, I also think that adding an open function would be a bad idea.

That said, I changed the private method to be _open_reader instead of open.

@h-mayorquin
Copy link
Contributor Author

But I would like to deprecate the close method in another PR. I don't want this to derail the conversation when Sam reviews it.

neo/rawio/edfrawio.py Outdated Show resolved Hide resolved
@zm711
Copy link
Contributor

zm711 commented Oct 17, 2024

That all works for me we can think about that later.

Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>
@zm711 zm711 added this to the 0.13.4 milestone Oct 18, 2024
@zm711 zm711 changed the title Alleviate EDF single handle problem EDFIO: Alleviate EDF single handle problem Oct 18, 2024
@zm711
Copy link
Contributor

zm711 commented Oct 18, 2024

I just changed the title since I'm compiling release notes it just helps me to quickly see which IO it is. This isn't an official repo rule, just a help me :)


self._t_stop = self.edf_reader.datarecord_duration * self.edf_reader.datarecords_in_file
# use sample count of first signal in stream
self._stream_index_samples = {stream_index : self.edf_reader.getNSamples()[chidx][0] for stream_index, chidx in self.stream_idx_to_chidx.items()}
Copy link
Contributor

@zm711 zm711 Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is [0] coming from?

this whole block seems like a great idea, but I don't see the extra [0] in previous _get_signal_size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


def _get_signal_size(self, block_index, seg_index, stream_index):
chidx = self.stream_idx_to_chidx[stream_index][0]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zm711 here is the zero. The idea is to use the first channel of the stream to get the samples.

@samuelgarcia
Copy link
Contributor

OK for me.

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool me too.

@zm711 zm711 merged commit e1a8e16 into NeuralEnsemble:master Oct 21, 2024
3 checks passed
@h-mayorquin h-mayorquin deleted the fix_edf_handle branch October 21, 2024 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EDF blocks access when io is opened.
3 participants