Hello,
XDF does a great job of storing all the data created during our experiments. Sadly, the pyxdf script isn't as fast at loading those recordings as we would wish.
In the Matlab implementation you addressed this issue by implementing the crucial part in C. I admit that this would also be the best solution for the Python implementation (especially since a C++ implementation already exists), but I decided to give numpy a try.
To elaborate a bit:
The XDF file format stores, for every chunk, both its size (in bytes) and the number of samples it contains. Because the length of a sample and the length of the chunk's header are known, the number of timestamps in that chunk can be calculated. This number matters because timestamps are optional, and a missing timestamp shifts the position (and thus the meaning) of all following bytes. Therefore I can't make the struct module parse the whole chunk with a single pattern (the pattern is broken by missing timestamps).
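To illustrate the arithmetic, here is a minimal sketch of how the timestamp count falls out of the chunk size. The exact byte layout (a 1-byte "timestamp present" flag per sample, 8-byte double timestamps) is my reading of the format; the function name and signature are hypothetical, not the actual pyxdf code:

```python
def count_timestamps(payload_len, n_samples, sample_bytes,
                     flag_bytes=1, stamp_bytes=8):
    """Infer how many samples in a chunk carry a timestamp.

    Assumed per-sample layout: a 1-byte flag indicating whether a
    timestamp follows, optionally an 8-byte double timestamp, then
    the sample values themselves. Whatever bytes are left over after
    accounting for the fixed parts must be timestamps.
    """
    fixed = n_samples * (flag_bytes + sample_bytes)
    extra = payload_len - fixed
    if extra % stamp_bytes != 0:
        raise ValueError("inconsistent chunk length")
    return extra // stamp_bytes


# Example: 10 samples of 4 bytes each, all stamped:
# payload = 10 * (1 + 4) + 10 * 8 = 130 bytes
print(count_timestamps(130, 10, 4))   # -> 10
# Same samples, none stamped: payload = 50 bytes
print(count_timestamps(50, 10, 4))    # -> 0
```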
But if both the number of timestamps and the number of samples are known, I can identify two special cases: 1) every sample has a timestamp; 2) no sample has a timestamp. In both cases there are no nasty optional timestamps, which lets me come up with a pattern that the struct module can process. Because this implementation lets numpy do the heavy lifting, it is drastically faster than the old for-loop implementation.
I measured the effect of this improvement and found that some of my files load up to five times faster. Of course this depends heavily on the recording itself, or more precisely on the chunk size and the number of stamped samples. But even with my worst "realistic" recordings I still measured a performance gain of 10 to 20%.
Let me know if you are dissatisfied with my implementation or have any suggestions for improvement.
(Note: this pull request is based on another pull request of mine, so you should probably merge (or reject) that one first.)