Hello,
XDF does a great job of storing all the data created during our experiments. Sadly, the pyxdf script isn't as fast at loading those recordings as we would wish.
In the Matlab implementation you addressed this issue by implementing the crucial part in C. I admit that this would also be the best solution for the Python implementation (especially since a C++ implementation already exists), but I decided to give numpy a try.
To elaborate a bit:
The XDF file format stores, for every chunk, both its size (in bytes) and the number of samples it contains. Because the length of a sample and the length of the chunk's header are known, the number of timestamps in that chunk can be calculated. This number matters because timestamps are optional, and a missing timestamp shifts the position (and thus the meaning) of all following bytes. Therefore I can't make the struct module parse the whole chunk with a single pattern (the pattern is broken by missing timestamps).
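To illustrate the arithmetic, here is a minimal sketch of how the timestamp count falls out of the chunk size. The exact byte layout (a 1-byte "timestamp present" flag per sample, 8-byte double timestamps) is my reading of the format; the function name and signature are hypothetical, not the actual pyxdf code:

```python
def count_timestamps(payload_len, n_samples, sample_bytes,
                     flag_bytes=1, stamp_bytes=8):
    """Infer how many samples in a chunk carry a timestamp.

    Assumed per-sample layout: a 1-byte flag indicating whether a
    timestamp follows, optionally an 8-byte double timestamp, then
    the sample values themselves. Whatever bytes are left over after
    accounting for the fixed parts must be timestamps.
    """
    fixed = n_samples * (flag_bytes + sample_bytes)
    extra = payload_len - fixed
    if extra % stamp_bytes != 0:
        raise ValueError("inconsistent chunk length")
    return extra // stamp_bytes


# Example: 10 samples of 4 bytes each, all stamped:
# payload = 10 * (1 + 4) + 10 * 8 = 130 bytes
print(count_timestamps(130, 10, 4))   # -> 10
# Same samples, none stamped: payload = 50 bytes
print(count_timestamps(50, 10, 4))    # -> 0
```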
But if both the number of timestamps and the number of samples are known, I can identify two special cases: 1) every sample has a timestamp; 2) no sample has a timestamp. In both cases there are no nasty optional timestamps, which lets me come up with a pattern that the struct module can process. Because this implementation lets numpy do the heavy lifting, it is drastically faster than the old for-loop implementation.
I measured the effect of this improvement and found that some of my files load up to five times faster. Of course this depends heavily on the recording itself, or more precisely on the chunk size and the number of stamped samples. But even with my worst "realistic" recordings I still measured a performance gain of 10 to 20%.
Let me know if you are dissatisfied with my implementation or have any suggestions for improvement.
(Note: this pull request is based on another pull request of mine, so you should probably merge (or reject) that one first.)