@tillahoffmann
Contributor

@tillahoffmann tillahoffmann commented Oct 3, 2017

This PR removes the necessity to seek in the input file when generating blocks because seeking is relatively expensive. Here is a benchmark on an ACDC snippet (25 seconds) and a metal podcast (32 mins, 15 seconds).

# profile.py
import soundfile
import tqdm
import sys

with soundfile.SoundFile(sys.argv[1]) as sf:
    for _ in tqdm.tqdm(sf.blocks(6 * sf.samplerate, sf.samplerate)):
        pass
$ python profile.py ACDC_-_Back_In_Black-sample.ogg
5it [00:00, 20.51it/s]  # on master
$ python profile.py ACDC_-_Back_In_Black-sample.ogg
5it [00:00, 30.96it/s]  # with this patch
$ python profile.py open_metalcast_162.ogg
214it [05:03,  2.69s/it]  # on master (cancelled after five minutes)
$ python profile.py open_metalcast_162.ogg
5.379634857177734  # with this patch

One interesting observation: The code on master starts off a bit slower than the patched version on long files but becomes much slower as it processes more of the file. Maybe because the implementation seeks from the beginning of the file rather than doing a relative seek from the current position?
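For readers following along, the idea of avoiding the seek can be sketched as follows: instead of rewinding by `overlap` frames after each block, the generator keeps the last `overlap` frames in memory and only reads the frames that are still missing. This is a simplified illustration and not the code from this PR; `overlapping_blocks`, `make_reader` and the `read(n)` callable are hypothetical stand-ins for a forward-only file.

```python
import numpy as np


def overlapping_blocks(read, blocksize, overlap):
    """Yield overlapping blocks using only forward reads (no seeking).

    `read(n)` is a hypothetical callable returning up to `n` new frames
    as a 1-D NumPy array (fewer, possibly zero, at the end of the input).
    """
    if not 0 <= overlap < blocksize:
        raise ValueError("overlap must satisfy 0 <= overlap < blocksize")
    tail = np.empty(0)  # frames carried over from the previous block
    while True:
        fresh = read(blocksize - len(tail))
        if len(fresh) == 0:
            return  # nothing new to read: do not re-yield the tail alone
        block = np.concatenate([tail, fresh])
        yield block
        if len(block) < blocksize:
            return  # a short block means the input is exhausted
        tail = block[blocksize - overlap:]  # keep the last `overlap` frames


def make_reader(data):
    """Forward-only reader over an in-memory array (stand-in for a file)."""
    position = 0

    def read(n):
        nonlocal position
        chunk = data[position:position + n]
        position += len(chunk)
        return chunk

    return read


blocks = list(overlapping_blocks(make_reader(np.arange(10.0)), 4, 2))
```

With ten frames, a block size of 4 and an overlap of 2, each block after the first starts with the last two frames of its predecessor, and the reader is never asked to move backwards.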

Owner

@bastibe bastibe left a comment

Thank you very much for this pull request. This is a very clever improvement to blocks, which not only improves performance, but more importantly, allows blocks to work even if the file is not seekable!

I have added a few comments that I would like to be addressed before merging. They are only small details, though. Overall, I like this pull request very much!

out = np.empty((3, 2))
blocks = list(sf.blocks(file_stereo_r, out=out))
assert blocks[0] is out
assert blocks[0].base is out
Owner

Why does this need to be changed?

Contributor Author

I used slightly inefficient slicing. This should be back to the original assertion now.

soundfile.py Outdated
self.read(n, dtype, always_2d, fill_value, out[offset:])
block = out[:min(blocksize, frames + overlap)] if fill_value is None else out
if copy_out:
import numpy as np
Owner

Why do we need this import here?

Contributor Author

Good question. Moved to the top of the function.

soundfile.py Outdated
self.seek(-overlap, SEEK_CUR)
frames += overlap
yield block
n = min(blocksize - offset, frames)
Owner

I think n could be replaced by a more telling name. Maybe toread or something similar.

Contributor Author

Done.

@mgeier
Contributor

mgeier commented Oct 4, 2017 via email

@tillahoffmann
Contributor Author

tillahoffmann commented Oct 4, 2017

@mgeier, I don't see a performance degradation for zero overlap on my end (with an updated profiling script):

# pysound_profile.py
import argparse
import hashlib
import platform
import soundfile
import tqdm

ap = argparse.ArgumentParser("pysound_profile")
ap.add_argument('filenames', nargs='+')
ap.add_argument('--overlap', '-o', type=float, default=1)
ap.add_argument('--block-size', '-b', type=float, default=6)
ap.add_argument('--hash', '-d', default='sha256')
args = ap.parse_args()

print("pysound_profile")
print(platform.uname())
print()

for filename in args.filenames:
    print("Processing '%s' with block size %fs and overlap %fs..." %
          (filename, args.block_size, args.overlap))
    with soundfile.SoundFile(filename) as sf:
        hasher = hashlib.new(args.hash)
        blocks = sf.blocks(
            round(args.block_size * sf.samplerate),
            round(args.overlap * sf.samplerate)
        )
        for block in tqdm.tqdm(blocks):
            hasher.update(block)

    print("%s = %s" % (args.hash, hasher.digest().hex()))
(master) $ python pysound_profile.py -o 0 ACDC_-_Back_In_Black-sample.ogg open_metalcast_162.ogg
uname_result(system='Darwin', node='Tills-MacBook-Pro-2.local', release='17.0.0', version='Darwin Kernel Version 17.0.0: Thu Aug 24 21:48:19 PDT 2017; root:xnu-4570.1.46~2/RELEASE_X86_64', machine='x86_64', processor='i386')

Processing 'ACDC_-_Back_In_Black-sample.ogg' with block size 6.000000s and overlap 0.000000s...
5it [00:00, 27.42it/s]
sha256 = 4737d4ebc248822430df008d99cc1163f8800b41c237101163aebaa0fb370d15
Processing './open_metalcast_162.ogg' with block size 6.000000s and overlap 0.000000s...
323it [00:08, 38.26it/s]
sha256 = 7e8d690a6b580379c083b3a4817db74c348c84ee42b093ffe572cf72d4ac39f0

(blocks) $ python pysound_profile.py -o 0 ACDC_-_Back_In_Black-sample.ogg open_metalcast_162.ogg
uname_result(system='Darwin', node='Tills-MacBook-Pro-2.local', release='17.0.0', version='Darwin Kernel Version 17.0.0: Thu Aug 24 21:48:19 PDT 2017; root:xnu-4570.1.46~2/RELEASE_X86_64', machine='x86_64', processor='i386')

Processing 'ACDC_-_Back_In_Black-sample.ogg' with block size 6.000000s and overlap 0.000000s...
5it [00:00, 24.51it/s]
sha256 = 4737d4ebc248822430df008d99cc1163f8800b41c237101163aebaa0fb370d15
Processing './open_metalcast_162.ogg' with block size 6.000000s and overlap 0.000000s...
323it [00:08, 39.24it/s]
sha256 = 7e8d690a6b580379c083b3a4817db74c348c84ee42b093ffe572cf72d4ac39f0

I don't see a memory leak on my end rerunning the blocks on master with overlap.

@bastibe
Owner

bastibe commented Oct 5, 2017

@mgeier A priori, I would imagine that seeking is approximately free for uncompressed files, but expensive for compressed files. Regardless, I would think that shuffling some numpy memory should always be a bit less expensive than calling out to a C library and updating file pointers.

Let's see how my intuition compares to reality ;-)

Current Master:

>>> %timeit for block in soundfile.blocks('still_alive.wav', 512, 0): pass
773 ms ± 3.53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit for block in soundfile.blocks('still_alive.wav', 512, 256): pass
1.58 s ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit for block in soundfile.blocks('still_alive.flac', 512, 0): pass
9.51 s ± 83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit for block in soundfile.blocks('still_alive.flac', 512, 256): pass
36.1 s ± 422 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Your branch:

>>> %timeit for block in soundfile.blocks('still_alive.wav', 512, 0): pass
622 ms ± 22.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit for block in soundfile.blocks('still_alive.wav', 512, 256): pass
1.21 s ± 4.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit for block in soundfile.blocks('still_alive.flac', 512, 0): pass
9.13 s ± 128 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit for block in soundfile.blocks('still_alive.flac', 512, 256): pass
18.3 s ± 47.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As you can see, there is a modest performance benefit for non-compressed files or zero overlap (5%-30%), but a substantial (2x) benefit for compressed files with nonzero overlap!

This performance benefit, and getting rid of the seek for nonseekable files, makes this very worthwhile in my eyes. Do you have any objections to merging this, @mgeier?


As a side note, I would not have thought that block processing takes so much time! Simply reading the whole file takes

>>> %timeit soundfile.read('still_alive.wav');
70.2 ms ± 258 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit soundfile.read('still_alive.flac');
235 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

while reading a single block takes

>>> %timeit soundfile.read('still_alive.wav', start=44100, frames=512);
160 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit soundfile.read('still_alive.flac', start=44100, frames=512);
811 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

which adds up to the 18-36 s of processing time if you multiply it by 30k blocks.

It might be worthwhile for blocks to optionally read larger chunks and split them into blocks internally. Is anyone interested in trying this?
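A rough sketch of that idea, assuming each chunk length is a multiple of the block size (except possibly the last chunk). `split_chunks` is a hypothetical helper, not part of soundfile; the chunks could, for instance, come from `soundfile.blocks()` called with a large `blocksize`.

```python
import numpy as np


def split_chunks(chunks, blocksize):
    """Split large chunks into `blocksize`-frame blocks (views, no copies).

    `chunks` is any iterable of NumPy arrays whose first axis is time.
    Each chunk length is assumed to be a multiple of `blocksize`,
    except possibly the last one.
    """
    for chunk in chunks:
        for start in range(0, len(chunk), blocksize):
            yield chunk[start:start + blocksize]


# Stand-in for two large reads from a file: 8 frames, then 5 frames.
chunks = [np.arange(8), np.arange(8, 13)]
blocks = list(split_chunks(chunks, 4))
```

Because the blocks are slices of the chunk, the per-block cost is a NumPy view rather than a call into the C library.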

@mgeier
Contributor

mgeier commented Oct 6, 2017 via email

@bastibe
Owner

bastibe commented Oct 7, 2017

@mgeier thank you for your thorough review. I pretty much agree on all counts.

IOW, you must not use the user-specified "out" to store the overlap. I guess you need to allocate separate memory in this case to store the overlap between iterations.

Good find! In my opinion, noting this in the documentation would be enough, as out is mostly a performance hack anyway. (Is this acceptable, @mgeier?)

@tillahoffmann If you add a note to this effect to the documentation, I'll gladly merge it (pending @mgeier's approval). If you want to address @mgeier's stylistic concerns, or implement his proposed fix instead of a note in the documentation, I'd be doubly grateful!

@tillahoffmann
Contributor Author

tillahoffmann commented Oct 8, 2017

Thanks for the feedback.

There are too many comments for my taste. If the names of the used internal
functions are not clear enough, they should probably be renamed instead of
adding a comment before each use?
If it's not clear that "offset" affects the output, it should probably be
renamed to "output_offset" instead of adding a comment?

Renamed.

Probably "frames" should also be renamed to something less ambiguous?

Left as-is for consistency with the rest of the code base.

I don't like the use of the if/else expressions in this case.
Especially breaking a line for that is a big no-no.
Doesn't this look nicer with "normal" if/else statements?

Changed.

Also, (and more importantly than my above nitpicks,) there is still
something fishy going on in line 1213.
If "out" is given, indexing into it seems wrong (but that's just a gut
feeling).

Happy to change this behaviour. I kept the slicing to be consistent with the existing implementation (cf https://github.com/tillahoffmann/PySoundFile/blob/1d8f3812e218dedb5fc07d3a41bced5f333271ce/soundfile.py#L989).

You are assuming that the user is not writing to "out" (if provided by the
user, i.e. without copying) between iterations, but why shouldn't she?
IOW, you must not use the user-specified "out" to store the overlap.

Yes, that's true. I've added a comment as suggested by @bastibe. We could also return an immutable view of out:

if blocksize > frames + overlap and fill_value is None:
    block = out[:frames + overlap]
else:
    block = out[:]
block.flags.writeable = False
yield block
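
A small standalone check of what a read-only view does and does not protect against (NumPy spells the flag `writeable`). The array names here are illustrative only:

```python
import numpy as np

out = np.zeros(4)
block = out[:]                 # view sharing memory with `out`
block.flags.writeable = False  # note the NumPy spelling: "writeable"

try:
    block[0] = 1.0             # writing through the read-only view fails
    write_blocked = False
except ValueError:
    write_blocked = True

out[0] = 2.0                   # writing through the original still works
```

So a read-only view guards against accidental writes through the yielded block, but a caller who still holds `out` can modify the shared memory directly.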

@mgeier
Contributor

mgeier commented Oct 9, 2017

@tillahoffmann Thanks for the changes! I'd like to come back to those later, I think first we should discuss the main issue some more.

As it has happened before, we've reached a crossroads where a decision has to be made, but all the available options have some disadvantages.
I think it would be good to consider the options we are currently talking about, and also think about additional options that might come to mind.

I'm listing the options I can think of, probably there are even more?
Once we know all options, we can try to find an agreement on how to proceed.

Option 0, keep the status quo

This behaves correctly (AFAICT), but it is quite inefficient in the overlap case.
No overlap support for non-seekable files.

Option 1, the original proposal of this PR

Makes all investigated cases faster and adds overlap support for non-seekable files, but introduces a bug.
I'm now quite certain that it actually is a bug, and that a sentence in the documentation doesn't make that acceptable.
It is actually not as unlikely as I initially thought, that a user will want to modify the given out parameter.
A prime example would be applying a window in-place, but there is a range of other signal processing steps that a user could want to apply as in-place operations. They might even use the array temporarily for something completely different.
Relying on the user not modifying out between iterations would be a violation of the principle of least surprise (regardless of whether it is mentioned in the docs or not).

Option 2, modify the current PR to use a separate array for storing the overlap if out is given

This should get rid of the bug, but it might also reduce the performance win.
And it makes the implementation more complicated.
It is also a violation of the principle of least surprise because if I provide an out parameter, I don't expect the implementation to allocate additional memory.
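
To make the difference between options 1 and 2 concrete, here is a toy model (not the soundfile implementation) of a block generator that writes into a caller-supplied `out`. The function `toy_blocks` and its `private_carry` switch are hypothetical: with `private_carry=False` the overlap is taken from `out` itself, as in option 1; with `private_carry=True` it is saved in a separate array before yielding, as in option 2.

```python
import numpy as np


def toy_blocks(data, blocksize, overlap, out, private_carry):
    """Toy model of overlap handling when the caller supplies `out`."""
    carry = np.empty(overlap, dtype=out.dtype)  # option 2's extra allocation
    position = 0
    filled = 0  # frames at the start of `out` that come from the overlap
    while position < len(data):
        fresh = data[position:position + blocksize - filled]
        out[filled:filled + len(fresh)] = fresh
        position += len(fresh)
        carry[:] = out[blocksize - overlap:blocksize]  # saved before the yield
        yield out[:filled + len(fresh)]
        # Refill the head of `out` for the next block.
        if private_carry:
            out[:overlap] = carry
        else:
            out[:overlap] = out[blocksize - overlap:blocksize]
        filled = overlap


def run(private_carry):
    seen = []
    out = np.empty(4)
    for block in toy_blocks(np.arange(1.0, 9.0), 4, 2, out, private_carry):
        seen.append(block.tolist())
        block *= 0.0  # the caller processes the block in place
    return seen


option_1 = run(private_carry=False)
option_2 = run(private_carry=True)
```

In this toy run, the in-place modification leaks into the overlap region of the following blocks for option 1, while option 2 still delivers the original frames at the cost of one extra `overlap`-sized array.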

Option 3, disable overlap when out is given

I don't know if anybody would ever want to use both arguments at the same time, but theoretically, this would be a breaking change.
But it would be quite a simple solution to get rid of the bug and still get all the performance benefits.

Option 4, fall back to using seek() when out is given

I guess this would also get rid of the bug, but it will make the implementation uglier.

Option 5, get rid of overlap altogether, without replacement

Doesn't require seek(), simpler implementation, we can choose the fastest implementation.
But the users have to do some of the work on their own.

When we were discussing the implementation of blocks() in #35, the overlap argument (and several variations thereof) was a large part of the discussion.
This leads me to believe that this functionality isn't at all obvious or uncontroversial.
What about getting rid of it and letting users implement their own overlapping schemes if needed?
@tillahoffmann What would it take in your case to implement the overlapping in your own code?
You could reserve (and zero-initialize) an array of size blocksize + overlap and then use a blocksize-sized slice at the end of it as out argument to blocks(). Once you are done with whatever processing you want to do, you can copy the last blocksize frames from the end of your array to the beginning and continue iteration.
That doesn't sound too hard, does it?
And the advantage is that somebody with a different use case can implement a different overlapping scheme that's more appropriate in their case.
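
A minimal sketch of such user-side overlapping, under some assumptions: `read_blocks` is a hypothetical stand-in for a non-overlapping block reader that fills `out`, and the sketch moves the newest `overlap` frames to the front after each step, which is one reading of the description above.

```python
import numpy as np


def read_blocks(data, blocksize, out):
    """Hypothetical stand-in for a non-overlapping block reader using `out`."""
    for start in range(0, len(data), blocksize):
        fresh = data[start:start + blocksize]
        out[:len(fresh)] = fresh
        yield out[:len(fresh)]


blocksize, overlap = 3, 2  # read 3 new frames per step, keep 2 old ones
window = np.zeros(overlap + blocksize)
windows = []
for fresh in read_blocks(np.arange(1.0, 7.0), blocksize, out=window[overlap:]):
    valid = overlap + len(fresh)
    windows.append(window[:valid].tolist())  # user processing would go here
    # Move the newest `overlap` frames to the front for the next step.
    window[:overlap] = window[valid - overlap:valid]
```

Because the array is zero-initialized, the first window starts with `overlap` zeros; a different use case could swap in another scheme without touching the reader.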

I think we could argue with the same arguments as in #205, which might be a similar situation.

BTW, the same reasoning could be applied to @bastibe's question whether we should internally read larger blocks from the file and then split them into pieces before yielding them to the user. I think the users should choose what block size they want to use for reading from the file (that's what the soundfile library is for, isn't it?), but if they need smaller pieces later, they should do the splitting themselves (and they can optimize it much better for their specific use case!).
They could of course also implement their own file-like object that does some optimized buffering.
Either way should be possible to implement with the soundfile module, but IMHO the implementation itself is out-of-scope for the soundfile module.

@bastibe
Owner

bastibe commented Oct 9, 2017

Three comments:

  1. We have existing users that already use overlap (including myself). Removing overlap is not an option.
  2. I believe that the main performance benefit comes from not seeking, and reading less. Any numpy operations are probably insignificant in comparison. I therefore believe that option 2 has no significant performance cost.
  3. As @tillahoffmann proposed, an additional option would be to return an immutable view.

I would like to go with option 2 from @mgeier's list: Store the overlap in a separate numpy array. Dear @tillahoffmann, would you like to implement this change? I know we are already asking you a lot, and I understand if you find this tiresome. If you'd prefer, we could merge your pull request now, and implement the proposed changes ourselves before releasing a new version.

@tillahoffmann
Contributor Author

@bastibe, option 2 sounds sensible. Although I'll hold off on the implementation until we've all agreed on the best option. @mgeier, let me know if you're happy with option 2.

@mgeier
Contributor

mgeier commented Oct 9, 2017

Well, "happy" is the wrong word, but @bastibe has the final say anyway.

@bastibe
Owner

bastibe commented Oct 10, 2017

What are you unhappy about, @mgeier?

@mgeier
Contributor

mgeier commented Oct 10, 2017

All options have some drawbacks, but some options seem a bit "cleaner" to me than others.

As I said, the problem with option 2 is that additional memory is allocated, which is not expected when providing an out argument.

I think in the long run, the "cleanest" solution is option 5, but of course this would cause some breakage.
I would be interested in how you two (@bastibe and @tillahoffmann) are using the overlap option and how different it would look if the overlap were implemented in user code.
Is there some example code available?

@bastibe
Owner

bastibe commented Oct 11, 2017

Thank you both for your opinions.

As I said, I use overlap often, for example for spectral analysis. In fact, we regularly use very big overlaps on the order of 80-95%. Of course I could implement my own overlap, but by the same logic, I could implement my own blocks. Removing overlap is therefore not an option I am willing to take.

Don't take this as criticism of your opinions, @mgeier, but merely that different applications require different solutions.

As the maintainer of SoundFile, I therefore say we go with option 2.

@tillahoffmann
Contributor Author

I use overlap in a similar fashion for spectral analysis because reading the entire file into memory is not feasible but overlapping blocks are required to ensure I don't miss any information on the boundary.

I'll have a look at implementing option 2 over the next few days.

@mgeier
Contributor

mgeier commented Oct 12, 2017

Of course I could implement my own overlap, but by the same logic, I could implement my own blocks.

No, that's faulty logic.
If that were logically sound, the function blocks() would also have to implement arbitrary windowing, arbitrary time-frequency transforms and arbitrary signal processing in general. Basically Turing-complete function arguments. That's clearly absurd.

So let's not extrapolate from one feature to another but just consider each feature separately.

Of course it's not strictly necessary to provide blocks(), but I still think it makes sense, because it avoids having to check in each iteration if the file is finished yet. And a generator of NumPy arrays is a really nice interface to external code.

The big difference with the overlap functionality is that this could be implemented generically (because of the nice "sequence of arrays" interface). It shouldn't be specific to the soundfile module.
It would be a really nice tool to have in a signal analysis module, along with tools for windowing and similar helpers.
And the good thing would be that this could also be used for other sequences of NumPy arrays, e.g. from live sound card input or an RTP network stream. Or in the trivial case just (a one-element sequence of) one big fat array.
If the "overlapper" were a separate tool, it would also be trivial to implement (external to the soundfile module) what you (@bastibe) mentioned above:

It might be worthwhile for blocks to optionally read larger chunks and split them into blocks internally.

For soundfile.blocks() you should use a blocksize that makes sense for reading from the file, since that's the realm of the soundfile module. The block size may be much larger than the size you would like to use for spectral analysis.
The resulting generator could be fed into a generic "overlapper", which would turn it into (potentially overlapping) blocks of a different length.
Nice, clean separation of concerns. Nice modularity and composability.
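
As an illustration of what such a generic tool might look like (a sketch only; `overlapper` is a hypothetical name, and no such function exists in the soundfile module), the following re-blocks any iterable of 1-D arrays into overlapping blocks:

```python
import numpy as np


def overlapper(arrays, blocksize, overlap=0):
    """Re-block any iterable of 1-D arrays into overlapping blocks.

    Buffers incoming frames and yields copies of `blocksize` frames,
    advancing by `blocksize - overlap` frames each time. A final partial
    block is yielded only if it contains frames not yielded before.
    """
    hop = blocksize - overlap
    if overlap < 0 or hop <= 0:
        raise ValueError("overlap must satisfy 0 <= overlap < blocksize")
    buffer = np.empty(0)
    seen = 0  # leading frames in `buffer` that were already yielded
    for array in arrays:
        buffer = np.concatenate([buffer, array])
        while len(buffer) >= blocksize:
            yield buffer[:blocksize].copy()
            buffer = buffer[hop:]
            seen = overlap
    if len(buffer) > seen:
        yield buffer


# Input blocks of uneven size, e.g. from a file reader or a live stream.
source = [np.arange(0.0, 3.0), np.arange(3.0, 10.0)]
blocks = list(overlapper(source, blocksize=4, overlap=2))
```

The input block sizes are independent of the output block size, so the reader can use whatever size suits the file while the analysis code picks its own.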

BTW, in its current form, isn't there a feature missing for your use case?
Shouldn't it be possible to stop the iteration at the last full block, discarding the last partial block?
Otherwise the partial block doesn't get the correct windowing and the results are wrong because of an unwanted rectangular window?

Having said all that ... while thinking about all this for a while, I found yet another option:

option 6, get rid of out, without replacement

This would allow us to keep the overlap feature while still being "clean".
The reason for implementing out in the first place was symmetry with read(). But I think we could live without it. Whoever needs to write into pre-allocated memory can use the read() method directly.

@bastibe
Owner

bastibe commented Oct 13, 2017

Thank you for your thoughts, @mgeier. This was a very interesting discussion about how a different module might implement a generic block-iteration functionality that could be superior to providing our own. I should be very interested in that module if you write it.

However, it is not pertinent to this pull request. As I said, removing existing functionality is not an option at this point. In this pull request, let's focus on @tillahoffmann's improvements on the existing blocks method. If you want to continue your discussion on alternative methods, please open a new issue.

Dear @tillahoffmann, is your latest commit ready to be merged?

@mgeier
Contributor

mgeier commented Oct 13, 2017

@bastibe What about option 6?

@tillahoffmann
Contributor Author

@bastibe, yes, I think the changes are ready to be merged.

@bastibe bastibe merged commit 1de6817 into bastibe:master Oct 13, 2017
@bastibe
Owner

bastibe commented Oct 13, 2017

Thank you both for your contributions!

@bastibe
Owner

bastibe commented Oct 13, 2017

@mgeier If you want to remove out, please open a separate issue. No removal of functionality in this pull request.

@tillahoffmann tillahoffmann deleted the blocks branch October 13, 2017 12:53
mgeier added a commit to mgeier/python-soundfile that referenced this pull request Oct 25, 2017
This is a continuation of PR bastibe#209.