Update SoundFile.blocks. #209
Conversation
bastibe left a comment
Thank you very much for this pull request. This is a very clever improvement to blocks, which not only improves performance, but more importantly, allows blocks to work even if the file is not seekable!
I have added a few comments that I would like to be addressed before merging. They are only small details, though. Overall, I like this pull request very much!
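The non-seekable case matters because a forward-only stream (a pipe, a socket) cannot seek back to re-read the overlap, so block iteration has to carry the overlap forward in memory. A minimal stdlib-only sketch of that pattern (the function and names are illustrative, not soundfile's actual API):

```python
import io

def iter_blocks(stream, blocksize, overlap):
    """Yield (possibly overlapping) chunks from a forward-only stream.

    seek() is never called: the overlap is carried forward in memory,
    so this also works on pipes and sockets.
    """
    block = stream.read(blocksize)
    while block:
        yield block
        nxt = stream.read(blocksize - overlap)
        if not nxt:
            break
        block = (block[-overlap:] + nxt) if overlap else nxt

# io.BytesIO stands in for a non-seekable stream here; since iter_blocks
# never calls seek(), a real pipe would work just as well.
stream = io.BytesIO(bytes(range(10)))
chunks = list(iter_blocks(stream, blocksize=4, overlap=2))
# chunks: [b'\x00\x01\x02\x03', b'\x02\x03\x04\x05',
#          b'\x04\x05\x06\x07', b'\x06\x07\x08\x09']
```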
tests/test_pysoundfile.py (outdated diff):

```diff
 out = np.empty((3, 2))
 blocks = list(sf.blocks(file_stereo_r, out=out))
-assert blocks[0] is out
+assert blocks[0].base is out
```
Why does this need to be changed?
I used slightly inefficient slicing. This should be back to the original assertion now.
soundfile.py (outdated diff):

```python
self.read(n, dtype, always_2d, fill_value, out[offset:])
block = out[:min(blocksize, frames + overlap)] if fill_value is None else out
if copy_out:
    import numpy as np
```
Why do we need this import here?
Good question. Moved to the top of the function.
soundfile.py (outdated diff):

```python
self.seek(-overlap, SEEK_CUR)
frames += overlap
yield block
n = min(blocksize - offset, frames)
```
I think n could be replaced by a more telling name. Maybe toread or something similar.
Done.
I'm currently on the road and can't test it myself, but doesn't that pessimize the IMHO more important case where overlap is zero (and out is None)?

And the current call to seek() doesn't seek from the beginning but uses SEEK_CUR. I don't know what's going wrong when progressing further into a large file. Probably a memory leak of some sort? Did you notice increased memory usage? I guess this phenomenon would be worth investigating further ...

Which OS did you test on?

@bastibe Did you get the same performance results?
@mgeier, I don't see a performance degradation for zero overlap on my end (with an updated profiling script):

```python
# pysound_profile.py
import argparse
import hashlib
import platform

import soundfile
import tqdm

ap = argparse.ArgumentParser("pysound_profile")
ap.add_argument('filenames', nargs='+')
ap.add_argument('--overlap', '-o', type=float, default=1)
ap.add_argument('--block-size', '-b', type=float, default=6)
ap.add_argument('--hash', '-d', default='sha256')
args = ap.parse_args()

print("pysound_profile")
print(platform.uname())
print()

for filename in args.filenames:
    print("Processing '%s' with block size %fs and overlap %fs..." %
          (filename, args.block_size, args.overlap))
    with soundfile.SoundFile(filename) as sf:
        hasher = hashlib.new(args.hash)
        blocks = sf.blocks(
            round(args.block_size * sf.samplerate),
            round(args.overlap * sf.samplerate))
        for block in tqdm.tqdm(blocks):
            hasher.update(block)
        print("%s = %s" % (args.hash, hasher.digest().hex()))
```

I don't see a memory leak on my end rerunning the blocks on master with overlap.
@mgeier A priori, I would imagine that seeking is approximately free for uncompressed files, but expensive for compressed files. Regardless, I would think that shuffling some numpy memory should always be a bit less expensive than calling out to a C library and updating file pointers. Let's see how my intuition compares to reality ;-)

Current master:

```
>>> %timeit for block in soundfile.blocks('still_alive.wav', 512, 0): pass
773 ms ± 3.53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit for block in soundfile.blocks('still_alive.wav', 512, 256): pass
1.58 s ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit for block in soundfile.blocks('still_alive.flac', 512, 0): pass
9.51 s ± 83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit for block in soundfile.blocks('still_alive.flac', 512, 256): pass
36.1 s ± 422 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Your branch:

```
>>> %timeit for block in soundfile.blocks('still_alive.wav', 512, 0): pass
622 ms ± 22.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit for block in soundfile.blocks('still_alive.wav', 512, 256): pass
1.21 s ± 4.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit for block in soundfile.blocks('still_alive.flac', 512, 0): pass
9.13 s ± 128 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit for block in soundfile.blocks('still_alive.flac', 512, 256): pass
18.3 s ± 47.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

As you can see, there is a modest performance benefit for non-compressed files or zero overlap (5%-30%), but a substantial (2x) benefit for compressed files with nonzero overlap! This performance benefit, and getting rid of the

As a side note, I would not have thought that block processing takes so much time! Simply reading the whole file takes

```
>>> %timeit soundfile.read('still_alive.wav');
70.2 ms ± 258 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit soundfile.read('still_alive.flac');
235 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

while reading a single block takes

```
>>> %timeit soundfile.read('still_alive.wav', start=44100, frames=512);
160 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit soundfile.read('still_alive.flac', start=44100, frames=512);
811 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

which adds up to the 18-36 s of processing time if you multiply it with 30k blocks. It might be worthwhile for
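The back-of-the-envelope arithmetic behind that 18-36 s estimate can be checked directly. The block count of ~30k is taken from the comment above; the per-block timings are the measured values:

```python
# Per-block read timings from the measurements above.
wav_block = 160e-6   # seconds per 512-frame read from a WAV file
flac_block = 811e-6  # seconds per 512-frame read from a FLAC file
n_blocks = 30_000    # approximate block count quoted in the comment

wav_total = n_blocks * wav_block    # 4.8 s
flac_total = n_blocks * flac_block  # 24.33 s, within the observed 18-36 s

print(round(wav_total, 2), round(flac_total, 2))
```

With 50% overlap the block count roughly doubles, which pushes the FLAC figure toward the upper end of the measured range.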
Thanks to both of you for the thorough benchmarks, that's really interesting.

I never doubted the overlapping case, that's an obvious improvement (UPDATE: see below for an objection). I had the feeling that there was too much copying going on in the non-overlapping case, but I was wrong. Apart from the benchmarks that clearly show that there is no problem (in fact it's an improvement), I think I also understand the code better now after some further study (UPDATE: but there is still something I just realized later, see below).

The main difference is that before, some empty memory was allocated for each block, written to, and then this same memory was yielded to the user. With the proposed changes, empty memory is only allocated once, written to for each block, and then a copy of the memory is made, which is yielded to the user. I don't know why, but the latter is apparently faster.

So no, I don't have any objections in general. (UPDATE: see below for an objection.)

There are some potential nits to pick, though:

There are too many comments for my taste. If the names of the used internal functions are not clear enough, they should probably be renamed instead of adding a comment before each use? If it's not clear that "offset" affects the output, it should probably be renamed to "output_offset" instead of adding a comment? Probably "frames" should also be renamed to something less ambiguous?

I don't like the use of the if/else expressions in this case. Especially breaking a line for that is a big no-no. Doesn't this look nicer with "normal" if/else statements?

Also (and more importantly than my above nitpicks), there is still something fishy going on in line 1213. If "out" is given, indexing into it seems wrong (but that's just a gut feeling).

And finally, and I'm just realizing this now while writing this response, it might not be as nice after all. If I'm not mistaken (and it's very likely that I am), there's a bug: you are assuming that the user is not writing to "out" (if provided by the user, i.e. without copying) between iterations, but why shouldn't she? IOW, you must not use the user-specified "out" to store the overlap. I guess you need to allocate separate memory in this case to store the overlap between iterations.
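The aliasing problem described above can be demonstrated with a minimal numpy sketch (the toy generator and the data are illustrative, not soundfile's actual implementation):

```python
import numpy as np

def blocks(data, blocksize, overlap, out):
    """Toy block generator that (incorrectly) stores the overlap in the
    caller's `out` buffer, mimicking the suspected bug."""
    hop = blocksize - overlap
    out[:] = data[:blocksize]
    yield out
    pos = hop
    while pos + blocksize <= len(data):
        out[:overlap] = out[hop:]   # carry the overlap forward -- but
                                    # `out` may have been modified!
        out[overlap:] = data[pos + overlap:pos + blocksize]
        yield out
        pos += hop

data = np.arange(8.0)
out = np.empty(4)

# Correct as long as the caller leaves `out` alone between iterations:
good = [b.copy() for b in blocks(data, 4, 2, out)]
# good[1] is [2., 3., 4., 5.]

# But the caller owns `out` and may legitimately write to it:
bad = []
for b in blocks(data, 4, 2, out):
    bad.append(b.copy())
    out[:] = -1   # a perfectly legal use of one's own buffer...
# bad[1] is [-1., -1., 4., 5.] -- the overlap got corrupted
```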
@mgeier thank you for your thorough review. I pretty much agree on all counts.

Good find! In my opinion, noting this in the documentation would be enough.

@tillahoffmann If you add a note to this effect to the documentation, I'll gladly merge it (pending @mgeier's approval). If you want to address @mgeier's stylistic concerns, or implement his proposed fix instead of a note in the documentation, I'd be doubly grateful!
Thanks for the feedback.

Renamed.

Changed.

Happy to change this behaviour. I kept the slicing to be consistent with the existing implementation (cf. https://github.com/tillahoffmann/PySoundFile/blob/1d8f3812e218dedb5fc07d3a41bced5f333271ce/soundfile.py#L989).

Yes, that's true. I've added a comment as suggested by @bastibe. We could also return an immutable view:

```python
if blocksize > frames + overlap and fill_value is None:
    block = out[:frames + overlap]
else:
    block = out[:]
block.flags.writeable = False
yield block
```
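For reference, a read-only view in NumPy costs no copy: only the view is locked, while the base buffer stays writeable. A standalone sketch (not tied to soundfile's internals):

```python
import numpy as np

out = np.empty(4)
view = out[:2]
view.flags.writeable = False   # lock only this view

try:
    view[0] = 1.0              # writing through the view now fails
except ValueError:
    pass                       # "assignment destination is read-only"

out[0] = 42.0                  # the base array itself stays writeable,
                               # and the view still sees the new value
```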
@tillahoffmann Thanks for the changes! I'd like to come back to those later, I think first we should discuss the main issue some more. As it has happened before, we've reached a crossroads where a decision has to be made, but all the available options have some disadvantages. I'm listing the options I can think of, probably there are even more?

Option 0, keep the status quo: This behaves correctly (AFAICT), but it is quite inefficient in the overlap case.

Option 1, the original proposal of this PR: Makes all investigated cases faster and adds

Option 2, modify the current PR to use a separate array for storing the overlap if This should get rid of the bug, but it might also reduce the performance win.

Option 3, disable I don't know if anybody would ever want to use both arguments at the same time, but theoretically, this would be a breaking change.

Option 4, fall back to using I guess this would also get rid of the bug, but it will make the implementation uglier.

Option 5, get rid of Doesn't require When we were discussing the implementation of I think we could argue with the same arguments as in #205, which might be a similar situation.

BTW, the same reasoning could be applied to @bastibe's question whether we should internally read larger blocks from the file and then split them into pieces before yielding them to the user. I think the users should choose what block size they want to use for reading from the file (that's what the
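Option 2 can be sketched with the same toy setup as above: the overlap lives in a private buffer, so the caller's writes to `out` between iterations no longer corrupt it (illustrative code, not soundfile's implementation):

```python
import numpy as np

def blocks_separate_overlap(data, blocksize, overlap, out):
    """Sketch of option 2: keep the overlap in a private buffer, so the
    caller may freely modify `out` between iterations."""
    hop = blocksize - overlap
    carry = np.empty(overlap)        # private storage for the overlap
    out[:] = data[:blocksize]
    carry[:] = out[hop:]             # snapshot the tail before yielding
    yield out
    pos = hop
    while pos + blocksize <= len(data):
        out[:overlap] = carry        # restore from private storage
        out[overlap:] = data[pos + overlap:pos + blocksize]
        carry[:] = out[hop:]
        yield out
        pos += hop

data = np.arange(8.0)
out = np.empty(4)
result = []
for b in blocks_separate_overlap(data, 4, 2, out):
    result.append(b.copy())
    out[:] = -1          # the caller scribbles over `out`...
# ...and the blocks still come out right:
# result[1] is [2., 3., 4., 5.]
```

The cost is one extra allocation of `overlap` frames and one extra copy per block, which is presumably where the reduced performance win would come from.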
Three comments:

I would like to go with option 2 from @mgeier's list: Store the overlap in a separate numpy array.

Dear @tillahoffmann, would you like to implement this change? I know we are already asking you a lot, and I understand if you find this tiresome. If you'd prefer, we could merge your pull request now, and implement the proposed changes ourselves before releasing a new version.
Well, "happy" is the wrong word, but @bastibe has the final say anyway.
What are you unhappy about, @mgeier?
All options have some drawbacks, but some options seem a bit "cleaner" to me than others. As I said, the problem with option 2 is that additional memory is allocated, which is not expected when providing an

I think in the long run, the "cleanest" solution is option 5, but of course this would cause some breakage.
Thank you both for your opinions. As I said, I use

Don't take this as criticism of your opinions, @mgeier, but merely that different applications require different solutions. As the maintainer of SoundFile, I therefore say we go with option 2.
I use

I'll have a look at implementing option 2 over the next few days.
No, that's faulty logic. So let's not extrapolate from one feature to another but just consider each feature separately. Of course it's not strictly necessary to provide

The big difference with the overlap functionality is that this could be implemented generically (because of the nice "sequence of arrays" interface). It shouldn't be specific to the

For

BTW, in its current form, isn't there a feature missing for your use case?

Having said all that ... while thinking about all this for a while, I found yet another option: option 6, get rid of

This would allow us to keep the overlap feature while still being "clean".
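The "sequence of arrays" point can be made concrete: overlap can be layered on top of *any* iterator of arrays, with no knowledge of audio files at all. A hedged sketch (names and semantics are illustrative; the first block comes out shorter because it has no predecessor to overlap with):

```python
import numpy as np

def with_overlap(chunks, overlap):
    """Add overlap to any iterator of arrays -- nothing here is specific
    to audio files or to a particular reader."""
    carry = None
    for chunk in chunks:
        block = chunk if carry is None else np.concatenate([carry, chunk])
        yield block
        carry = block[-overlap:].copy()   # copy so later mutation is safe

# Works with any source of arrays, e.g. plain non-overlapping chunks:
chunks = [np.arange(i, i + 2) for i in range(0, 8, 2)]
blocks = list(with_overlap(chunks, overlap=2))
# blocks: [0 1], [0 1 2 3], [2 3 4 5], [4 5 6 7]
```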
Thank you for your thoughts, @mgeier. This was a very interesting discussion about how a different module might implement a generic block-iteration functionality that could be superior to providing our own. I should be very interested in that module if you write it. However, it is not pertinent to this pull request. As I said, removing existing functionality is not an option at this point. In this pull request, let's focus on @tillahoffmann's improvements on the existing

Dear @tillahoffmann, is your latest commit ready to be merged?
@bastibe What about option 6?
@bastibe, yes, I think the changes are ready to be merged.
Thank you both for your contributions!
@mgeier If you want to remove |
This is a continuation of PR bastibe#209.
This PR removes the necessity to seek in the input file when generating blocks because seeking is relatively expensive. Here is a benchmark on an ACDC snippet (25 seconds) and a metal podcast (32 mins, 15 seconds).
One interesting observation: The code on master starts off a bit slower than the patched version on long files but becomes much slower as it processes more of the file. Maybe because the implementation seeks from the beginning of the file rather than doing a relative seek from the current position?