-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement loadstreaming/savestreaming API #78
Conversation
Thanks! I'd love to have streaming support in FileIO! |
See #77 (comment). If this adds functionality on top of what's already there, great, but let's at least address what's missing. |
The difference is that these streams produce higher-level formatted data, not just raw file bytes. For instance: s = loadstreaming("myfile.ogg")
x = read(s, 1000) # returns 1000 frames of audio in a SampleBuf
close(s) In this case the underlying bytestream on disk is ogg-formatted, but when I read from this formatted stream I get samples of audio. If the ogg file is very large (or infinite), I can read or process it in chunks. It also provides a unified API between opening streams to hardware capture/playback and working with files, or infinite network streams. You can see some other examples of this API in LibSndFile.jl, though in that library it's currently called |
I see. That's not bad, in a multi-image file we could also load a single frame at a time. 👍 Regarding names, to me it seems that the distinguishing feature here (to avoid the kind of confusion that nabbed me) is the "chunk." |
I think I think from a user's perspective it's a reasonable convention that For my purposes I generally consider streaming to be the more fundamental operation, in the sense that x = load("mysong.wav") is equivalent to str = loadstreaming("mysong.wav")
x = read(str)
close(str) or even better: x = loadstreaming("mysong.wav") do str
read(str)
end We could in fact define Though I'm sure there are data types where you have to read the whole thing before you can start doing anything useful, so in those cases maybe the relationship would be reversed. To better demo the benefit, this is a demo that works on the current LibSndFile, except the loadstreaming("srcfile.wav") do src
savestreaming("destfile.wav", nchannels(src), samplerate(src), Float32) do dest
while write(dest, float(read(src, 2048)) / 2) == 2048 end
end
end And you get an output with half the amplitude of your input, while only keeping 2048 frames in memory at a time (of course in real code we'd want to pre-allocate the buffer to reduce allocation). |
36126ed
to
d922a07
Compare
Just rebased and added some tests, should be ready for review. There's some refactoring that could be done to remove some duplication, but I figured I'd leave it as-is for now to make the diff easier to read. I can either do the refactoring as part of this PR or a follow-up. |
used a little metaprogramming to reduce the duplication introduced in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Love it!
README.md
Outdated
end | ||
|
||
function load(q::Formatted{format"WAV"}, args...; kwargs...) | ||
savestreaming(q, args...; kwargs...) do stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loadstreaming
README.md
Outdated
ownstream::Bool | ||
end | ||
|
||
function read(reader::WAVReader, frames::Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a private read
or Base.read
? (FileIO's docs were written, I think, before the the whole "do you extend or not?" debate was settled, but we should certainly fix this.)
README.md
Outdated
# read and decode audio samples from reader.io | ||
end | ||
|
||
function close(reader::WAVReader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, clarify whether this is Base.close
or a private close
. (It's fine to explain all in just one place.)
Also, I notice your test implements eof
, worth mentioning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, should be Base.[read|write|close]
. I'll fix. re: eof
, I'd love it if there were a well-defined and documented stream interface we could link to here. I can try at least adding a list of functions to implement here and maybe eventually migrate it to the julia docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went down a bit of a rabbit hole trying to figure out a more formalized stream interface. My notes are here, but basically it's not super well-defined at this point. Once that's nailed down we can link to it in these docs.
README.md
Outdated
# do whatever cleanup the reader needs | ||
reader.ownstream && close(reader.io) | ||
end | ||
loadstreaming(f::File{format"WAV"}) = WAVReader(open(f), true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add blank line before this one? Or does the "fallback" comment below also apply to close
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ordered the comment to clarify this
src/loadsave.jl
Outdated
type that can be read from in chunks, rather than loading the whole contents all | ||
at once | ||
- `loadstreaming(strm)` loads the stream from an `IOStream` or similar object. In this case, | ||
the magic bytes are essential. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure what "are essential" means. Perhaps say "the format is inferred from magic bytes in strm
"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hah, copied that from the load
docstring. I think the idea is that because there's no file extension the magic bytes are the only way to get the file type. I can clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarified in new commit
src/loadsave.jl
Outdated
- `loadstreaming(strm)` loads the stream from an `IOStream` or similar object. In this case, | ||
the magic bytes are essential. | ||
- `load(File(format"PNG",filename))` specifies the format directly, and bypasses inference. | ||
- `load(f; options...)` passes keyword arguments on to the loader. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
load
->loadstreaming
? Also, is that really File
or did you mean Stream
(as below with savestreaming
)?
src/loadsave.jl
Outdated
|
||
- `savestreaming(filename, data...)` saves the contents of a formatted file, | ||
trying to infer the format from `filename`. | ||
- `savestreaming(Stream(format"PNG",io), data...)` specifies the format directly, and bypasses inference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this produces an output Stream
, not a file, right? Is savestreaming(File(...))
also supported? (bypassing inference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loadstreaming
and savestreaming
(and load
and save
I believe) should support File
s and Stream
s. For example, if I call loadstreaming
on a stream coming in from a network, I should be able to read
formatted data (e.g. audio frames or video frames) from the returned object, which will in turn read from the stream. calling loadstreaming
on a File
just opens that file and then handles it as a stream.
similarly savestreaming
should handle both File
s and Streams
.
The new commit I just pushed adds more methods to the docstring. I also added them for load
and save
, which I believe should also support both.
45815e9
to
239af17
Compare
I just rebased and made edits based on your comments. Apologies this sat dormant for a while, I know that makes it tougher to review. Compared to what you looked at last time this is just documentation updates. Tests seem to be passing on 0.6 but failing on nightlies due to some Nullable issue - probably unrelated to this PR? |
bump. 😃 |
Well, if @timholy loves it I don't see why we shouldn't merge this ;) If no one is against it, I merge it tonight! |
I figured it would be easier to discuss #77 if we had some code to look at. This PR is not close to mergeable, it's mostly to play with how this might work and get some feedback from you guys on whether you think this is a promising direction.
Here's the context (copied from the issue):
IIUC load and save are generally one-off operations, i.e. load("somefile.wav") loads the full wave file into memory.
Sometimes though, you have huge files that you might not want to load into memory all at once, or streams that have no end (like an internet radio stream). In my LibSndFile package I also export loadstream and savestream functions that instead return special stream types that you can read/write chunks of audio from.
@jongwook's MP3 package also has the same functionality, but it's not clear who should own those functions. It seems like it would make sense to have them in FileIO as they use all the same query tooling. In fact, we implement load and save as wrappers around the streaming versions.
Do these functions seem like they could fit into the FileIO architecture? What are the video packages currently doing for this sort of thing?