Implement loadstreaming/savestreaming API #78

Merged
merged 11 commits into from
Apr 5, 2018
89 changes: 87 additions & 2 deletions README.md
@@ -37,6 +37,44 @@ s = query(io) # io is a stream
will return a `File` or `Stream` object that also encodes the detected
file format.

Sometimes you want to read or write files that are larger than your available
memory, or whose length is unknown or unbounded (e.g. an audio or video stream
read from a socket). In these cases it may not make sense to process the whole
file at once; instead, you process it one chunk at a time. For these situations
FileIO provides the `loadstreaming` and `savestreaming` functions, which return
an object that you can `read` from or `write` to, rather than the file data
itself.

This would look something like:

```jl
using FileIO
audio = loadstreaming("bigfile.wav")
try
    while !eof(audio)
        chunk = read(audio, 4096) # read 4096 frames
        # process the chunk
    end
finally
    close(audio)
end
```

or use `do` syntax to auto-close the stream:

```jl
using FileIO
loadstreaming("bigfile.wav") do audio
    while !eof(audio)
        chunk = read(audio, 4096) # read 4096 frames
        # process the chunk
    end
end
```

Note that in these cases you may want to use `read!` with a pre-allocated buffer
for maximum efficiency.
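
As a minimal sketch of the pre-allocated buffer pattern, the following uses only Base Julia on a plain `IOBuffer` rather than a FileIO reader. The function name `sum_in_chunks` is hypothetical, and whether a particular package's reader type supports `read!` or `readbytes!` is an assumption you would need to check against that package.

```julia
# Sum all bytes of `io`, reading through one reusable buffer.
function sum_in_chunks(io::IO, chunksize::Integer)
    buf = Vector{UInt8}(undef, chunksize)  # allocated once, reused for every chunk
    total = 0
    while !eof(io)
        n = readbytes!(io, buf)            # fills at most length(buf) bytes, returns the count
        total += sum(Int, view(buf, 1:n))  # the final chunk may be shorter than the buffer
    end
    return total
end

# Ten bytes (1 through 10) read in chunks of four.
result = sum_in_chunks(IOBuffer(UInt8.(1:10)), 4)
```

`readbytes!` is used here instead of `read!` because `read!` throws an `EOFError` when fewer elements remain than the buffer holds, which is the usual situation for the last chunk.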

## Adding new formats

You register a new format by adding `add_format(fmt, magic,
@@ -130,15 +168,62 @@ end
Note that these are `load` and `save`, **not** `FileIO.load` and `FileIO.save`.
Because a given format might have multiple packages that are capable of reading it,
FileIO will dispatch to these using module-scoping, e.g., `SomePkg.load(args...)`.
-Consequently, **packages should define "private" `load` and `save` methods, and
-not extend (import) FileIO's**.
+Consequently, **packages should define "private" `load` and `save` methods (also
+`loadstreaming` and `savestreaming` if you implement them), and not extend
+(import) FileIO's**.

`load(::File)` and `save(::File)` should close any streams
they open. (If you use the `do` syntax, this happens for you
automatically even if the code inside the `do` scope throws an error.)
Conversely, `load(::Stream)` and `save(::Stream)` should not close the
input stream.

`loadstreaming` and `savestreaming` use the same query mechanism, but return a
decoded stream that users can `read` from or `write` to. You should also
implement a `close` method on your reader or writer type. Just as with `load`
and `save`, if the user provided a filename, your `close` method is responsible
for closing any streams you opened in order to read or write the file. If you
are given a `Stream`, your `close` method should only clean up your own reader
or writer state and must not close the stream.

```jl
struct WAVReader
    io::IO
    ownstream::Bool
end

function Base.read(reader::WAVReader, frames::Int)
    # read and decode audio samples from reader.io
end

function Base.close(reader::WAVReader)
    # do whatever cleanup the reader needs
    reader.ownstream && close(reader.io)
end

# FileIO has fallback functions that make these work using `do` syntax as well,
# and will automatically call `close` on the returned object.
loadstreaming(f::File{format"WAV"}) = WAVReader(open(f), true)
loadstreaming(s::Stream{format"WAV"}) = WAVReader(s, false)
```
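
The ownership rule above can be exercised without any real file format. The following is a self-contained sketch using only Base Julia; `LineReader` and `with_reader` are hypothetical names invented for illustration and are not part of FileIO. `with_reader` only mimics the close-on-exit behaviour of the `do` form and is not FileIO's actual fallback.

```julia
# Hypothetical reader type that wraps any IO and remembers whether it owns it.
struct LineReader{T<:IO}
    io::T
    ownstream::Bool   # true only when the reader opened `io` itself
end

Base.eof(r::LineReader) = eof(r.io)
Base.readline(r::LineReader) = readline(r.io)

function Base.close(r::LineReader)
    # reader-specific cleanup would go here
    r.ownstream && close(r.io)   # close the underlying stream only if we own it
    return nothing
end

# Hypothetical helper: run `f` on a reader and always close the reader afterwards.
function with_reader(f::Function, io::IO; ownstream::Bool=false)
    reader = LineReader(io, ownstream)
    try
        return f(reader)
    finally
        close(reader)
    end
end

# Caller-provided stream: the reader must leave it open.
borrowed = IOBuffer("a\nb\n")
lines = with_reader(borrowed) do r
    out = String[]
    while !eof(r)
        push!(out, readline(r))
    end
    out
end
borrowed_still_open = isopen(borrowed)

# Stream treated as owned by the reader: closing the reader closes it too.
owned = IOBuffer("c\n")
first_line = with_reader(readline, owned; ownstream=true)
owned_still_open = isopen(owned)
```

In this sketch the borrowed buffer remains open after the `do` block returns, while the owned one is closed together with the reader, even if the block had thrown.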

If you choose to implement `loadstreaming` and `savestreaming` in your package,
you can easily add `save` and `load` methods in the form of:

```jl
function save(q::Formatted{format"WAV"}, data, args...; kwargs...)
    savestreaming(q, args...; kwargs...) do stream
        write(stream, data)
    end
end

function load(q::Formatted{format"WAV"}, args...; kwargs...)
    loadstreaming(q, args...; kwargs...) do stream
        read(stream)
    end
end
```

## Help

You can get an API overview by typing `?FileIO` at the REPL prompt.
4 changes: 4 additions & 0 deletions src/FileIO.jl
@@ -17,9 +17,11 @@ export DataFormat,
file_extension,
info,
load,
loadstreaming,
magic,
query,
save,
savestreaming,
skipmagic,
stream,
unknown
@@ -40,7 +42,9 @@ include("registry.jl")

- `load([filename|stream])`: read data in formatted file, inferring the format
- `load(File(format"PNG",filename))`: specify the format manually
- `loadstreaming([filename|stream])`: similar to `load`, except that it returns an object that can be read from
- `save(filename, data...)` for similar operations involving saving data
- `savestreaming([filename|stream])`: similar to `save`, except that it returns an object that can be written to

- `io = open(f::File, args...)` opens a file
- `io = stream(s::Stream)` returns the IOStream from the query object `s`
151 changes: 114 additions & 37 deletions src/loadsave.jl
@@ -40,86 +40,163 @@ add_loader
"`add_saver(fmt, :Package)` triggers `using Package` before saving format `fmt`"
add_saver


"""
- `load(filename)` loads the contents of a formatted file, trying to infer
the format from `filename` and/or magic bytes in the file.
- `load(strm)` loads from an `IOStream` or similar object. In this case,
the magic bytes are essential.
- `load(File(format"PNG",filename))` specifies the format directly, and bypasses inference.
there is no filename extension, so we rely on the magic bytes for format
identification.
- `load(File(format"PNG", filename))` specifies the format directly, and bypasses inference.
- `load(Stream(format"PNG", io))` specifies the format directly, and bypasses inference.
- `load(f; options...)` passes keyword arguments on to the loader.
"""
load(s::Union{AbstractString,IO}, args...; options...) =
load(query(s), args...; options...)
load

"""
Some packages may implement a streaming API, where the contents of the file can
be read in chunks and processed, rather than all at once. Reading from these
higher-level streams should return a formatted object, like an image or chunk of
video or audio.

- `loadstreaming(filename)` loads the contents of a formatted file, trying to infer
the format from `filename` and/or magic bytes in the file. It returns a streaming
type that can be read from in chunks, rather than loading the whole contents all
at once.
- `loadstreaming(strm)` loads the stream from an `IOStream` or similar object.
In this case, there is no filename extension, so we rely on the magic bytes
for format identification.
- `loadstreaming(File(format"WAV",filename))` specifies the format directly, and
bypasses inference.
- `loadstreaming(Stream(format"WAV", io))` specifies the format directly, and
bypasses inference.
- `loadstreaming(f; options...)` passes keyword arguments on to the loader.
"""
loadstreaming

"""
- `save(filename, data...)` saves the contents of a formatted file,
trying to infer the format from `filename`.
- `save(Stream(format"PNG",io), data...)` specifies the format directly, and bypasses inference.
- `save(File(format"PNG",filename), data...)` specifies the format directly, and bypasses inference.
- `save(f, data...; options...)` passes keyword arguments on to the saver.
"""
save(s::Union{AbstractString,IO}, data...; options...) =
save(query(s), data...; options...)
save

"""
Some packages may implement a streaming API, where the contents of the file can
be written in chunks, rather than all at once. These higher-level streams should
accept formatted objects, like an image or chunk of video or audio.

- `savestreaming(filename, data...)` saves the contents of a formatted file,
trying to infer the format from `filename`.
- `savestreaming(File(format"WAV",filename))` specifies the format directly, and
bypasses inference.
- `savestreaming(Stream(format"WAV", io))` specifies the format directly, and
bypasses inference.
- `savestreaming(f, data...; options...)` passes keyword arguments on to the saver.
"""
savestreaming

# if a bare filename or IO stream is given, query for the format and dispatch
# to the formatted handlers below
for fn in (:load, :loadstreaming, :save, :savestreaming)
@eval $fn(s::Union{AbstractString,IO}, args...; options...) =
$fn(query(s), args...; options...)
end

# return a save function, so you can do `thing_to_save |> save("filename.ext")`
function save(s::Union{AbstractString,IO}; options...)
data -> save(s, data; options...)
end

# Forced format
# Allow format to be overridden with first argument
function save{sym}(df::Type{DataFormat{sym}}, f::AbstractString, data...; options...)
libraries = applicable_savers(df)
checked_import(libraries[1])
eval(Main, :($save($File($(DataFormat{sym}), $f),
$data...; $options...)))
end

function savestreaming{sym}(df::Type{DataFormat{sym}}, s::IO, data...; options...)
libraries = applicable_savers(df)
checked_import(libraries[1])
eval(Main, :($savestreaming($Stream($(DataFormat{sym}), $s),
$data...; $options...)))
end

function save{sym}(df::Type{DataFormat{sym}}, s::IO, data...; options...)
libraries = applicable_savers(df)
checked_import(libraries[1])
eval(Main, :($save($Stream($(DataFormat{sym}), $s),
$data...; $options...)))
end

function savestreaming{sym}(df::Type{DataFormat{sym}}, f::AbstractString, data...; options...)
libraries = applicable_savers(df)
checked_import(libraries[1])
eval(Main, :($savestreaming($File($(DataFormat{sym}), $f),
$data...; $options...)))
end

# Fallbacks
function load{F}(q::Formatted{F}, args...; options...)
if unknown(q)
isfile(filename(q)) || open(filename(q)) # force systemerror
throw(UnknownFormat(q))
end
libraries = applicable_loaders(q)
failures = Any[]
for library in libraries
# do-syntax for streaming IO
for fn in (:loadstreaming, :savestreaming)
@eval function $fn(f::Function, args...; kwargs...)
str = $fn(args...; kwargs...)
try
Library = checked_import(library)
if !has_method_from(methods(Library.load), Library)
throw(LoaderError(string(library), "load not defined"))
f(str)
finally
close(str)
end
end
end

# Handlers for formatted files/streams

for fn in (:load, :loadstreaming)
@eval function $fn{F}(q::Formatted{F}, args...; options...)
if unknown(q)
isfile(filename(q)) || open(filename(q)) # force systemerror
throw(UnknownFormat(q))
end
libraries = applicable_loaders(q)
failures = Any[]
for library in libraries
try
Library = checked_import(library)
if !has_method_from(methods(Library.$fn), Library)
throw(LoaderError(string(library), "$($fn) not defined"))
end
return eval(Main, :($(Library.$fn)($q, $args...; $options...)))
catch e
push!(failures, (e, q))
end
return eval(Main, :($(Library.load)($q, $args...; $options...)))
catch e
push!(failures, (e, q))
end
handle_exceptions(failures, "loading \"$(filename(q))\"")
end
handle_exceptions(failures, "loading \"$(filename(q))\"")
end
function save{F}(q::Formatted{F}, data...; options...)
unknown(q) && throw(UnknownFormat(q))
libraries = applicable_savers(q)
failures = Any[]
for library in libraries
try
Library = checked_import(library)
if !has_method_from(methods(Library.save), Library)
throw(WriterError(string(library), "save not defined"))

for fn in (:save, :savestreaming)
@eval function $fn{F}(q::Formatted{F}, data...; options...)
unknown(q) && throw(UnknownFormat(q))
libraries = applicable_savers(q)
failures = Any[]
for library in libraries
try
Library = checked_import(library)
if !has_method_from(methods(Library.$fn), Library)
throw(WriterError(string(library), "$($fn) not defined"))
end
return eval(Main, :($(Library.$fn)($q, $data...; $options...)))
catch e
push!(failures, (e, q))
end
return eval(Main, :($(Library.save)($q, $data...; $options...)))
catch e
push!(failures, (e, q))
end
handle_exceptions(failures, "saving \"$(filename(q))\"")
end
handle_exceptions(failures, "saving \"$(filename(q))\"")
end

# returns true if the given method table includes a method defined by the given
# module, false otherwise
function has_method_from(mt, Library)
for m in mt
if getmodule(m) == Library