convenience tmap function #200

Closed
oxinabox opened this issue Feb 3, 2020 · 7 comments
oxinabox commented Feb 3, 2020

I find myself writing a lot of code like:

    generator = Base.Generator(zip(models, input_splits)) do (model, inputs)
        predict(model, inputs; kwargs...)
    end
    tcollect(generator; basesize = 1)

I could also write this in direct generator form, but I find that a bit too long:

tcollect((predict(model, inputs; kwargs...) for (model, inputs) in zip(models, input_splits)); basesize = 1)

What I would like to write is:

tmap(zip(models, input_splits); basesize = 1) do (model, inputs)
        predict(model, inputs; kwargs...)
end
oxinabox commented Feb 3, 2020

The code I ended up writing for this was:

"""
    tmap(f, xs...; basesize=1)

Multithreaded version of [`map`](@ref).
`basesize` controls the minimum number of items of `xs` to process per thread.
If the function is slow (>1ms), then `basesize=1` is good.
If the function is faster, then `basesize` should be set high enough that processing that
many items takes some time (~1ms). This counters the overhead of dispatching work
to a thread (~50μs).
"""
function tmap(f, xs...; basesize=1)
    # We collect the zip to make sure we can split it up. It's not always required.
    # TODO: work out how to avoid the collect when it is not required.
    inputs = collect(zip(xs...))
    return tcollect(MapSplat(f), inputs; basesize = basesize)
end
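For contrast, the same semantics can be sketched in pure Base Julia with no Transducers.jl at all, spawning one task per element (the moral equivalent of `basesize = 1`). This is only an illustration of what `tmap` means, not of how the Transducers-based version works; `tmap_sketch` is a hypothetical name:

```julia
# Hypothetical pure-Base sketch of tmap's semantics (no Transducers.jl).
# One task per element, i.e. the equivalent of basesize = 1, with none of
# the chunking that makes the tcollect-based version efficient.
function tmap_sketch(f, xs...)
    inputs = collect(zip(xs...))
    tasks = [Threads.@spawn f(args...) for args in inputs]
    return map(fetch, tasks)
end

tmap_sketch(+, 1:3, 4:6)  # == [5, 7, 9]
```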

tkf commented Feb 3, 2020

Supporting zip-of-arrays should be easy. I guess tmap could be a nice API but I'm not 100% sure.

Regarding the first snippet, you can just write

generator = Base.Generator(models, input_splits) do model, inputs
    predict(model, inputs; kwargs...)
end

without zip. (Also, you'll be able to use Iterators.map after JuliaLang/julia#34352)
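(A quick check that `Base.Generator` with multiple iterables does the zipping internally, so the explicit `zip` is indeed redundant:)

```julia
# Base.Generator(f, I1, I2) is defined as Generator(a -> f(a...), zip(I1, I2)),
# so f receives the paired elements already destructured.
g = Base.Generator(+, 1:3, 4:6)
collect(g)  # == [5, 7, 9]
```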

If we add a curried version of tcollect, it would look like

predictions = Iterators.map(models, input_splits) do model, inputs
    predict(model, inputs; kwargs...)
end |> tcollect(basesize = 1)

which takes the same number of lines as

predictions = tmap(models, input_splits; basesize = 1) do model, inputs
    predict(model, inputs; kwargs...)
end

Arguably, the curried version of tcollect composes well with other components; e.g., you can insert |> withprogress to get the progress bar

predictions = Iterators.map(models, input_splits) do model, inputs
    predict(model, inputs; kwargs...)
end |> withprogress |> tcollect(basesize = 1)

or to produce a DataFrame

Iterators.map(...) do ...
    ...
end |> withprogress |> tcopy(DataFrame, basesize = 1)

Note: |> in the above examples is the usual x |> f = f(x).

oxinabox commented Feb 3, 2020

Your curried tcollect is of no interest to me, as
I will not use regular |> in my codebase.
It puts things in the wrong order when the rest of the codebase always has the return-value-controlling function on the left.

Your examples there are nice, and I once would have really liked them, but I no longer write any code that works that way.

gen = Iterators.map(models, input_splits) do model, inputs
    predict(model, inputs; kwargs...)
end
tcollect(withprogress(gen); basesize = 1)

is almost as nice, and return-type control remains always on the left.

But anyway, this is not a thread for discussing |>.

It is for discussing something that exists only to match the existing API from Base.

I guess the main argument against it is that it would allow people to use Transducers.jl very shallowly, without delving into what a transducer is or how to compose them.

This is also the main argument in favor of it.

tkf commented Feb 4, 2020

As I said, I half-agree that tmap would be a nice addition. I'm just brainstorming whether there can be a nice API.

I guess the main argument against it is that it would allow people to use Transducers.jl very shallowly, without delving into what a transducer is or how to compose them.

This is also the main argument in favor of it.

Yes, this is a good summary of my struggle.

tkf commented Feb 5, 2020

zip is supported by #203.

tkf commented Feb 6, 2020

I thought about tmap, but I think it's best to implement it outside Transducers.jl. This is because:

  • tcollect and the like do not propagate size information, at least at the moment. This lets the same implementation serve transducers that may change the shape (e.g., filtering). But, arguably, it is better for tmap(f, ::Matrix) to return a Matrix.
  • tmap(f, xs::AbstractArray) can use extra assumptions that are not available in general reduce:
    • For example, the shape of the output is known. So, it can implement a (slightly?) more efficient mechanism than tcollect by using an approach like Base.collect_to! (setindex!-based) instead of Base.grow_to! (push!-based).
    • You can use return_type as an optimization and then minimize the allocation when the output type of f is inferred to be a concrete type.
    • You can schedule the tasks more dynamically in tmap since each unit of work is "independent." This would let you squash the O(log N) "clean up" work into O(1). See Large reduced result size #136 (comment).
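The array-specific points above can be sketched in plain Julia: with the output shape known up front, one can allocate with `similar` and fill via `setindex!` across threads, with no reduction step. This is only a sketch, not the proposed implementation; `tmap_arr` is a hypothetical name, and the element-type handling assumes `f` is type-stable and `xs` is nonempty:

```julia
# Sketch: shape and eltype are known up front, so preallocate and setindex!
# in parallel (collect_to!-style) instead of push!-ing (grow_to!-style).
function tmap_arr(f, xs::AbstractArray)
    out = similar(xs, typeof(f(first(xs))))  # assumes f is type-stable, xs nonempty
    Threads.@threads for i in eachindex(xs)
        out[i] = f(xs[i])  # each slot is independent; no reduction needed
    end
    return out
end

tmap_arr(x -> x^2, [1 2; 3 4])  # == [1 4; 9 16] -- a Matrix, shape preserved
```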

(OTOH, I think implementing tmap based on Transducers.jl for the moment is a good choice, because all the other tmap implementations out there rely on return_type AFAIK.)

Maybe it makes sense to create a package (say) ThreadExtras with a tmap that just wraps tcollect for the moment. It can do a post-hoc reshape to satisfy the first point. It can gradually drop Transducers.jl once it has its own implementation. Threads.foreach(f, ::Channel) from JuliaLang/julia#34543 could be a good addition, too. Although I don't need this personally, so I'm not really motivated to do it... Maybe I'll create ThreadExtras when I need a more efficient tmap at some point. But I'm closing this issue for now.

tkf closed this as completed Feb 6, 2020