Add EmbeddingBag
#2031
Conversation
There are two things to improve:
Statistics is imported by Flux so we can just call `mean` rather than `Statistics.mean`.
Avoiding the mutation would be ideal.
On Wed, Aug 3, 2022, Marco commented on this pull request, in src/layers/basic.jl:
> + offsets[1] == 0 || throw(ArgumentError("`offsets` must begin with 0."))
+ out = zeros(eltype(m.weight), size(m.weight, 1), length(offsets))
+ start = firstindex(inputs)
+ for i in eachindex(offsets[1:end-1])
+ out[:, i] = m(inputs[start:offsets[i+1]])
+ start = offsets[i+1]+1
+ end
+ out[:, end] = m(inputs[offsets[end]+1:end])
+ out
I do not have a good intuition for when we need custom rrules, since many layers don't. Is it the assignment operator?
The PyTorch implementation was confusing to me; it looks like a lot of platform-specific code. The main thing that adds complexity is the presence of the sparse and padding-index parameters, which are not supported here.
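On the mutation point above, a non-mutating rewrite of the quoted method could look roughly like this. It is only a sketch: `bags_from_offsets` is a hypothetical helper, and the 0-based `offsets` convention is taken from the quoted code.
# Split the flat `inputs` into bags using 0-based `offsets`, then embed each bag
# and concatenate the reduced columns, with no in-place writes.
function bags_from_offsets(inputs::AbstractVector, offsets::AbstractVector{<:Integer})
    offsets[1] == 0 || throw(ArgumentError("`offsets` must begin with 0."))
    stops = vcat(offsets[2:end], lastindex(inputs))   # last index of each bag
    return [inputs[offsets[i]+1:stops[i]] for i in eachindex(offsets)]
end
(m::EmbeddingBag)(inputs::AbstractVector, offsets::AbstractVector{<:Integer}) =
    reduce(hcat, m.(bags_from_offsets(inputs, offsets)))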
A couple suggestions that you'll want to verify.
I made a few updates following the suggestions presented above. It seems there is one last comment that is unresolved: #2031 (comment)
I am open to suggestions on this from the maintainers. To me, the best solution is:
Thanks for the helpful reviews so far, hopefully we can get this merged soon!
This mostly looks good with a couple of minor docstring changes.
The main outstanding question I have is the utility of the input/offset style input. What's the value of specifying inputs in this way? It seems very confusing in comparison to a vector of vectors. Is there an upstream layer that will specify an input to `EmbeddingBag` in this way? If not, can we just do away with this option?
As far as the vector of vectors issue, I would go with the output that is most natural for the downstream layer that receives the output of an `EmbeddingBag`.
Thanks for the comments. I'll review them tomorrow.
I included it specifically for feature parity with PyTorch. I agree that it is cumbersome compared to the vector-of-vectors input, but I think it has utility in that you aren't dealing with essentially ragged tensors, and it's quite easy to build the offsets sequentially. I think it should be kept just for completeness.
I will keep it as is then (so that it returns a matrix, not a vector of vectors).
Sure, this is okay. I do think the examples in the docstring need to be expanded then to demonstrate the difference between all the input types (especially showing the input/offset case). Do we know where this format is preferred? It might also be good to mention when each style is useful in the docstring.
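For the docstring, something like the following could illustrate the two styles feeding the same bags. This is a sketch only: it assumes a two-argument inputs/offsets method with the 0-based convention quoted earlier, which may not match the final API.
e = Flux.EmbeddingBag(26 => 4)          # vocabulary of 26, embeddings of length 4
bags    = [[1, 2, 3], [4, 5], [6]]      # vector-of-vectors style: three bags
inputs  = [1, 2, 3, 4, 5, 6]            # flat input/offset style: the same three bags
offsets = [0, 3, 5]                     # 0-based start of each bag within `inputs`
e(bags)                                 # 4×3 matrix, one reduced column per bag
e(inputs, offsets)                      # should give the same 4×3 matrix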
src/layers/basic.jl
Outdated
(m::EmbeddingBag)(bags::AbstractVector{<:AbstractVector}) = reduce(hcat, m.(bags))
(m::EmbeddingBag)(bags::AbstractMatrix) = reduce(hcat, m.(eachcol(bags)))
After reading the PyTorch docstring, it seems the main advantage of this layer is memory efficiency. So shouldn't these be `mapreduce` instead of a broadcast, to achieve the same benefit?
Unfortunately, `mapreduce(f, hcat, collection)` is not optimized. But yes, I agree. I will add a todo for when specialized `mapreduce` functions are added. See: https://discourse.julialang.org/t/different-performance-between-reduce-map-and-mapreduce/85149 and JuliaLang/julia#31137.
julia> using Flux, BenchmarkTools
julia> (m::EmbeddingBag)(bags::AbstractVector{<:AbstractVector}) = reduce(hcat, m.(bags))
julia> (m::EmbeddingBag)(bags::AbstractMatrix) = reduce(hcat, m.(eachcol(bags)))
julia> test(m::EmbeddingBag, bags::AbstractVector{<:AbstractVector}) = mapreduce(m, hcat, bags)
julia> test(m::EmbeddingBag, bags::AbstractMatrix) = mapreduce(m, hcat, eachcol(bags))
julia> e = Flux.EmbeddingBag(100=>64)
julia> bags = [[rand(1:100) for _ in 1:3] for _ in 1:1000]
julia> @btime e(bags);
709.630 μs (14004 allocations: 2.16 MiB)
julia> @btime test(e, bags);
14.700 ms (15935 allocations: 124.18 MiB)
Unfortunately, mapreduce(f, hcat, collection) is not optimized
If this is the hurdle, then `stack(f, collection)` might be the solution, assuming `f` returns vectors. Needs `using Compat`, which is certainly already loaded downstream.
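A minimal sketch of that alternative, assuming `m(bag)` returns a vector for each bag:
using Compat: stack   # `stack` is in Base from Julia 1.9; Compat provides it on older versions
# Map `m` over the bags and collect the resulting vectors as columns,
# without first materialising an array of arrays.
(m::EmbeddingBag)(bags::AbstractVector{<:AbstractVector}) = stack(m, bags)
(m::EmbeddingBag)(bags::AbstractMatrix) = stack(m, eachcol(bags))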
The really big memory cost is going to be the gradient of `gather`. For every column / vector, `∇gather_src` is going to allocate like a copy of the weights.
One could write a more efficient combined rule for this. Or add some thunks to the one in NNlib & wait for AD to learn to exploit them.
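A combined rule could look roughly like this. It is a sketch only: `embeddingbag_mean` is a hypothetical fused forward pass assuming a `mean` reduction, not the code in this PR.
using ChainRulesCore, Statistics
# Hypothetical fused forward pass: mean-pool the embedding columns of each bag.
function embeddingbag_mean(weight::AbstractMatrix, bags)
    out = similar(weight, size(weight, 1), length(bags))
    for (j, bag) in enumerate(bags)
        out[:, j] = mean(view(weight, :, i) for i in bag)
    end
    return out
end
# One pullback for the whole layer: a single gradient buffer the size of
# `weight`, rather than a ∇gather_src-style copy of the weights per bag.
function ChainRulesCore.rrule(::typeof(embeddingbag_mean), weight, bags)
    y = embeddingbag_mean(weight, bags)
    function embeddingbag_mean_pullback(ȳraw)
        ȳ = unthunk(ȳraw)
        dw = zero(weight)
        for (j, bag) in enumerate(bags)
            for i in bag
                dw[:, i] .+= view(ȳ, :, j) ./ length(bag)
            end
        end
        return (NoTangent(), dw, NoTangent())
    end
    return y, embeddingbag_mean_pullback
end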
This can be done after this PR, right?
Yes. I just mean these concerns will dwarf the `hcat` cost. (Even on the forward pass, the thing you make to call `mean` on will also be much larger.)
I've expanded the documentation and added notes on the input/offset input type.
Is there anything else necessary for this PR? Maybe some improvements with `mapreduce` can be done, but they are not possible right now.
I had a look over this. I am still dismayed by the need for 5 distinct bullet points to explain what this thing does... or 7 if you add a description of its present behaviour on onehot arrays.
`Embedding` has one rule: it returns an array `(out, size(x)...)`, and the same if you do `onehotbatch(x)` first. It always wants integer indices (and `size(::Int) = ()` for just one). What's the corresponding simple rule?
I think `EmbeddingBag` should always take vectors of integers, called "bags", where `Embedding` took integers. Given any collection of such vectors `vs`, its output is an array `(out, size(vs)...)`. A single vector is a trivial collection, `size(vs) == ()`. An array of integers `x` is always sliced, `vs = eachslice(x, dims=Tuple(2:ndims(x)))`, hence `(out, size(x)[2:end]...)` follows.
Then `EmbeddingBag(in=>out)(3)` should be an error, I think, otherwise it breaks the pattern & that's confusing. (You can use `Embedding`. Maybe there should be constructors like `Embedding(::EmbeddingBag)`?)
For onehot arrays, surely a `OneHotMatrix` is a bag. Then I think it must stand in for a vector of integers in all circumstances: just one, a vector of `OneHotMatrix`, and an N-dim `OneHotArray`.
Despite saying "always sliced" above, the actual implementation should not slice if it doesn't have to. What I put in the suggestion should work for any `x::Array{Int,N}` and is much faster than the PR.
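For concreteness, under that rule the sizes would work out as sketched below (using the constructor seen earlier in the thread; the commented sizes are what the rule implies, not a claim about the current state of the PR).
e = Flux.EmbeddingBag(26 => 4)
bag = [1, 2, 3]                  # a single bag: a trivial collection, size(vs) == ()
vs  = [[1, 2], [3], [4, 5, 6]]   # a vector of three bags
x   = rand(1:26, 5, 7)           # integer array: sliced so each column is a bag
size(e(bag))   # (4,)     i.e. (out, size(vs)...) with size(vs) == ()
size(e(vs))    # (4, 3)
size(e(x))     # (4, 7)   i.e. (out, size(x)[2:end]...)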
I think these are the test changes needed to match the above.
Thanks for the comments, I'll check it soon and hopefully we can move forward.
Left a few comments and will make the changes based on responses.
(m::EmbeddingBag)(ind::Integer) = error("EmbeddingBag expects an array of indices, not just one")
(m::EmbeddingBag)(hot::AbstractArray{Bool}) = dropdims(m.reduction(Embedding(m.weight)(hot), dims=2), dims=2)
(m::EmbeddingBag)(hot::AbstractVector{Bool}) = error("EmbeddingBag not defined for a one-hot vector")
This seems to be too general of a type restriction. For example, I could define a `MultiHot <: AbstractVector{Bool}` that succinctly encodes a bag with a fixed number `k` of elements (in fact, this was one of my original use cases for `EmbeddingBag`s), where index `i` is included in the bag if it is true.
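A rough sketch of that hypothetical MultiHot idea (type, field names, and values invented here purely for illustration):
struct MultiHot <: AbstractVector{Bool}
    members::Vector{Int}   # the k indices that belong to the bag
    vocab::Int             # length of the vector, i.e. vocabulary size
end
Base.size(v::MultiHot) = (v.vocab,)
Base.getindex(v::MultiHot, i::Int) = i in v.members
bag = MultiHot([2, 5, 9], 26)    # a fixed-k bag over a 26-entry vocabulary
findall(bag)                     # returns [2, 5, 9], the members of the bag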
This is a possible encoding. Dispatch on such a type specifically is not forbidden by this method.
So far, I think every other use of one-hot arrays behaves identically if you `collect` it. This is why I think it makes sense to define these methods for `AbstractArray{Bool}`. Another boolean type with a different meaning cannot also have this property that `collect` doesn't change the result.
What is the status of this PR?
Status from my side is that I thought I understood exactly how it ought to work, and perhaps rudely pushed hard to get it done before I forgot... and now I've mostly forgotten :( But should perhaps revive!
Yes, I lost the thread also. I'm happy to wrap it up this weekend, if the proposed changes are what we want.
* embedding bag
* doc fix
* Apply suggestions from code review (Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>)
* Remove references to `Statistics` — Statistics is imported by Flux so we can just call `mean` rather than `Statistics.mean`.
* non mutating bag and onehot changes
* better docs and todo
* input/offset docs
* doctest
* Apply suggestions from code review (Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>, Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>)
* reduce docs
* broadcast to map
* remove extra doc example line
* add _splitat
* rename input/offset
* minor docs
* Apply suggestions from code review
* Update test/layers/basic.jl
* Update test/layers/basic.jl
* Update test/layers/basic.jl
* typo
* docstring
* Apply suggestions from code review

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Add `EmbeddingBag`, a slight generalization of `Embedding` which allows for embedding multiple items at once and performing a reduction on them. See: PyTorch's implementation.
This PR implements PyTorch's `input`/`offset` embedding as well as scalar, vector, vector of vectors, matrix, and `OneHotVector`/`OneHotMatrix` input types. `EmbeddingBag` is an outstanding feature in #1431.
PR Checklist