variable axis on which the "one hot" property holds #35
Maybe the first question should be: what comes after this layer? In this package, I think the efficient methods are the specialized multiplications (see the `@which` output below). A variant of path 2 is also just to wrap the existing first-axis array in a lazy permutation.
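For concreteness, that efficiency comes from the fact that multiplying by a one-hot matrix is just column indexing (a minimal self-contained example; `A` and `b` are arbitrary names here):

```julia
using OneHotArrays

A = rand(3, 4)
b = onehotbatch([1, 1, 2], 1:4)  # 4×3 one-hot matrix; hot rows are 1, 1, 2

# Each column of `b` picks out one column of `A`, so the product
# reduces to indexing instead of a full matmul:
A * b == A[:, [1, 1, 2]]  # true
```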
I expect this layer to be followed by a matrix multiplication. I'll try both ways, since I'm close to done on path 1, and path 2 seems simple at first glance. For correctness tests I'm planning to use the Flux and OneHotArrays tests as a first step, and add tests as necessary. Thanks for the help :)
Since only the un-transposed product hits the specialized method:

```julia
julia> @which rand(3,4) * onehotbatch([1,1,2], 1:4)
*(A::AbstractMatrix, B::Union{OneHotArray{var"#s13", 1, var"N+1", I}, Base.ReshapedArray{Bool, var"N+1", <:OneHotArray{var"#s13", <:Any, <:Any, I}}} where {var"#s13", var"N+1", I})
     @ OneHotArrays ~/.julia/packages/OneHotArrays/T3yiq/src/linalg.jl:7

julia> @which transpose(onehotbatch([1,1,2], 1:4)) * rand(4,3)
*(A::AbstractMatrix, B::AbstractMatrix)
     @ LinearAlgebra ~/.julia/dev/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:108
```

Operations like the transposed multiplication above therefore fall back to the generic `AbstractMatrix` path.
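Mathematically the transposed product is just row selection, so the same trick would apply if a specialized method were added; a sketch of the equivalence (not an existing OneHotArrays method):

```julia
using OneHotArrays

B = rand(4, 3)
oh = onehotbatch([1, 1, 2], 1:4)

# The generic fallback performs a full matmul, but the result is
# exactly row selection on B, since each column of `oh` has one `true`:
transpose(oh) * B == B[[1, 1, 2], :]  # true
```

So, independently of the variable-axis question, one option is to add dispatches for wrapped one-hot arrays (`Transpose`, `Adjoint`, `PermutedDimsArray`) that lower to indexing.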
Motivation and description
I am working on a layer that produces one-hot outputs, so I am looking into using OneHotArrays.jl.
My gripe is that the datatype currently only supports one-hot vectors that extend along the first axis.
I thought I'd write up my thoughts and possible implementations of a variable axis, to get some feedback and context from the maintainers and users here (I am very new to Julia and Flux, coming from working in Python).
Possible Implementation
Implementation path 1 (WIP): change the constructors, `size`, and `getindex` (a sketch follows below).
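A minimal sketch of what this could look like; the type name, field layout, and details here are hypothetical illustrations, not the actual WIP code:

```julia
# Hypothetical sketch: store the hot positions plus the axis that carries
# the one-hot property; `size` and `getindex` splice that axis into the
# shape of the underlying index array.
struct VariableAxisOneHot{N,I<:AbstractArray{<:Integer}} <: AbstractArray{Bool,N}
    indices::I   # position of the hot entry, one per slice
    nlabels::Int # length of the one-hot axis
    axis::Int    # which axis has the one-hot property
end

function VariableAxisOneHot(indices::AbstractArray{<:Integer}, nlabels::Integer, axis::Integer)
    N = ndims(indices) + 1
    1 <= axis <= N || throw(ArgumentError("axis must be in 1:$N"))
    VariableAxisOneHot{N,typeof(indices)}(indices, Int(nlabels), Int(axis))
end

Base.size(x::VariableAxisOneHot{N}) where {N} =
    ntuple(d -> d == x.axis ? x.nlabels :
                d <  x.axis ? size(x.indices, d) : size(x.indices, d - 1), N)

function Base.getindex(x::VariableAxisOneHot{N}, i::Vararg{Int,N}) where {N}
    hot  = i[x.axis]                               # index along the one-hot axis
    rest = (i[1:x.axis-1]..., i[x.axis+1:end]...)  # remaining indices
    return x.indices[rest...] == hot
end

x = VariableAxisOneHot([1, 1, 2], 4, 2)
size(x)   # (3, 4): the one-hot axis is the second axis
Array(x)  # materializes; each row has exactly one `true`
```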
The idea with this is to maintain the sparse nature of the representation for later optimized multiplications, backprop, etc.
While working on this I also hit upon path 2: reuse all of the original code, but use the new `axis` parameter to apply the appropriate permutations to the underlying first-axis object before computations (see the sketch after this paragraph). I expect to open a PR for this soon, but I'd love to hear your thoughts: do you think the first approach is better (more memory- and compute-efficient)? It is probably also harder to maintain and test.
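For comparison, path 2 can be sketched as a lazy permutation over the existing type; `onehot_on_axis` is a hypothetical helper name:

```julia
using OneHotArrays

# Hypothetical helper: build a standard OneHotArray (hot axis first),
# then lazily move the hot axis to the requested position.
function onehot_on_axis(data, labels, axis::Integer)
    x = onehotbatch(data, labels)                 # hot axis is axis 1
    perm = insert!(collect(2:ndims(x)), axis, 1)  # send axis 1 to position `axis`
    return PermutedDimsArray(x, Tuple(perm))
end

y = onehot_on_axis([1, 1, 2], 1:4, 2)
size(y)  # (3, 4): hot axis is now the second axis
```

This reuses all the existing code, but anything built on `y` hits the generic `AbstractArray` paths (as in the `@which` comparison above) until multiplication methods are also defined for the permuted wrapper.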