Refactoring #19

Closed · dfdx opened this issue Jan 11, 2016 · 31 comments

@dfdx
Owner

dfdx commented Jan 11, 2016

As recently discussed in #9, it would be nice to refactor the project to better reflect the different kinds of RBMs and the different ways of training them. Currently I can think of the following variations:

RBM kinds

  • ordinary RBMs (with Bernoulli and Gaussian variables, as well as using pure probabilities for the visible layer)
  • Conditional RBMs
  • CLRBM - an implementation of RBM for OpenCL (TBD)
  • I also have a particular interest in implementing the original Harmonium by Paul Smolensky, as it may work better for very large RBMs with a sparse weight matrix

gradient calculation

  • contrastive divergence (see the sketch after this list)
  • persistent contrastive divergence
  • EMF approximation by @sphinxteam
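For reference, here is a rough sketch of what a CD-1 gradient computation looks like for a Bernoulli-Bernoulli RBM; the helper names and signatures are illustrative only, not the package API:

# Rough sketch of CD-1 (samples stored column-wise); not the package API
logistic(x) = 1.0 ./ (1.0 .+ exp(-x))

function cd1_gradient(W, vbias, hbias, v0)
    h0 = logistic(W * v0 .+ hbias)             # hidden means given the data
    h0_sample = float(rand(size(h0)) .< h0)    # sampled hidden states
    v1 = logistic(W' * h0_sample .+ vbias)     # reconstruction of the visibles
    h1 = logistic(W * v1 .+ hbias)             # hidden means given the reconstruction
    dW = (h0 * v0' - h1 * v1') / size(v0, 2)   # positive phase minus negative phase
    return dW
end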

weight update options

  • learning rate
  • momentum
  • weight decay
  • sparsity

Given these requirements, I imagine the minimal RBM setup looking something like this (for all but the CLRBM type):

type RBM
    W::Matrix{Float64}
    vbias::Vector{Float64}
    hbias::Vector{Float64}
    gradient::Function
    update!::Function
end

where gradient() and update!() are closures initialized with all needed options. Then a single method fit() can use these closures to train the RBM and provide monitoring.

Note that this doesn't restrict implementations to a single type, but instead provides a contract, i.e. specifies the fields required to provide compatibility between different methods for learning and inference.
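For illustration, here is a rough sketch of how fit() could drive these closures; the batching, the closure signatures, and the monitoring hook are assumptions for the sake of the example, not the final API:

function fit(rbm::RBM, X::Matrix{Float64}; n_epochs=10, batch_size=100)
    n_samples = size(X, 2)  # samples stored column-wise
    for epoch in 1:n_epochs
        for start in 1:batch_size:n_samples
            batch = X[:, start:min(start + batch_size - 1, n_samples)]
            dW = rbm.gradient(rbm, batch)   # e.g. CD or PCD, pre-configured
            rbm.update!(rbm, dW)            # learning rate, momentum, decay, ...
        end
        # monitoring (e.g. pseudo-likelihood) could be reported here
    end
    return rbm
end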

@Rory-Finnegan @eric-tramel Does it make sense to you? Do your models fit this approach?

@rofinn
Contributor

rofinn commented Jan 12, 2016

Yeah, that was very similar to what I was thinking.

+1

Since I don't tend to consider fields as part of the public interface, I'd be inclined to have the interface only define required methods rather than fields, but that's just me.

This is probably a minor implementation detail, but if there were a way of making the update functions more composable while remaining relatively performant, that would be ideal. For example, there are several methods for computing sparsity and decay penalties with different parameters, and it would be nice if the update function could take these arbitrary calculations as functions. I suppose that would probably result in less efficient matrix operations, though...? Another option could be to provide extra update! closures that trade off performance in favour of composability and document that. Thoughts?

As for the ConditionalRBM, this would cover my use cases fine.

@dfdx
Owner Author

dfdx commented Jan 12, 2016

Since I don't tend to consider fields as part of the public interface, I'd be inclined to have the interface only define required methods rather than fields, but that's just me.

I imagine it more like protected fields in OO languages, i.e. something that only related types know about. For all outside interaction I tend to use accessor methods as well.

For example, there are several methods for computing sparsity and decay penalties with different parameters and it would be nice if the update function could take these arbitrary calculations as functions. I suppose that would probably result in less efficient matrix operations though...?

I thought about something like this:

type RBM
    ...
    update!::Function
end

import Base.LinAlg.BLAS: scal!

function decay_weight!(W::Matrix{Float64}, opts)
    decay_rate = opts[:weight_decay]
    scal!(length(W), decay_rate, W, 1)  # W <- decay_rate * W, in place
end

function add_sparsity!(W::Matrix{Float64}, opts)
    ...
end

function compose_updates(fn_opts...)
    function update!(W::Matrix{Float64})
        for (fn, opts) in fn_opts
            fn(W, opts)
        end
    end
    return update!
end

...
rbm.update! = compose_updates((decay_weight!, Dict(:weight_decay => 0.9)), (add_sparsity!, Dict(...)))

compose_updates composes several functions, each of which updates the weight matrix in place. This way we have zero performance overhead for the matrix operations and only a tiny (say, 0.01%) overhead for calling function-valued fields instead of relying on native dispatch.

(There should be a better syntax for setting the update function, but we can figure that out later.)

@rofinn
Contributor

rofinn commented Jan 12, 2016

Alright, that makes sense. Thanks.

@rofinn
Contributor

rofinn commented Jan 22, 2016

Hi, I just came across this paper and was wondering if the refactoring could include a way of setting the floating point precision used?

@dfdx
Owner Author

dfdx commented Jan 22, 2016

I see no fundamental limitations for this, though the API is unclear to me right now.

@rofinn
Contributor

rofinn commented Jan 22, 2016

What about just parameterizing the RBM and adding a precision keyword to the constructor?
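Something like this rough sketch, for example (the constructor signature and the precision keyword are just placeholders, not an existing API):

type RBM{T<:AbstractFloat}
    W::Matrix{T}
    vbias::Vector{T}
    hbias::Vector{T}
end

# `precision` selects the element type; Float64 stays the default
function RBM(n_vis::Int, n_hid::Int; precision=Float64)
    RBM{precision}(convert(Matrix{precision}, 0.01 * randn(n_hid, n_vis)),
                   zeros(precision, n_vis),
                   zeros(precision, n_hid))
end

rbm32 = RBM(784, 100; precision=Float32)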

@dfdx
Owner Author

dfdx commented Jan 23, 2016

This is exactly what I'm experimenting with right now in the refactoring branch (see rbm2.jl for the current state of affairs). One issue I've encountered is that BLAS procedures are defined only for Float32 and Float64 (as well as their complex equivalents, Complex64 and Complex128, but I believe those are out of scope). One option is to wrap ordinary matrix operations in functions with the same names (e.g. define scal!(alpha::Float16, a::Array{Float16,2}) = alpha .* a), but performance may be just terrible:

julia> @time for i=1:1_000_000 scal!(200, Float32(2), X32, 1) end
  0.079212 seconds

julia> @time for i=1:1_000_000 Float16(2.) * X16 end
  2.765985 seconds (2.00 M allocations: 534.058 MB, 0.75% gc time)

I believe that internally, non-BLAS operations on arrays of floats smaller than Float64 first convert these arrays to Float64 and then convert the result back to the smaller representation. So I'm not sure what we would gain here.

On the other hand, BLAS can operate on Float32 directly, as efficiently as on Float64, while requiring half the space (in exchange for reduced precision). So having the RBM parametrised by element type is a good feature anyway.

@rofinn
Contributor

rofinn commented Jan 23, 2016

Yeah, it still might be useful to at least give the option of using Float32. I imagine the only time Float16 might be useful is for network transfers in distributed learning, which is probably outside the scope here.

@eric-tramel
Collaborator

@Rory-Finnegan, @dfdx: The modifications you both describe seem like a good direction for the repository, especially the application of composable update functions: this can certainly allow for a lot of modularity in the kinds of regularizations that get applied to the parameter updates.

Speaking of modularity, perhaps one should consider addressing the possibility of non-binary distributions for the visible units (and perhaps even the hidden units). @dfdx had an initial implementation of Gaussian-distributed visible units, for example. (We're working on some extensions on this point; we'll have something to say about it in the coming months.)

However, using general priors would require a change to the RBM structure to include fields for an arbitrary number of prior parameters at each variable on the hidden and visible side. E.g.

type RBM
    W::Matrix{Float64}
    vParam::Matrix{Float64}
    hParam::Matrix{Float64}
    gradient::Function
    update!::Function
end

In the case of binary distributions, these Param fields resolve to the normal hbias and vbias. For Gaussian-distributed units, they become a mean and a variance. Arbitrary discrete distributions might be desired as well. I write the Params as matrices because I don't know the best possible way of composing this arbitrary structure. Of course, initialization by Distribution type should properly assign the space for the parameters.

Additionally, the gradient updates on variable distribution parameters should be handled differently from the updates of the couplings, W. Specifically, the parameter update on vParam and hParam should be generalized such that a variable number of prior-dependent gradients could be utilized.

@dfdx
Owner Author

dfdx commented Jan 30, 2016

Speaking of modularity, perhaps one should consider addressing the possibility of non-binary distributions for the visible units (and perhaps even the hidden units).

I have already expanded it in the refactoring branch to support a degenerate distribution, which makes it possible to always take the exact mean values for specified units during sampling (this was recommended in Hinton's practical guide on training RBMs, though formulating it as a degenerate distribution is mine). Adding support for arbitrary distributions doesn't sound that easy to me, however, because they change the log-likelihood and thus the gradient.

Also, I don't think we really need to change hbias and vbias. Essentially, these two fields are not parameters of a distribution, but rather weights in a log-linear model used to calculate clique potentials and then the energy. Recall that sampling is done from distribution means calculated (for hidden units) as logistic(W * v .+ hbias). These means are essentially one of the parameters of the distribution of the hidden and visible variables. So I think we should keep hbias and vbias as just weights.

Instead, for custom distributions we might add two more fields (I will follow your notation except for naming conventions): vparams and hparams. Since we already have the excellent Distributions.jl package, we can use its types for storing distribution parameters. Thus, we get the following structure:

type RBM{T,V,H}
    W::Matrix{T}
    hbias::Vector{T}
    vbias::Vector{T}
    hparams::Vector{H}
    vparams::Vector{V}
end

Note that I removed gradient and update! from the RBM fields (and put them into fit() config parameters instead) to separate the structure of the RBM from its training method.
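For example, a Gaussian-Bernoulli RBM could then be constructed roughly like this (placeholder sizes and distribution parameters, relying on the default constructor and Distributions.jl types; this is a sketch, not an API that exists yet):

using Distributions

n_vis, n_hid = 784, 100
rbm = RBM(0.01 * randn(n_hid, n_vis),            # W
          zeros(n_hid),                           # hbias
          zeros(n_vis),                           # vbias
          [Bernoulli(0.5) for i in 1:n_hid],      # hparams
          [Normal(0.0, 1.0) for i in 1:n_vis])    # vparams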

Does it sound reasonable?

@dfdx
Owner Author

dfdx commented Feb 25, 2016

@Rory-Finnegan I'm near the end of refactoring conditional RBMs now, but I'm not sure how to test whether they still behave correctly. For ordinary RBMs I can take, say, MNIST and inspect the fitting result visually, as well as check the final pseudo-likelihood. Do you have any dataset / example that I can use for testing conditional RBMs?

@rofinn
Contributor

rofinn commented Feb 25, 2016

The dataset I'm currently testing on isn't publicly available, but I believe Graham Taylor used MNIST in one of his papers. The general approach was to train the CRBM to condition the clean image on a noisy version of it. For testing, you can look at the accuracy of predicting the "cleaned" images given their noisy counterparts. If you want, I can write these tests.

@dfdx
Owner Author

dfdx commented Feb 25, 2016

It would be perfect. Then we will be able to use such tests as a usage example as well.

@rofinn
Contributor

rofinn commented Feb 25, 2016

Sounds good. Is there a branch you'd like me to add that to?

@dfdx
Owner Author

dfdx commented Feb 26, 2016

You can add it to master and I will then merge the changes into the refactoring branch and compare results.

@rofinn
Contributor

rofinn commented Feb 27, 2016

PR #21 opened with those tests.

@dfdx
Owner Author

dfdx commented Feb 27, 2016

Great, thank you! I'm going to create a PR for the refactoring tomorrow so you can check whether the new design matches your thoughts / needs.

@rofinn
Contributor

rofinn commented Feb 29, 2016

I just realized that the conditional RBM could be generalized a bit better. Should I make a PR to master or wait until the refactoring gets merged? The issue is that I'm currently just conditioning on previous observations, but there isn't any reason you couldn't condition on arbitrary things.

@dfdx
Owner Author

dfdx commented Feb 29, 2016

@Rory-Finnegan Does it mean that we can use ConditionalRBM for classification then? If so, it opens quite a lot of new possibilities :)

PR #22 is now open. Please, check it out and if it looks mostly good, then you can base your changes on it. Otherwise, feel free to make a separate PR to master.

@rofinn
Contributor

rofinn commented Feb 29, 2016

@dfdx In theory, yes, you could use a conditional RBM for classification. You'd just need to softmax the label predictions. I'm not sure how common that is or whether there are papers comparing it to other common methods, but we could always test it on the MNIST dataset and see how it does.
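Something like this sketch of the softmax step (column-wise over the label activations; not part of the package API):

function softmax(activations::Matrix{Float64})
    shifted = activations .- maximum(activations, 1)  # guard against overflow
    expd = exp(shifted)
    expd ./ sum(expd, 1)                              # columns sum to 1
end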

The PR looks good to me, so I'll just add my changes to it.

@rofinn
Contributor

rofinn commented Mar 1, 2016

PR #23 to the refactoring branch is open.

@rofinn
Contributor

rofinn commented Mar 2, 2016

So, two things I've thought of while going through the refactoring branch:

  1. It would be nice if the precision was a bit more flexible. I might want a Float32 RBM, but it isn't maintainable for me to update all of my code to convert Float64 context variables and input data to Float32.

  2. I'm currently randomizing the ordering of patterns in my training matrices, but it would be nice if the context supported a randomize-batches flag to do this when creating batches.

Thoughts?

@dfdx
Owner Author

dfdx commented Mar 3, 2016

  1. It makes sense, but I'm worried about implicit performance issues. For example, if we silently convert each batch before processing, the user could get a performance degradation on every iteration without even knowing about the issue. An explicit flag allowing conversion, or just good documentation, may help, but first I should evaluate the performance consequences of such an implicit conversion.

  2. It should be quite easy to add randomization of batch order; I've added it to the TODO list (see the sketch below).
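A rough sketch of what that batch-order randomization could look like (the function name and the flag are placeholders; samples are assumed to be stored column-wise):

function random_batches(X::Matrix{Float64}, batch_size::Int; shuffle=true)
    n = size(X, 2)
    order = shuffle ? randperm(n) : collect(1:n)
    # split the (possibly shuffled) column indices into batch-sized slices
    [X[:, order[i:min(i + batch_size - 1, n)]] for i in 1:batch_size:n]
end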

@dfdx
Owner Author

dfdx commented Mar 5, 2016

Randomization of batch order is here. I need one more day for flexible precision (it should be trivial to implement, but I'm very short on time right now).

@rofinn
Contributor

rofinn commented Mar 5, 2016

Awesome, thank you! I'm still getting some NaNs and Infs on the pseudo-likelihood calculation, so I'll probably have another PR once I figure out where they're coming from.

@dfdx
Owner Author

dfdx commented Mar 6, 2016

Regarding flexible precision, I just pushed a change that converts each batch to an array of the RBM's element type in the fit() function. Adding it to other parts of the code is trickier than I thought and creates several design questions. I will open a separate issue to address these questions.

Meanwhile, I see that this branch is mostly ok, so I'm going to merge it into master in 24 hours and discuss new features separately.

@rofinn
Contributor

rofinn commented Mar 7, 2016

FYI, with the recent changes I'm getting a bunch of warnings like WARNING: Option 'persistent_chain' is unknown, ignoring. I'm guessing that persistent_chain, dW_buf and dW_prev just need to be added to KNOWN_OPTIONS?

@dfdx
Owner Author

dfdx commented Mar 7, 2016

Basically no, they should be created during the first call to the corresponding method (recall that the context holds both options and buffers, and these three are exactly the latter). Do you need to pass them explicitly for some reason?

@rofinn
Contributor

rofinn commented Mar 7, 2016

I wasn't explicitly passing them in, but I figured out why I was getting the warnings and you weren't. I was running multiple RBMs in a row with the same context, which resulted in @get_array setting the values in the context on the first call to fit and producing warnings on subsequent calls.

@dfdx
Owner Author

dfdx commented Mar 7, 2016

Ah, that makes sense. I think we can simply copy the input context (i.e. options only) instead of using and populating the original one. I will push the change.
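Roughly the following idea (assuming the context is a plain Dict; this is a sketch, not the actual implementation):

function fit(rbm::RBM, X::Matrix{Float64}, context::Dict)
    ctx = copy(context)  # keep the caller's options, but write buffers into the copy
    # ... the training loop reads options from ctx and stores buffers such as
    # :persistent_chain, :dW_buf and :dW_prev in ctx only
    return rbm
end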

@dfdx
Owner Author

dfdx commented Mar 8, 2016

The refactoring branch is now merged into master. Please feel free to open separate issues for further improvements.
