
Refactoring #22

Merged
merged 25 commits into from
Mar 7, 2016

Conversation

@dfdx (Owner) commented Feb 29, 2016

Main changes:

  1. The fitting process is now split into two clear parts: gradient calculation and its application to the parameters. By default, the functions `gradient_classic()` and `update_classic!()` are used for these.
  2. The classic updater is split into a basic `update_weights!` and a number of gradient-modifying functions, including ones for weight decay (L1 and L2), sparsity, learning rate, etc. All of these share the same signature, so they can easily be combined in arbitrary ways.
  3. Everything is configurable. Most functions take a "context" (`ctx`), a dictionary holding configuration and buffers. Instead of passing 3-5 options plus buffers to each function, they are now all bundled into the context.
  4. In addition to configurable gradient and updater functions, the user can now provide (via the context) a scorer for measuring model "fitness" (pseudo-likelihood by default) and a reporter for reporting progress during training (by default `TextReporter`, which simply prints the current epoch, score and elapsed time).
  5. The package now supports both Float64 and Float32. Support for other real data types can be added, provided that the corresponding BLAS functions are defined for them.
  6. The API is mostly unchanged, so little to no effort should be needed to switch to the new version.
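The workflow described above can be sketched roughly as follows. `gradient_classic`, `update_classic!` and `TextReporter` are named in this PR; the context keys, `pseudo_likelihood`, and the `fit` call shape are illustrative assumptions, not the package's exact API:

```julia
# Hedged sketch of the context-driven fitting API (key names are assumptions).
ctx = Dict{Any,Any}(
    :gradient => gradient_classic,   # how the gradient is computed
    :update   => update_classic!,    # how it is applied to the parameters
    :scorer   => pseudo_likelihood,  # model "fitness" measure (the default)
    :reporter => TextReporter(),     # prints current epoch, score, elapsed time
    :lr       => 0.1,                # plus any options/buffers the functions need
)
fit(rbm, X, ctx)
```

Because every gradient modifier shares one signature, composing, say, L2 decay with a sparsity penalty is just a matter of calling them in sequence on the same gradient buffer.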

Two things were left out of this refactoring. One is the EMF approximation by @sphinxteam, which seems like a perfect option for gradient calculation. The corresponding paper has a little too much physics for me, but if the team finds a way to bring their work into this repository, it will be a great advantage for the whole Julia stats/deep-learning community.

Another thing I unsuccessfully tried to borrow is the visual monitor from the same fork. Unfortunately, in its current state it is closely bound to the TAP free energy, which makes it quite hard to use as is. Still, it's quite possible I will include something similar later.

Comments and criticism are highly welcome.

CC: @Rory-Finnegan @eric-tramel

@dfdx mentioned this pull request Feb 29, 2016

function split_vis(crbm::ConditionalRBM, vis::Mat{Float64})
steps=5, sigma=0.01)
ConditionalRBM(Float64, Bernoulli, Bernoulli, n_vis, n_hid, steps;
Contributor:

I think you want this to be `ConditionalRBM(Float64, V, H, n_vis, n_hid, ...`? If so, I can change that when I add my generalization changes.

Owner Author:

Correct, thanks!

rofinn and others added 2 commits February 29, 2016 21:26
Rather than assuming the conditioned patterns will be the visible units at previous time steps, we just take a size to condition on and provide a constructor that takes the number of steps as a feature.

With the above change the `split_vis` method has changed, and all references to `history` or `hist` have been changed to `cond`.

Also fixed an issue on line 51, which was passing `Bernoulli` instead of the `V` and `H` type parameters to the more general constructor.
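The reworked `split_vis` described in this commit message can be sketched as below; the field names (`n_vis`) and exact slicing are assumptions for illustration, not the package's actual code:

```julia
# Illustrative sketch: the conditional RBM stores a conditioning size rather
# than assuming the condition is the visible units at previous time steps,
# so the input matrix is simply split at that boundary.
function split_vis(crbm, X::AbstractMatrix)
    vis  = X[1:crbm.n_vis, :]         # visible part
    cond = X[(crbm.n_vis + 1):end, :] # conditioning part (formerly `hist`)
    return vis, cond
end
```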
end
return samples
end


- function sample_hiddens{V,H}(rbm::AbstractRBM{V, H}, vis::Mat{Float64})
+ function sample_hiddens{T,V,H}(rbm::AbstractRBM{T,V,H}, vis::Mat{T})
Contributor:

Might want to accept a `vis` matrix of type `F<:AbstractFloat` and do a `convert(Array{T}, vis)` if `F != T`.

Owner Author:

If we agree on implicit conversion, I don't see a reason to limit `T` to be an instance of `AbstractFloat` - one may be interested in passing a matrix of `Int`s or even `Bool`s, which is very natural for a Bernoulli RBM.
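The implicit conversion being discussed could look roughly like this; `ensure_type` is a hypothetical helper name, not part of the package:

```julia
# Sketch of implicit conversion: accept any element type and convert to the
# RBM's storage type T. This also admits Int or Bool matrices, which are
# natural inputs for a Bernoulli RBM.
function ensure_type{T}(::Type{T}, vis::AbstractMatrix)
    eltype(vis) == T ? vis : convert(Array{T}, vis)
end

v_bool = [true false; false true]
v64 = ensure_type(Float64, v_bool)  # 2x2 Array{Float64,2}
```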

dfdx and others added 6 commits March 4, 2016 02:10

Fixed sparsity calculation for conditional RBM and several precision issues:

1. In the sparsity calculation for conditional RBMs, `hid_means` takes
   just the visible input provided by `split_vis(rbm, X)`.
2. The `gemm!` calls in `gradient_classic` for the conditional weights
   weren't using the appropriate precision (i.e. Float32 with sparsity).
3. The `free_energy` function often produces NaNs due to `log(0)`
   or lack of precision.
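One common guard against the `log(0)` NaNs mentioned in point 3 is to clamp the argument away from zero; this is a generic sketch, not the fix actually committed, and `safe_log` is a hypothetical name:

```julia
# Illustrative sketch: clamp log's argument to the smallest representable
# step for the element type T, so probabilities of exactly 0.0 yield a large
# negative value instead of -Inf (and downstream NaNs from Inf - Inf).
safe_log{T<:AbstractFloat}(x::T) = log(max(x, eps(T)))

safe_log(0.0)     # large negative finite number rather than -Inf
safe_log(0.5f0)   # unchanged for ordinary inputs, works for Float32 too
```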
dfdx added a commit that referenced this pull request Mar 7, 2016
@dfdx merged commit 2dc5d15 into master Mar 7, 2016
@dfdx deleted the refactoring branch June 17, 2017