BPR FMRecommender #93

Open
parklize opened this issue Apr 3, 2017 · 7 comments

@parklize

parklize commented Apr 3, 2017

Hi @ibayer, I'd like to load X_train in parts when it is too big to fit in memory at once, i.e., instead of

fm = bpr.FMRecommender(n_iter=10,
               init_stdev=0.01, l2_reg_w=.5, l2_reg_V=.5, rank=10,
               step_size=.002, random_state=11)
fm.fit(X_train, compares)

I'd like to do something like this. Can I get some advice?

for i in range(10):
    compares = sklearn.utils.shuffle(compares)
    for [some_parts] in compares:
        fm.fit(X_train[some_parts], compares[some_parts])

That is, instead of building the full csc_matrix X_train up front, parts of it would be created and loaded one at a time for fitting fm.
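
One detail matters for the chunked approach: each row of compares holds row indices into X_train, so a chunk of pairs has to be re-indexed against whichever rows are actually loaded. Below is a minimal sketch of that bookkeeping in plain Python. The helper names localize_pairs and iter_chunks are hypothetical and not part of fastFM, and the sketch says nothing about whether successive fit calls continue from the previous model.

```python
def localize_pairs(pairs_chunk):
    """Map global row indices in a chunk of pairs to chunk-local indices.

    Returns (rows, local_pairs): `rows` is the sorted list of global row
    indices the chunk needs, and `local_pairs` holds the same comparisons
    expressed as positions within `rows`.
    """
    rows = sorted({idx for pair in pairs_chunk for idx in pair})
    position = {row: k for k, row in enumerate(rows)}
    local_pairs = [(position[a], position[b]) for a, b in pairs_chunk]
    return rows, local_pairs


def iter_chunks(pairs, chunk_size):
    """Yield consecutive slices of `pairs` with at most `chunk_size` rows."""
    for start in range(0, len(pairs), chunk_size):
        yield pairs[start:start + chunk_size]


# Toy comparisons: each tuple is (preferred_row, other_row) in the full matrix.
pairs = [(7, 2), (2, 9), (5, 7), (9, 0)]
for chunk in iter_chunks(pairs, chunk_size=2):
    rows, local_pairs = localize_pairs(chunk)
    # Only the rows listed in `rows` need to be built and loaded for this chunk.
    print(rows, local_pairs)
```

For the toy pairs above this prints `[2, 7, 9] [(1, 0), (0, 2)]` and then `[0, 5, 7, 9] [(1, 2), (3, 0)]`; the partial design matrix for a chunk would contain exactly the rows in `rows`, in that order.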

@ibayer
Owner

ibayer commented Apr 3, 2017

Hi, the `n_more_iter` option allows you to train the model over chunks of data (see http://ibayer.github.io/fastFM/guide.html#learning-curves). You just have to swap the training set at every iteration of the loop.

fm.fit(X_train, y_train, n_more_iter=step_size)
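
To illustrate the pattern of swapping the training set between warm-started calls, here is a toy sketch in plain Python. ToyRegressor is a hypothetical stand-in written for this example, not a fastFM class; only the shape of its fit(X, y, n_more_iter=0) call mirrors the option mentioned above, and its internals (1-D least squares by gradient descent) are an assumption chosen to keep the example self-contained.

```python
class ToyRegressor:
    """Hypothetical stand-in (not fastFM): fits y ~ w * x by gradient descent."""

    def __init__(self, n_iter=0, step_size=0.05):
        self.n_iter = n_iter
        self.step_size = step_size
        self.w_ = 0.0

    def fit(self, X, y, n_more_iter=0):
        if n_more_iter > 0:
            n_steps = n_more_iter   # warm start: continue from the current w_
        else:
            n_steps = self.n_iter
            self.w_ = 0.0           # cold start: reset the coefficient
        for _ in range(n_steps):
            # Gradient of the mean squared error 0.5 * mean((w*x - y)^2).
            grad = sum((self.w_ * x - t) * x for x, t in zip(X, y)) / len(X)
            self.w_ -= self.step_size * grad
        return self


# Two chunks drawn from the same relation y = 3 * x.
chunks = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]

model = ToyRegressor(n_iter=0)
model.fit(*chunks[0])               # n_iter=0: only initializes the model
for epoch in range(20):
    for X_chunk, y_chunk in chunks:
        # Swap the training set on every call and run one more iteration.
        model.fit(X_chunk, y_chunk, n_more_iter=1)

print(round(model.w_, 6))
```

The coefficient is carried from one chunk to the next, which is the property the loop depends on; a fit method that resets the model on every call would only ever reflect the last chunk.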

@parklize
Author

parklize commented Apr 3, 2017

Hi, thank you for your prompt reply. The `n_more_iter` option is only available for FMRegression, not for FMRecommender, isn't it?

class fastFM.als.FMRegression has fit(X_train, y_train, n_more_iter=0), while
class fastFM.bpr.FMRecommender only has fit(X, pairs)

(from the API docs at http://ibayer.github.io/fastFM/api.html#fastFM.bpr.FMRecommender)

@parklize
Author

parklize commented Apr 3, 2017

Hi @ibayer, if I understand correctly, fit() would need an `n_more_iter` parameter. I tried to modify the fit() function as shown below, but couldn't get it to work. I think ffm_sgd_bpr_fit() in ffm.c would need a corresponding change. Could you have a look at this? I'm not familiar with C. Thanks a lot.

def fit(self, X, pairs, n_more_iter=0):
    """ Fit model with specified loss.

    Parameters
    ----------
    X : scipy.sparse.csc_matrix, (n_samples, n_features)

    pairs : ndarray, shape = (n_compares, 2)
            Each row `i` defines a pair of samples such that
            the first returns a higher value than the second:
            FM(X[pairs[i, 0]]) > FM(X[pairs[i, 1]]).
    """
    # The sgd solver expects a transposed design matrix in column major
    # order (csc_matrix).
    X = X.T  # creates a copy
    X = check_array(X, accept_sparse="csc", dtype=np.float64)
    assert_all_finite(pairs)

    pairs = pairs.astype(np.float64)

    # check that pairs contain no real values
    assert_array_equal(pairs, pairs.astype(np.int32))
    assert pairs.max() < X.shape[1]  # 0-based sample indices; X is transposed here
    assert pairs.min() >= 0
                    
    self.n_iter = self.n_iter + n_more_iter

    if n_more_iter > 0:
        print('warm start')
        _check_warm_start(self, X.T)
        self.warm_start = True
            
    self.w0_, self.w_, self.V_ = ffm.ffm_fit_sgd_bpr(self, X, pairs)
    
    if self.iter_count != 0:
        self.iter_count = self.iter_count + n_more_iter
    else:
        self.iter_count = self.n_iter

    # reset to default setting
    self.warm_start = False
    
    return self

@ibayer
Owner

ibayer commented Apr 4, 2017

@parklize
You're right, warm start is currently not implemented for SGD (and BPR). I'm not sure, but it might be possible to add warm start support without changing the C code. Have a look at:

#75
#74

@jerry-rubikloud

@ibayer Is it possible to change the step_size during training? SGD with a constant step size generally does not converge well. Or is this a particular implementation of SGD that takes care of that internally?

@ibayer
Owner

ibayer commented Dec 6, 2018

@jerry-rubikloud
The step_size is independent of the iteration number. It's indeed possible that in some situations SGD only converges with a very small constant step_size, which would make convergence very slow.
Using multiple fit calls with decreasing step_size values would help, but warm_start is currently not implemented for SGD.

A new BPR / SGD implementation is on its way to solve these issues, but it might take a while until we have it ready for release.
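
For readers who want to see what repeated passes with a decreasing step size look like, the following is a standalone toy sketch in plain Python. It is not fastFM code: bpr_epoch and step_schedule are hypothetical names, the scorer is a plain linear model rather than a factorization machine, and the 1 / (1 + decay * t) schedule is just one common choice assumed for illustration.

```python
import math


def bpr_epoch(w, X, pairs, step_size):
    """One SGD pass over `pairs` for a linear scorer; updates `w` in place."""
    for i, j in pairs:
        diff = [a - b for a, b in zip(X[i], X[j])]
        margin = sum(wk * dk for wk, dk in zip(w, diff))
        # Derivative of ln(sigmoid(margin)) with respect to the margin.
        g = 1.0 / (1.0 + math.exp(margin))
        for k, dk in enumerate(diff):
            w[k] += step_size * g * dk   # ascent on the pairwise log-likelihood
    return w


def step_schedule(initial, decay, n_epochs):
    """Yield initial / (1 + decay * t) for t = 0, 1, ..., n_epochs - 1."""
    for t in range(n_epochs):
        yield initial / (1.0 + decay * t)


X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pairs = [(0, 1), (0, 2), (2, 1)]   # first row of each pair should score higher
w = [0.0, 0.0]                     # kept across passes, i.e. a manual warm start
for step in step_schedule(initial=0.5, decay=0.1, n_epochs=30):
    bpr_epoch(w, X, pairs, step)

scores = [sum(wk * xk for wk, xk in zip(w, row)) for row in X]
print(scores)
```

Because the weights persist between passes, each pass with a smaller step refines the previous result instead of restarting, which is the behaviour that warm start would provide.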

@jerry-rubikloud

> A new BPR / SGD implementation is on its way to solve these issues, but it might take a while until we have it ready for release.

@ibayer Thanks for the quick reply, and thanks for creating this package. The BPR implementation is actually the reason why I chose this library over the others. :)
