need help in creating the X and y dataset for fastFM rating prediction #128

Open
chrisbangun opened this issue Nov 30, 2017 · 0 comments

chrisbangun commented Nov 30, 2017

Hi,

Am I building the dataset correctly so that it is valid for fastFM?

I have a dataframe containing my user-item interactions, along with the context/features and the labels. I split this dataframe into two parts: 1) X, which contains the user-item interactions along with the features, and 2) y, which is the rating.

I then convert X into a list of Python dictionaries and use sklearn's DictVectorizer to create a scipy sparse matrix, which I feed to the fastFM model. Here is the code:

import scipy.sparse as sp
from fastFM import als
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import mean_squared_error

feature_cols = ['profile_id_encoded', 'item_id_encoded',
                'popularity_score', 'is_last_interaction']

X_train = train_interaction[feature_cols]
y_train = train_interaction['ratings'].values.squeeze()

X_val = val_interaction[feature_cols]
y_val = val_interaction['ratings'].values.squeeze()

# X_train and X_val are dataframes, while y_train and y_val are now np.arrays

# one dict per row, keyed by column name
X_train_dicts = X_train.to_dict('records')
X_val_dicts = X_val.to_dict('records')

vec = DictVectorizer()
vectorizer = vec.fit_transform(X_train_dicts)  # returns a CSR matrix

# below I convert the CSR matrix into a csc_matrix
fm_X_train = sp.csc_matrix(vectorizer)

fm = als.FMRegression(n_iter=10000, init_stdev=0.1, l2_reg_w=0, l2_reg_V=0, rank=5)

fm.fit(fm_X_train, y_train)

# prepare for prediction (a new DictVectorizer is fitted on the validation data here)
vec = DictVectorizer()
vectorizer = vec.fit_transform(X_val_dicts)
fm_X_val = sp.csc_matrix(vectorizer)

y_pred = fm.predict(fm_X_val)

print(mean_squared_error(y_val, y_pred))

The resulting MSE is bad, though: 93%.

Did I do things correctly here? I would really appreciate any help. Thank you.
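
One possible cause worth checking, sketched here under assumptions rather than confirmed against this dataset: the code above fits a second DictVectorizer on the validation records, so the validation columns need not line up with the columns the model was trained on. In addition, integer-encoded IDs are passed through by DictVectorizer as single numeric features instead of being one-hot encoded. The sketch below uses made-up toy records and a hypothetical helper, `ids_as_strings`; it reuses the fitted vectorizer via `transform` and casts the ID columns to strings so that DictVectorizer expands them into indicator columns.

```python
import scipy.sparse as sp
from sklearn.feature_extraction import DictVectorizer

# Toy records standing in for X_train.to_dict('records') and
# X_val.to_dict('records'); the values are made up for illustration.
train_records = [
    {"profile_id_encoded": 0, "item_id_encoded": 10,
     "popularity_score": 0.5, "is_last_interaction": 1},
    {"profile_id_encoded": 1, "item_id_encoded": 11,
     "popularity_score": 0.2, "is_last_interaction": 0},
]
val_records = [
    {"profile_id_encoded": 1, "item_id_encoded": 10,
     "popularity_score": 0.9, "is_last_interaction": 0},
]

# Assumption: these two columns are categorical identifiers.
ID_COLUMNS = ("profile_id_encoded", "item_id_encoded")


def ids_as_strings(records):
    """Hypothetical helper: cast ID fields to str so that
    DictVectorizer one-hot encodes them as 'name=value' columns."""
    return [
        {key: (str(value) if key in ID_COLUMNS else value)
         for key, value in record.items()}
        for record in records
    ]


vec = DictVectorizer()

# Fit the feature-to-column mapping on the training data only.
fm_X_train = sp.csc_matrix(vec.fit_transform(ids_as_strings(train_records)))

# Reuse the same fitted mapping for validation: transform, not fit_transform.
fm_X_val = sp.csc_matrix(vec.transform(ids_as_strings(val_records)))

print(vec.feature_names_)
print(fm_X_train.shape, fm_X_val.shape)
```

With the real data, `fm_X_train` and `y_train` would go to `fm.fit` and `fm_X_val` to `fm.predict` as in the code above. Note that `transform` silently ignores IDs that appear only in the validation set, so those rows get no user or item indicator. Separately, running 10000 ALS iterations with both L2 penalties set to 0 may overfit, so nonzero `l2_reg_w` and `l2_reg_V` values could be worth trying.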
