Scoring parameter in TPOTRegressor unable to take customized loss functions #648

Closed
miteshyadav opened this issue Jan 4, 2018 · 2 comments

miteshyadav commented Jan 4, 2018

I am trying to fit a TPOTRegressor using a customized scoring function. I followed the instructions on the website, but it throws an error.

import numpy as np
from sklearn.metrics import make_scorer
from tpot import TPOTRegressor

def rmsl_error(y, h):
    """
    Compute the Root Mean Squared Log Error for hypothesis h and targets y

    Args:
        y - numpy array containing targets with shape (n_samples, n_targets)
        h - numpy array containing predictions with shape (n_samples, n_targets)
    """
    return np.sqrt(np.square(np.log(h + 1) - np.log(y + 1)).mean())

# greater_is_better=False makes the scorer return the negated loss
my_custom_scorer = make_scorer(rmsl_error, greater_is_better=False)

#del final_df['day']

# iter_cv and final_df are defined elsewhere (not shown)
tpot = TPOTRegressor(generations=10, population_size=50, verbosity=2, n_jobs=-1, cv=iter_cv, scoring=my_custom_scorer)
print('aaaaaaaaaaaaaaaaa')
tpot.fit(final_df.iloc[:, final_df.columns != 'count'].values, final_df.iloc[:, 6].values)
print('bbbbbbbbbbbbbbbb')

The following is the screenshot of the error message:

image

The same function works fine with any other regressor:

image

Contributor

weixuanfu commented Jan 4, 2018

Please see issue #645. I think this is a notebook-related issue: with the current version of TPOT (0.9.1), you cannot use n_jobs > 1 with a customized scoring function in a Jupyter notebook. I also fixed another bug in the new scoring API and merged the fix into the dev branch. I will work on this issue and release a patch soon.
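
A minimal sketch of that workaround, assuming the 0.9.x-style TPOTRegressor constructor used in the original post: keep the custom scorer, but set n_jobs=1 when running inside a Jupyter notebook. The names rmsle, rmsle_scorer, and build_tpot are illustrative and not part of TPOT, and TPOT is imported lazily so the scorer can be checked on its own first.

```python
import numpy as np
from sklearn.metrics import make_scorer


def rmsle(y_true, y_pred):
    """Root mean squared log error; log1p(x) computes log(x + 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# greater_is_better=False: the scorer returns the negated loss,
# so a higher (less negative) score still means a better model.
rmsle_scorer = make_scorer(rmsle, greater_is_better=False)


def build_tpot(cv=5):
    """Hypothetical helper: build a single-process TPOTRegressor."""
    # Imported here so the scorer above can be used without TPOT installed.
    from tpot import TPOTRegressor

    return TPOTRegressor(
        generations=10,
        population_size=50,
        verbosity=2,
        n_jobs=1,  # avoid n_jobs > 1 with a custom scorer in a notebook
        cv=cv,
        scoring=rmsle_scorer,
    )
```

Calling build_tpot(cv=iter_cv).fit(X, y) would then run the search in a single process, which is slower but sidesteps the notebook limitation described above.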


GinoWoz1 commented Sep 16, 2018

Use my loss function @miteshyadav - it works fine for me. I actually used the same one you had but ran into issues.

import math
from sklearn.metrics import make_scorer

def rmsle_loss(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0 for i in range(len(y_pred))]
    except Exception:
        # math.log raises on inputs <= 0, e.g. predictions at or below -1
        return float('inf')
    # Treat any negative target or prediction as the worst possible loss
    if not (y_true >= 0).all() or not (y_pred >= 0).all():
        return float('inf')
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5

rmsle_loss = make_scorer(rmsle_loss, greater_is_better=False)
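
For comparison, here is a vectorized sketch of the same guarded loss using NumPy instead of a Python loop. It is an assumption about the intent of the function above, not a drop-in from TPOT or scikit-learn: negative targets or predictions are treated as the worst possible loss, and the name rmsle_loss_vectorized is illustrative.

```python
import numpy as np


def rmsle_loss_vectorized(y_true, y_pred):
    """RMSLE that returns inf instead of nan when inputs are negative."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # np.log on values <= 0 yields nan or -inf rather than raising,
    # so reject negative inputs explicitly before taking logs.
    if (y_true < 0).any() or (y_pred < 0).any():
        return float("inf")
    diff = np.log1p(y_pred) - np.log1p(y_true)  # log1p(x) == log(x + 1)
    return float(np.sqrt(np.mean(diff ** 2)))
```

The function can be wrapped with make_scorer(rmsle_loss_vectorized, greater_is_better=False) in the same way as the loop version.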
