Custom scoring function and forkserver #664

bartdp1 · 2018-02-05T15:37:16Z

I am using forkserver to keep TPOT from freezing on a large dataset. However, I also use a custom scoring function:

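import numpy as np
from sklearn.metrics import make_scorer, matthews_corrcoef

def get_ranks(arr):
    # Position of each score in sorted order, scaled to [0, 1)
    temp = np.argsort(arr)
    ranks = np.empty_like(temp).astype('float64')
    ranks[temp] = np.arange(len(arr)) * 1.0 / len(arr)
    return ranks

def mcc_at_threshold(y, y_pred, threshold):
    # Mark roughly the top `threshold` fraction of scores as positive, then compute MCC
    y_pred_threshold = get_ranks(y_pred) > (1 - threshold)
    return matthews_corrcoef(y, y_pred_threshold)

# target_churn is defined elsewhere and not shown here.
# needs_threshold is deprecated in newer scikit-learn releases in favour of response_method.
cust_scoring_mcc = make_scorer(mcc_at_threshold, needs_threshold=True, threshold=target_churn)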
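For readers unfamiliar with the rank trick in `mcc_at_threshold`, the sketch below reimplements `get_ranks` with NumPy only (no scikit-learn) and shows which scores end up labelled positive. The helper `top_fraction_labels` and the sample scores are made up for illustration. Because the ranks are zero-based and the comparison is strict, the cut appears to keep slightly fewer than `threshold * n` items (one of four at `threshold=0.5`); the `>=` line shows a variant that keeps two, in case that was the intent.

```python
import numpy as np

def get_ranks(arr):
    # Same idea as the post: position of each score in sorted order, scaled to [0, 1).
    arr = np.asarray(arr)
    order = np.argsort(arr)
    ranks = np.empty(len(arr), dtype=np.float64)
    ranks[order] = np.arange(len(arr)) / len(arr)
    return ranks

def top_fraction_labels(y_pred, threshold):
    # Hypothetical helper: the labelling step of mcc_at_threshold, without the MCC call.
    return get_ranks(y_pred) > (1 - threshold)

scores = [0.1, 0.9, 0.4, 0.7]
print(get_ranks(scores).tolist())                 # [0.0, 0.75, 0.25, 0.5]
print(top_fraction_labels(scores, 0.5).tolist())  # [False, True, False, False]

# Variant with >= (an assumption about intent): keeps two of the four items here.
print((get_ranks(scores) >= (1 - 0.5)).tolist())  # [False, True, False, True]
```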

When I pass scoring=cust_scoring_mcc to TPOT, it crashes, and the traceback contains the following:

ValueError: 'mcc_at_threshold' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']

Any thoughts?

bartdp1 · 2018-02-05T15:44:29Z

I think #645 is the same issue.

weixuanfu · 2018-02-05T15:46:07Z

Which version of TPOT are you using? It seems TPOT somehow cannot recognize the custom scoring function. I agree that it may be related to #645. What happens with n_jobs=1?

bartdp1 · 2018-02-05T15:47:11Z

It works when the standard fork start method is used for multiprocessing. I'm using the latest version of TPOT.

bartdp1 · 2018-02-05T15:49:06Z

No issue when n_jobs = 1.

weixuanfu · 2018-02-05T15:51:55Z

Hmm, weird. It seems that sklearn's scorer has a pickling issue with forkserver.
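One way to read the pickling remark: the standard `pickle` module stores a plain function only as a module-plus-name reference, so whichever process unpickles the scorer must be able to look that name up itself. The sketch below uses only the standard library and a made-up module name (`demo_script`) standing in for a user's script; deleting the function before `pickle.loads` simulates a worker that never ran the code defining it. It illustrates the general mechanism and is not a trace of what TPOT or scikit-learn do internally.

```python
import pickle
import sys
import types

# Hypothetical module standing in for a user's script; "demo_script" is a made-up name.
script = types.ModuleType("demo_script")
sys.modules["demo_script"] = script
exec("def rmsle(y_true, y_pred):\n    return 0.0\n", script.__dict__)

# A plain function is pickled as a reference (module name + function name), not as code.
payload = pickle.dumps(script.rmsle)
print(b"demo_script" in payload, b"rmsle" in payload)  # True True

# Unpickling works while the name can still be looked up on the receiving side...
print(pickle.loads(payload) is script.rmsle)  # True

# ...and fails once the definition is missing, as in a worker that never ran it.
del script.rmsle
try:
    pickle.loads(payload)
    restored = True
except (AttributeError, pickle.UnpicklingError):
    restored = False
print(restored)  # False
```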

bartdp1 · 2018-02-05T15:55:48Z

Using the custom scorer in GridSearchCV with forkserver seems to make everything freeze. No exception is raised.

weixuanfu · 2018-02-05T16:04:22Z

Yep, I reproduced this issue with the code below:

import multiprocessing

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    from sklearn import linear_model, metrics
    from sklearn.model_selection import GridSearchCV
    import numpy as np

    np.random.seed(42)
    X_train = np.random.random((1000, 10))
    y_train = np.random.random(1000)

    # Root mean squared logarithmic error; make_scorer calls it as (y_true, y_pred).
    # Defined inside the __main__ guard, as in the original report.
    def RMSLE(y_true, y_pred):
        return np.sqrt(np.mean((np.log(y_pred + 1) - np.log(y_true + 1)) ** 2))

    # Lower RMSLE is better, so the scorer negates it.
    rmsle_score = metrics.make_scorer(RMSLE, greater_is_better=False)
    # 'normalize' was removed from LinearRegression in scikit-learn 1.2; drop it on newer versions.
    parameters = {'fit_intercept': (True, False), 'normalize': [True, False]}
    regr = linear_model.LinearRegression()
    reg1 = GridSearchCV(regr, parameters, verbose=2, scoring=rmsle_score, n_jobs=-1)
    reg1.fit(X_train, y_train)

I will report this issue to the scikit-learn repo.
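One plausible explanation for the repro, not confirmed in this thread: with the standard-library `forkserver` (and `spawn`) start methods, worker processes re-import the main script under the name `__mp_main__` instead of `__main__`, so code inside the `if __name__ == '__main__':` guard, including the `RMSLE` definition, is skipped there and a pickled reference to `__main__.RMSLE` cannot be resolved. The sketch below simulates that re-import with `exec` on two hypothetical layouts of a script (`GUARDED`, `TOP_LEVEL` and `defines_rmsle` are illustrative names). Newer joblib versions that serialize callables with cloudpickle may not show the problem at all.

```python
# Two hypothetical layouts of the same script, kept as source text so they can be re-run.
GUARDED = '''
import math

if __name__ == "__main__":
    def rmsle(y_true, y_pred):
        sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
        return math.sqrt(sum(sq) / len(sq))
'''

TOP_LEVEL = '''
import math

def rmsle(y_true, y_pred):
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

if __name__ == "__main__":
    pass  # set the start method, build the scorer with make_scorer and call fit() here
'''

def defines_rmsle(source, run_name):
    # Run the script text under a given module name, as a parent (__main__)
    # or a re-importing worker (__mp_main__) would, and report whether rmsle exists.
    namespace = {"__name__": run_name}
    exec(source, namespace)
    return "rmsle" in namespace

print(defines_rmsle(GUARDED, "__main__"))       # True
print(defines_rmsle(GUARDED, "__mp_main__"))    # False
print(defines_rmsle(TOP_LEVEL, "__mp_main__"))  # True

# The top-level version is usable even when the guard is skipped.
worker_ns = {"__name__": "__mp_main__"}
exec(TOP_LEVEL, worker_ns)
print(worker_ns["rmsle"]([0.0, 1.0], [0.0, 1.0]))  # 0.0
```

If this explanation holds, the usual workaround would be to move `RMSLE` (and the `numpy` import it uses) above the `if __name__ == '__main__':` line, or into a separate importable module, while leaving `set_start_method` and `fit()` inside the guard.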

bartdp1 · 2018-02-05T16:06:18Z

That's very kind of you. Could you link the issue? I'd like to track it.

weixuanfu · 2018-02-05T16:16:13Z

Hey, I found this old related issue in sklearn, which seems to still be unsolved. I will track this one.

weixuanfu added the question label Feb 5, 2018

bartdp1 changed the title ~~Customer scoring function and forkserver~~ Custom scoring function and forkserver Feb 5, 2018

stin7 mentioned this issue Mar 22, 2020

ValueError: 'RMSLE' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options. #954 (Closed)