Errors thrown when executing FeatBoost on 6-feature dataset #1

Open
dunnkers opened this issue May 11, 2020 · 3 comments

@dunnkers
Collaborator

dunnkers commented May 11, 2020

Input is a 6-feature dataset, found here. FeatBoost is executed using the following setup:

    from xgboost import XGBClassifier
    # FeatBoostClassification is defined in feat_boost.py (see the traceback);
    # adjust this import path to your project layout.
    from feat_boost import FeatBoostClassification

    # Setup estimator
    xgboost_ensemble = XGBClassifier(max_depth=3, learning_rate=0.1,
        n_estimators=200, silent=True, objective='binary:logistic',
        booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1,
        max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1,
        reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5,
        random_state=0, seed=None, missing=None)
    # Setup FS method
    fs = FeatBoostClassification(estimator=[xgboost_ensemble,
        xgboost_ensemble, xgboost_ensemble], number_of_folds=10,
            siso_ranking_size=8,
            max_number_of_features=100,
            siso_order=4,
            epsilon=1e-18,
            verbose=2)

    # Run Feature Selection (X, y are loaded from the 6-feature dataset)
    fs.fit(X, y)

(exactly the same setup as test.py)

  1. First, with verbose=2, an error is thrown inside a print statement.

Screenshot 2020-05-11 at 21 04 12

Full error log

(venv) ➜  feature-selection git:(master) ✗  env DEBUGPY_LAUNCHER_PORT=53859 /Users/dunnkers/git/feature-selection/venv/bin/python /Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/launcher /Users/dunnkers/git/feature-selection/jobs/run-featboost.py /Users/dunnkers/git/feature-selection/data/6_bit_mutliplexer 
Ranking pool [FeatBoost_XGBoost]
Running pool... [4 workers, 1 datasets]






Ranking features iteration 01
feature importances of all available feature:
x_001   3.792205
x_003   3.277614
x_004   2.644713
x_002   2.451928
x_006   2.280755
x_005   2.112983
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 22, in ranking_pool
    ranking = ranking_func(X, y)
  File "/Users/dunnkers/git/feature-selection/jobs/run-featboost.py", line 25, in FeatBoost_XGBoost
    fs.fit(X, y)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 188, in fit
    return self._fit(X, Y)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 272, in _fit
    selected_variable,best_acc_t = self._siso(X,Y,iteration_number)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 396, in _siso
    ranking, self.all_ranking_ = self._input_ranking(X, Y, iteration_number)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 559, in _input_ranking
    print("%s   %05f" % (self._feature_names[feature_rank[i]], feature_importance[feature_rank[i]]))
IndexError: index -7 is out of bounds for axis 0 with size 6
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/dunnkers/git/feature-selection/jobs/run-featboost.py", line 36, in <module>
    run_ranking_pool(FeatBoost_XGBoost)
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 42, in run_ranking_pool
    run_pool(ranking_pool, 'ranking', ranking_func, ranking_method)
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 99, in run_pool
    pool_results = pool.starmap(func, pool_args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
IndexError: index -7 is out of bounds for axis 0 with size 6

  2. Second, with verbose=1, another error is thrown.

Screenshot 2020-05-11 at 21 11 39

Full error log

(venv) ➜  feature-selection git:(master) ✗  env DEBUGPY_LAUNCHER_PORT=53886 /Users/dunnkers/git/feature-selection/venv/bin/python /Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/launcher /Users/dunnkers/git/feature-selection/jobs/run-featboost.py /Users/dunnkers/git/feature-selection/data/6_bit_mutliplexer 
Ranking pool [FeatBoost_XGBoost]
Running pool... [4 workers, 1 datasets]






Ranking features iteration 01
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 22, in ranking_pool
    ranking = ranking_func(X, y)
  File "/Users/dunnkers/git/feature-selection/jobs/run-featboost.py", line 25, in FeatBoost_XGBoost
    fs.fit(X, y)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 188, in fit
    return self._fit(X, Y)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 272, in _fit
    selected_variable,best_acc_t = self._siso(X,Y,iteration_number)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 397, in _siso
    self.siso_ranking_[(iteration_number-1), :] = ranking
ValueError: could not broadcast input array from shape (6) into shape (8)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/dunnkers/git/feature-selection/jobs/run-featboost.py", line 36, in <module>
    run_ranking_pool(FeatBoost_XGBoost)
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 42, in run_ranking_pool
    run_pool(ranking_pool, 'ranking', ranking_func, ranking_method)
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 99, in run_pool
    pool_results = pool.starmap(func, pool_args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: could not broadcast input array from shape (6) into shape (8)

@amjams
Owner

amjams commented May 12, 2020

I don't know whether these are the reasons behind the errors, but consider the following. Since it's a 6-feature dataset:
siso_ranking_size = 8: this should be less than or equal to 6.
max_number_of_features = 100: same as above.

Reasonable values in this case would be 1 and 6, respectively.
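
A minimal NumPy sketch of why a ranking size of 8 could produce both errors on a 6-column dataset. It does not use FeatBoost itself: the array names below are hypothetical stand-ins, and the exact internals of feat_boost.py are an assumption. It only shows that the two operations named in the tracebacks fail the same way when 8 slots meet 6 features.

```python
import numpy as np

n_features = 6          # columns in the dataset
siso_ranking_size = 8   # value from the setup in the issue

# Hypothetical stand-in for a per-iteration ranking buffer with 8 slots.
siso_ranking = np.zeros((1, siso_ranking_size), dtype=int)
ranking = np.arange(n_features)  # only 6 features can be ranked

# Writing 6 values into a row of length 8 cannot be broadcast.
try:
    siso_ranking[0, :] = ranking
    broadcast_error = None
except ValueError as exc:
    broadcast_error = exc

# Asking for a 7th entry (counted from the end) of a 6-element array fails.
importances = np.ones(n_features)
try:
    importances[-7]
    index_error = None
except IndexError as exc:
    index_error = exc

print(type(broadcast_error).__name__, "/", type(index_error).__name__)
```

With siso_ranking_size lowered to 6 or less, the assignment would fit, which matches the suggestion above.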

@dunnkers
Collaborator Author

dunnkers commented May 12, 2020

That seems to explain the error: the 6-feature dataset now runs normally. I didn't know that siso_ranking_size should be <= the number of features in the dataset; an assertion in the code and some docs would be nice.
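
A sketch of what such a check might look like, assuming it runs before fitting. The function name `check_featboost_params` is hypothetical and not part of FeatBoost; the bounds follow the advice given earlier in this thread (both parameters at most the number of features).

```python
def check_featboost_params(n_features, siso_ranking_size, max_number_of_features):
    """Raise ValueError if either parameter exceeds the number of features.

    Hypothetical helper, not part of FeatBoost.
    """
    if not 1 <= siso_ranking_size <= n_features:
        raise ValueError(
            "siso_ranking_size=%d must be between 1 and the number of "
            "features (%d)" % (siso_ranking_size, n_features)
        )
    if not 1 <= max_number_of_features <= n_features:
        raise ValueError(
            "max_number_of_features=%d must be between 1 and the number of "
            "features (%d)" % (max_number_of_features, n_features)
        )


# The settings from this issue fail fast with a readable message.
try:
    check_featboost_params(6, siso_ranking_size=8, max_number_of_features=100)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

A ValueError is used here rather than a bare assert so the check still runs under `python -O`.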

What are reasonable values of siso_ranking_size for my tests? The number of features in my datasets ranges from 6 to 100000, so a value of 8 is probably fine for all other datasets. Alternatively, I could use a fixed value of 5, so that the same value works for all tests.

@amjams
Owner

amjams commented May 12, 2020

Yes, you're right. Some assertions would be helpful.
You could use 5, but for larger datasets it might help to increase it a bit. Just keep in mind how this affects your runtime. I would say a good rule of thumb is to set it to 10 for datasets with over 100 features.
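
The rule of thumb above could be wrapped in a small helper when running the same script over many datasets. This is only a sketch: `suggest_siso_ranking_size` and its parameter names are hypothetical, and the cap at the number of features is added so tiny datasets never get a ranking size larger than they have columns.

```python
def suggest_siso_ranking_size(n_features, small=5, large=10, threshold=100):
    """Pick a ranking size: `large` above `threshold` features, else `small`.

    Hypothetical helper; the result never exceeds n_features.
    """
    size = large if n_features > threshold else small
    return min(size, n_features)


for n in (3, 6, 100, 101, 100000):
    print(n, suggest_siso_ranking_size(n))
```

The defaults mirror the values mentioned in this thread (5 as a fixed value, 10 above 100 features); larger values trade runtime for a wider candidate pool.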
