How can I use this toolkit to do multi-output regression? #747

Closed
offchan42 opened this issue Aug 25, 2018 · 2 comments

offchan42 commented Aug 25, 2018

For example, my input is a set of real numbers (a camera image from a headset), and I want to predict where my left hand is relative to the camera as 4 numbers: x, y, z, length.
My left hand will be visible on the camera.

x,y,z is a unit vector representing direction from camera to the left hand, and length is the distance from the camera to the left hand.
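To make the target encoding concrete, here is a minimal sketch of how a 3-D hand position could be converted to the four numbers described above and back again. The function names and the sample position are hypothetical, chosen only for illustration.

```python
import numpy as np

def to_target(position):
    """Encode a 3-D hand position (camera frame) as [x, y, z, length]."""
    position = np.asarray(position, dtype=float)
    length = np.linalg.norm(position)
    direction = position / length  # unit vector; assumes length > 0
    return np.append(direction, length)

def from_target(target):
    """Recover the 3-D position from [x, y, z, length]."""
    target = np.asarray(target, dtype=float)
    return target[:3] * target[3]

# Hypothetical hand position relative to the camera.
hand_position = np.array([0.3, -0.4, 1.2])
target = to_target(hand_position)
recovered = from_target(target)
```

Each training row would then have a 4-column target, which is what makes this a multi-output regression problem.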

So does this tool support predicting multiple outputs? If yes, how do I do it?
If not, please suggest a tool that can, or another way of solving my problem.

Contributor

rhiever commented Aug 25, 2018

You would have to create a custom TPOT configuration that uses operators supporting multi-output regression, e.g., scikit-learn's MultiOutputRegressor. Because MultiOutputRegressor takes another estimator as a parameter, see our SelectFromModel example in another configuration dictionary for how to nest one.

I'm not 100% familiar with multi-output support in sklearn, but any operations that work with the MultiOutputRegressor and cross_val_score should also work with TPOT.
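As a quick way to test that condition outside TPOT, the sketch below runs MultiOutputRegressor through cross_val_score on a 4-column target. The synthetic data, the choice of ExtraTreesRegressor, and the hyperparameters are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in data: 200 samples, 10 features, 4 targets.
rng = np.random.RandomState(0)
X = rng.rand(200, 10)
W = rng.rand(10, 4)
y = X @ W + 0.01 * rng.randn(200, 4)

# Wrap a single-estimator regressor so one copy is fitted per target column.
model = MultiOutputRegressor(
    ExtraTreesRegressor(n_estimators=20, random_state=0)
)

# R^2 is averaged uniformly across the 4 outputs for each fold.
scores = cross_val_score(model, X, y, cv=3, scoring="r2")
print(scores)
```

If a candidate estimator fails here, it would presumably fail inside a TPOT pipeline as well.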

You can read more about custom configuration dictionaries here.

@robertritz

Could you provide a bit more help with the custom configuration dictionary? I'm attempting to set up a simple custom configuration using the SelectFromModel example you gave. Here is my current config:

import numpy as np

tpot_config = {
    'sklearn.multioutput.MultiOutputRegressor': {
        # Nested dict: the wrapped estimator and the hyperparameters to search
        'estimator': {
            'sklearn.ensemble.ExtraTreesRegressor': {
                'n_estimators': [100],
                'max_features': np.arange(0.05, 1.01, 0.05)
            }
        }
    }
}

And here is my code to run TPOT:

from tpot import TPOTRegressor

pipeline_optimizer = TPOTRegressor(generations=5, population_size=20, max_time_mins=480,
                                   n_jobs=-1, verbosity=2, random_state=12345,
                                   config_dict=tpot_config)
pipeline_optimizer.fit(X_train, y_train)
print(pipeline_optimizer.score(X_test, y_test))
pipeline_optimizer.export('tpot_exported_pipeline.py')

I receive an error:
ValueError: Error: Input data is not in a valid format. Please confirm that the input data is scikit-learn compatible. For example, the features must be a 2-D array and target labels must be a 1-D array.
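The message says the target must be 1-D, so a 2-D target with several columns would be rejected before any pipeline is evaluated. Whether TPOT performs this check through scikit-learn's check_X_y is an assumption here, not something confirmed in this thread. The sketch below only shows how that scikit-learn validator treats a 4-column target, using made-up arrays.

```python
import numpy as np
from sklearn.utils import check_X_y

# Made-up data shaped like a multi-output problem: 4 target columns.
rng = np.random.RandomState(0)
X_demo = rng.rand(50, 8)
y_demo = rng.rand(50, 4)

# With default arguments, a 2-D multi-column y is rejected.
try:
    check_X_y(X_demo, y_demo)
    rejected_by_default = False
except ValueError as err:
    rejected_by_default = True
    print("Rejected:", err)

# Opting in to multi-output targets lets the same arrays through.
_, y_checked = check_X_y(X_demo, y_demo, multi_output=True)
print(y_checked.shape)
```

Printing y_train.shape before calling fit is a cheap way to see whether this is what triggers the error.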

Is it necessary to specify the parameters to search for each algorithm? Before reading the documentation and your example, I naively passed a list of algorithms like so:

tpot_config = {
    'sklearn.multioutput.MultiOutputRegressor': {
        'estimator': ['ExtraTreesRegressor']
    }
}

There are sklearn algorithms that are inherently multioutput, but with MultiOutputRegressor I get many more options. Thanks!
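To illustrate that distinction in plain scikit-learn, here is a small sketch. ExtraTreesRegressor accepts a 2-D target directly, while GradientBoostingRegressor handles only one target and so needs the MultiOutputRegressor wrapper. The random data and hyperparameters are placeholders.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Placeholder data: 100 samples, 6 features, 4 targets.
rng = np.random.RandomState(0)
X = rng.rand(100, 6)
y = rng.rand(100, 4)

# Inherently multi-output: one model predicts all 4 columns.
native = ExtraTreesRegressor(n_estimators=20, random_state=0).fit(X, y)

# Single-output estimator: the wrapper fits one clone per target column.
wrapped = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=20, random_state=0)
).fit(X, y)

native_pred = native.predict(X[:5])
wrapped_pred = wrapped.predict(X[:5])
print(native_pred.shape, wrapped_pred.shape, len(wrapped.estimators_))
```

The wrapper treats the targets independently, so it cannot exploit correlations between outputs the way a natively multi-output model may.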
