Categorical preprocessing warning #777

Closed
maciekmalachowski opened this issue Sep 11, 2024 · 5 comments

Comments

@maciekmalachowski
Contributor

miniconda3\Lib\site-packages\supervised\preprocessing\preprocessing_categorical.py:81:

FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '[0 0 0 ... 0 0 0]' has dtype incompatible with category, please explicitly cast to a compatible dtype first.

Dataset: https://www.openml.org/search?type=data&sort=runs&status=active&id=1114
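For readers hitting the same message: it typically appears when integer codes are written into a column that still has the `category` dtype. Below is a minimal sketch of one way to avoid it, by replacing the whole column instead of assigning into it. The helper name `encode_categorical_column` and the toy DataFrame are hypothetical, and this is not necessarily what `preprocessing_categorical.py` or the eventual fix does.

```python
import warnings

import numpy as np
import pandas as pd


def encode_categorical_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with `column` replaced by int64 codes.

    Hypothetical helper for illustration, not part of mljar-supervised.
    """
    out = df.copy()
    # factorize returns one integer code per row (-1 for missing values)
    codes, _ = pd.factorize(out[column])
    # Assumption: an in-place write such as `out.loc[:, column] = codes`
    # into a category-typed column is the kind of assignment that produces
    # the FutureWarning quoted above. Replacing the column avoids writing
    # integers into the existing categorical array.
    out[column] = codes.astype("int64")
    return out


# Toy data standing in for a categorical feature
df = pd.DataFrame({"color": pd.Categorical(["red", "green", "red", "blue"])})

with warnings.catch_warnings():
    # Turn FutureWarning into an error so a regression would be visible
    warnings.simplefilter("error", FutureWarning)
    encoded = encode_categorical_column(df, "color")

print(encoded["color"].dtype)
```

An alternative with the same effect is to cast the column explicitly (for example with `astype(object)`) before assigning into it, which is what the warning text itself suggests.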

@pplonski
Contributor

Please provide code to reproduce the issue.

@maciekmalachowski
Contributor Author

import os

from supervised import AutoML

# Assumed: the module listed after this function, saved as kddcup09_upselling.py
import kddcup09_upselling


def generate_models():
    datasets = [
        kddcup09_upselling.get_data()
    ]

    algorithms = [
        'Baseline',
        'CatBoost',
        'Decision Tree',
        'Extra Trees',
        'LightGBM',
        'Neural Network',
        'Random Forest',
        'Xgboost'
    ]

    # each dataset is a tuple: (X, y, name, dataset_type)
    for X, y, name, dataset_type in datasets:
        for alg in algorithms:
            # create the results directory for this dataset/algorithm pair
            results_path = f"AutoML/{name}/{alg}"
            os.makedirs(results_path, exist_ok=True)

            # regression datasets use rmse, all others use accuracy
            if dataset_type == "reg":
                eval_metric = "rmse"
            else:
                eval_metric = "accuracy"

            # create automl object restricted to a single algorithm
            automl = AutoML(
                mode="Compete",
                total_time_limit=600,
                results_path=results_path,
                algorithms=[alg],
                train_ensemble=False,
                golden_features=False,
                features_selection=False,
                stack_models=False,
                kmeans_features=False,
                explain_level=0,
                boost_on_errors=False,
                eval_metric=eval_metric,
                validation_strategy={
                    "validation_type": "kfold",
                    "k_folds": 5,
                    "shuffle": True,
                    "stratify": True,
                    "random_seed": 123
                },
                start_random_models=10,
                hill_climbing_steps=3,
                top_models_to_improve=3,
                random_state=1234)

            # train automl
            automl.fit(X, y)
# kddcup09_upselling.py (presumably a separate module, given the call above)
from sklearn.datasets import fetch_openml


def get_data():
    # read data from openml page
    name = "Kddcup09_upselling"
    dataset_type = "binary"
    data = fetch_openml(data_id=1114, as_frame=True)
    X = data.data
    y = data.target

    return X, y, name, dataset_type
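When reproducing a warning like this, it can help to capture every `FutureWarning` raised during a call together with the file and line it came from. The following is a small standard-library sketch. The names `collect_future_warnings` and `noisy` are hypothetical, and `noisy` only stands in for the real training call.

```python
import warnings


def collect_future_warnings(func, *args, **kwargs):
    """Call func and return (result, list of captured FutureWarning records)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" ensures repeated warnings are not suppressed
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    future = [w for w in caught if issubclass(w.category, FutureWarning)]
    return result, future


def noisy():
    # Stand-in for a call that emits a FutureWarning
    warnings.warn("dtype will change", FutureWarning)
    return 42


result, found = collect_future_warnings(noisy)
for w in found:
    # Each record carries the source location of the warning
    print(f"{w.filename}:{w.lineno}: {w.message}")
```

In the script above, the same wrapper could be applied to the `automl.fit(...)` call to list which preprocessing lines emit the warning for a given algorithm.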

@pplonski
Contributor

@Marchlak are you able to reproduce this issue?

@Marchlak
Contributor

@pplonski I fixed it in 0312ae3

@pplonski
Contributor

Good job 👍
