TabPFNRegressor unstable for scipy<1.11.0 #175

@LeoGrin

Describe the bug

Fitting TabPFNRegressor with scipy versions older than 1.11.0 often fails with a ValueError raised during preprocessing. It appears to work reliably from scipy 1.11.0 onward.
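
As a stopgap, a caller could check the installed scipy version before fitting. The following is only a minimal sketch: the 1.11.0 threshold is the one observed in this report, and the helper names (`version_tuple`, `scipy_is_recent_enough`, `installed_scipy_version`) are hypothetical, not part of tabpfn or scipy.

```python
import re
from importlib import metadata

# Threshold taken from this report; it is an observation, not a documented requirement.
MIN_SCIPY = (1, 11, 0)


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric parts of a version string, e.g. '1.10.0rc1' -> (1, 10, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.11' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def scipy_is_recent_enough(version: str, minimum: tuple = MIN_SCIPY) -> bool:
    """Return True if the given scipy version is at or above the threshold."""
    return version_tuple(version) >= minimum


def installed_scipy_version():
    """Return the installed scipy version string, or None if scipy is not installed."""
    try:
        return metadata.version("scipy")
    except metadata.PackageNotFoundError:
        return None
```

The versions are compared as integer tuples rather than strings because a plain string comparison would rank "1.9.0" above "1.10.0".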

Steps/Code to Reproduce

import sklearn.datasets
from tabpfn import TabPFNRegressor

# Fetch the first 100 samples of the California housing dataset
X, y = sklearn.datasets.fetch_california_housing(return_X_y=True)
X, y = X[:100], y[:100]

regressor = TabPFNRegressor(n_estimators=1, device="cpu", random_state=42)
regressor.fit(X, y)

Expected Results

No error is thrown

Actual Results

/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/preprocessing/_data.py:3256: RuntimeWarning: overflow encountered in power
  out[~pos] = -(np.power(-x[~pos] + 1, 2 - lmbda) - 1) / (2 - lmbda)
Traceback (most recent call last):
  File "/Users/leo/VSCProjects/new/TabPFN/test_error.py", line 11, in <module>
    regressor.fit(X, y)
  File "/Users/leo/VSCProjects/new/TabPFN/src/tabpfn/regressor.py", line 503, in fit
    self.executor_ = create_inference_engine(
  File "/Users/leo/VSCProjects/new/TabPFN/src/tabpfn/base.py", line 213, in create_inference_engine
    engine = InferenceEngineCachePreprocessing.prepare(
  File "/Users/leo/VSCProjects/new/TabPFN/src/tabpfn/inference.py", line 265, in prepare
    configs, preprocessors, X_trains, y_trains, cat_ixs = list(zip(*itr))
  File "/Users/leo/VSCProjects/new/TabPFN/src/tabpfn/preprocessing.py", line 664, in fit_preprocessing
    yield from executor(  # type: ignore
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/joblib/parallel.py", line 1918, in __call__
    return output if self.return_generator else list(output)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/Users/leo/VSCProjects/new/TabPFN/src/tabpfn/preprocessing.py", line 571, in fit_preprocessing_one
    res = preprocessor.fit_transform(X_train, cat_ix)
  File "/Users/leo/VSCProjects/new/TabPFN/src/tabpfn/model/preprocessing.py", line 398, in fit_transform
    X, categorical_features = step.fit_transform(X, categorical_features)
  File "/Users/leo/VSCProjects/new/TabPFN/src/tabpfn/model/preprocessing.py", line 987, in fit_transform
    Xt = transformer.fit_transform(X[:, self.subsampled_features_])
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 726, in fit_transform
    result = self._fit_transform(X, y, _fit_transform_one)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 657, in _fit_transform
    return Parallel(n_jobs=self.n_jobs)(
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/joblib/parallel.py", line 1918, in __call__
    return output if self.return_generator else list(output)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/pipeline.py", line 894, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/pipeline.py", line 438, in fit_transform
    Xt = self._fit(X, y, **fit_params_steps)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/pipeline.py", line 360, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/joblib/memory.py", line 312, in __call__
    return self.func(*args, **kwargs)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/pipeline.py", line 894, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 3099, in fit_transform
    return self._fit(X, y, force_transform=True)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 3126, in _fit
    X = self._scaler.fit_transform(X)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/base.py", line 848, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
    return self.partial_fit(X, y, sample_weight)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
    X = self._validate_data(
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/base.py", line 535, in _validate_data
    X = check_array(X, input_name="X", **check_params)
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/utils/validation.py", line 919, in check_array
    _assert_all_finite(
  File "/Users/leo/mambaforge/envs/tabpfn_package_3.9/lib/python3.9/site-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
    raise ValueError(msg_err)
ValueError: Input X contains infinity or a value too large for dtype('float64').
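
The RuntimeWarning at the top of the traceback suggests how the infinity arises: the power expression overflows, and the scaler that runs afterwards rejects the non-finite result. The sketch below copies that expression from the warning and evaluates it in isolation. The input values and the lambda of -200 are illustrative assumptions, not values extracted from the failing run, and the sketch does not show why older scipy versions would select such a lambda.

```python
import numpy as np


def negative_branch(x, lmbda):
    """Expression from the warning above, applied to negative inputs (lmbda != 2)."""
    return -(np.power(-x + 1, 2 - lmbda) - 1) / (2 - lmbda)


# Illustrative negative feature values (assumed, not taken from the failing run).
x = np.array([-122.0, -118.0])

with np.errstate(over="ignore"):  # silence the overflow warning for the demo
    moderate = negative_branch(x, lmbda=1.0)     # identity for lmbda = 1
    extreme = negative_branch(x, lmbda=-200.0)   # 123 ** 202 exceeds the float64 range

print(np.isfinite(moderate).all())  # finite output for a moderate lambda
print(np.isfinite(extreme).any())   # no finite values left for the extreme lambda
```

With a sufficiently negative lambda the output is -inf for every entry, which is the kind of input that check_array rejects with the ValueError shown above.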

Versions

PyTorch version: 2.1.0
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.7.3 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.6)
CMake version: version 3.27.7
Libc version: N/A

Python version: 3.9.21 | packaged by conda-forge | (main, Dec  5 2024, 13:47:18)  [Clang 18.1.8 ] (64-bit runtime)
Python platform: macOS-14.7.3-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M3

Dependency Versions:
--------------------
tabpfn: 2.0.3
torch: 2.1.0
numpy: 1.22.4
scipy: 1.10.0
pandas: 2.2.3
scikit-learn: 1.2.0
typing_extensions: 4.12.2
einops: 0.8.0
huggingface-hub: 0.27.1
