Refactor SVD features preprocessing step #768

Open

bejaeger wants to merge 57 commits into main from ben/refactor-svd-transform

Conversation

Contributor

@bejaeger bejaeger commented Feb 1, 2026

Refactors SVD feature processing so that it is a separate step in the preprocessing pipeline.


Public API Changes

  • [ ] No Public API changes
  • [x] Yes, Public API changes (Details below)
    The "scaler" option for the preprocessing config global_transformer_name has been removed (see the sketch below).

@bejaeger bejaeger removed the request for review from klemens-floege February 3, 2026 15:14
@bejaeger bejaeger changed the base branch from ben/introduce-feature-modality-dict to main February 5, 2026 09:54
@bejaeger bejaeger changed the base branch from main to lg-fix-nans-categorical February 5, 2026 10:06
@bejaeger bejaeger changed the base branch from lg-fix-nans-categorical to main February 5, 2026 10:07
Copilot AI review requested due to automatic review settings February 5, 2026 10:10
Contributor

Copilot AI left a comment


Pull request overview

This pull request refactors SVD feature processing from being embedded in ReshapeFeatureDistributionsStep into a separate AddSVDFeaturesStep, improving the modularity and maintainability of the preprocessing pipeline. The refactoring also removes the "scaler" option from the global_transformer_name configuration.

Changes:

  • Introduces AddSVDFeaturesStep as a dedicated preprocessing step for SVD feature generation
  • Extracts utility functions (make_standard_scaler_safe, add_safe_standard_to_safe_power_without_standard) to a new steps/utils.py module
  • Removes SVD-related functionality from ReshapeFeatureDistributionsStep
  • Updates pipeline_factory.py to include AddSVDFeaturesStep in the preprocessing pipeline when a global transformer is configured (see the sketch after this list)
  • Removes "scaler" as a valid option for global_transformer_name in PreprocessorConfig

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/tabpfn/preprocessing/steps/add_svd_features_step.py | New step that adds SVD features to the data, extracted from ReshapeFeatureDistributionsStep |
| src/tabpfn/preprocessing/steps/utils.py | New utility module containing shared transformer utility functions |
| src/tabpfn/preprocessing/steps/reshape_feature_distribution_step.py | Removed SVD-related code and global transformer functionality; utility functions moved to utils.py |
| src/tabpfn/preprocessing/pipeline_factory.py | Updated to add AddSVDFeaturesStep to the pipeline when a global transformer is configured |
| src/tabpfn/preprocessing/configs.py | Removed "scaler" from valid global_transformer_name options |
| tests/test_preprocessing/test_add_svd_features_step.py | Comprehensive test suite for the new AddSVDFeaturesStep |
| tests/test_preprocessing/test_reshape_feature_distribution_step.py | Removed obsolete tests related to global transformers |
| tests/test_preprocessing/test_add_fingerprint_features_step.py | Added integration test for pipeline usage |
| changelog/768.deprecated.md | Documents removal of the "scaler" option |

Comments suppressed due to low confidence (2)

src/tabpfn/preprocessing/steps/reshape_feature_distribution_step.py:96

  • The docstring still mentions global_transformer_name and SVD features (lines 87-88, 91-95), which are no longer handled by this step after the refactoring. These references should be removed since SVD processing is now handled by the separate AddSVDFeaturesStep; a trimmed version is sketched after the excerpt below.
class ReshapeFeatureDistributionsStep(PreprocessingStep):
    """Reshape feature distributions using various transformations.

    This step should receive ALL columns (not modality-sliced) because it:
    1. Handles feature subsampling when too many features exist
    2. Applies different logic based on `apply_to_categorical` flag
    3. Can append transformed features to originals (`append_to_original`)

    # TODO(ben): Add separate PreprocessingStep's for all of the above
    # so that we can register this with modalities

    When using with PreprocessingPipeline, register as a bare step (no modalities):
        pipeline = PreprocessingPipeline(steps=[ReshapeFeatureDistributionsStep()])

    Configuration options:
        - transform_name: The transformation to apply (e.g., "squashing_scaler_default",
            "quantile_uni_coarse")
        - apply_to_categorical: Whether to transform categorical columns too
        - append_to_original: If True, keep original and append transformed as new
            columns
        - max_features_per_estimator: Subsample features if above this threshold
        - global_transformer_name: Optional global transform like "svd" that adds
            features

    Output column ordering:
        - With append_to_original=True: [original_cols, transformed_cols, (svd_cols)]
        - With append_to_original=False, apply_to_categorical=False:
            [categorical_passthrough, numerical_transformed, (svd_cols)]
        - With append_to_original=False, apply_to_categorical=True:
            [all_transformed, (svd_cols)]
    """

src/tabpfn/preprocessing/steps/reshape_feature_distribution_step.py:260

  • The comment mentions SVD features but SVD processing has been moved to AddSVDFeaturesStep. This comment should be removed as it's no longer accurate after the refactoring.
        # Build the new metadata with updated categorical indices
        # Non-categorical indices become numerical
        # SVD features are numerical and appended at the end


@bejaeger bejaeger requested review from alanprior February 5, 2026 11:39


class AddSVDFeaturesStep(PreprocessingStep):
    """Adds SVD features to the data."""
Contributor


@bejaeger can we elaborate the docstring a bit more? As I understand it, on top of the raw X we also add features that are just a compressed version of it. If so, that makes sense for numerical features, but it seems a bit odd for categorical ones and not applicable to text?
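
One possible direction for a fuller docstring (wording is only a suggestion, based on the behaviour described in this thread and in the old ReshapeFeatureDistributionsStep docstring):

```python
class AddSVDFeaturesStep(PreprocessingStep):
    """Append low-rank SVD projections of the input features as extra columns.

    A truncated SVD is fitted on the already-preprocessed feature matrix and the
    resulting components are appended to the original columns, so downstream
    steps see both the raw features and a compressed summary of them. The
    appended columns are treated as numerical features.
    """
```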

(
    "save_standard",
    make_standard_scaler_safe(
        ("standard", StandardScaler(with_mean=False)),
Contributor


@bejaeger at this step we expect many columns to already be normalized, right? So is this scaling there in general to learn a more balanced projection, rather than being tied to the original features?
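
For context, the usual pattern (a generic sklearn sketch, not this project's actual code) is to rescale columns before the truncated SVD so that a single high-variance column does not dominate the learned projection:

```python
# Generic illustration, not the project's code: variance-only scaling before a
# truncated SVD keeps high-variance columns from dominating the components.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Five well-behaved columns plus one column with a much larger scale.
X = np.hstack([rng.normal(size=(100, 5)), 1000 * rng.normal(size=(100, 1))])

svd_features = Pipeline([
    ("standard", StandardScaler(with_mean=False)),  # with_mean=False also keeps sparse inputs sparse
    ("svd", TruncatedSVD(n_components=3, random_state=0)),
]).fit_transform(X)

print(svd_features.shape)  # (100, 3) compressed features to append to X
```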

n_samples,
n_features,
)
return next(
Contributor


@bejaeger I find this syntax hard to read. Why do we have next here? I'm confused
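
For reference, `next(...)` over a generator expression is a common way to return the first matching item with a fallback default; a generic sketch (not the project's actual code) and a more explicit equivalent:

```python
# Generic illustration of the pattern being discussed, not the project's code.
candidates = [10, 250, 600, 900]
n_features = 500

# next(...) over a generator: return the first candidate above the threshold,
# or None if nothing matches.
first_large = next((c for c in candidates if c > n_features), None)


# Equivalent, more explicit version.
def first_above(values, threshold):
    for value in values:
        if value > threshold:
            return value
    return None


assert first_large == first_above(candidates, n_features) == 600
```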

Contributor

@alanprior alanprior left a comment


GREAT!

max_features_per_estimator: int = 500
global_transformer_name: (
    Literal[
        "scaler",
Contributor


What did scaler mean here?
