Replace spaces with underscores in column names also for the predict function #690

MilesCranmer · 2024-08-02T16:35:52Z

Discussed in #689

Originally posted by @GoldenGoldy August 3, 2024
I found that when the data passed to the .fit function has spaces in its column names, PySR replaces the spaces with underscores and prints a warning about this. You can then proceed with fitting as normal.
The .predict function, however, does not make the same replacement when it is called later.
So, if we have a fitted model and pass data to .predict in the same format that we used for .fit, we run into the following issue:
The predict function (in sr.py) contains the line "X = X.reindex(columns=self.feature_names_in_)". If the column names contain spaces, this produces NaN values: reindex tries to match the column names (with spaces) against the model's feature names, in which the spaces were replaced by underscores, so those columns do not match.
We then get the somewhat confusing message "ValueError: Input X contains NaN.", which suggests that the data contains NaN values when it does not. The NaNs are only introduced by the reindex, which cannot match the column names.
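
The mismatch can be reproduced with pandas alone, without fitting a model. This is only a sketch: the column names and values are made up, and `feature_names_in_` here is a plain list standing in for the attribute of the same name on a fitted model.

```python
import pandas as pd

# Hypothetical input data; the column names are illustrative only.
X = pd.DataFrame({"feature one": [1.0, 2.0], "feature two": [3.0, 4.0]})

# Stand-in for the fitted model's stored feature names,
# in which spaces have been replaced by underscores.
feature_names_in_ = [name.replace(" ", "_") for name in X.columns]

# Same call as the line quoted from predict. None of the requested labels
# exist in X, so reindex creates new columns filled with NaN.
reindexed = X.reindex(columns=feature_names_in_)

print(reindexed.isna().all().all())  # every value is NaN
print(X.isna().any().any())          # the original data has no NaN
```

The original values are still in `X`; they are simply not carried over because no label matches.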

All this can be avoided, of course, once you are aware of the problem and avoid spaces in the column names from the beginning. However, it might be more consistent, and give a better user experience, if the .predict function also replaced spaces in the column names with underscores.
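
Until then, one possible user-side workaround is to apply the same renaming yourself before calling .predict. The helper below, `underscore_columns`, is a hypothetical name and not part of PySR. It assumes the only change made during fitting is replacing spaces with underscores, as described above.

```python
import pandas as pd


def underscore_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X with spaces in string column names replaced by underscores."""
    return X.rename(
        columns=lambda name: name.replace(" ", "_") if isinstance(name, str) else name
    )


# Hypothetical data with a space in one column name.
X_new = pd.DataFrame({"feature one": [1.0, 2.0], "feature_two": [3.0, 4.0]})
X_clean = underscore_columns(X_new)

# The renamed columns now match underscore-style feature names,
# so a reindex against them keeps the values instead of producing NaN.
matched = X_clean.reindex(columns=["feature_one", "feature_two"])
```

The cleaned frame, here `X_clean`, is what would then be passed to .predict in place of the original data.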

MilesCranmer added the bug label Aug 2, 2024
