Replace spaces with underscores in column names also for the predict function #690

Open
MilesCranmer opened this issue Aug 2, 2024 Discussed in #689 · 0 comments
Labels
bug Something isn't working

Comments

@MilesCranmer
Owner

Discussed in #689

Originally posted by @GoldenGoldy August 3, 2024
I found that when the data passed to the .fit function has column names containing spaces, PySR replaces the spaces with underscores and prints a warning about this. You can then proceed with fitting the data as normal.
The .predict function, however, does not make the same replacement of spaces with underscores in the column names.
So, if we have a fitted model and pass data to the .predict function in the same format that we used for the .fit function, we can run into the following issue:
The predict function (in sr.py) contains the line "X = X.reindex(columns=self.feature_names_in_)". If the column names contain spaces, this produces NaN values: the reindex tries to match the column names (with spaces) against the feature names of the model, in which the spaces were replaced by underscores, so no column matches.
We then get the somewhat confusing message "ValueError: Input X contains NaN.", which suggests that the data contains NaN values when it does not. The NaN values are only introduced by the reindex, which cannot match the column names.
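
The reindex behaviour described above can be reproduced with pandas alone, without PySR. This is a minimal sketch: the column names and the `feature_names_in` list are made-up stand-ins for what a fitted model would hold after the spaces were replaced.

```python
import pandas as pd

# Hypothetical input whose column names contain spaces; it has no NaN values.
X = pd.DataFrame({"my feature": [1.0, 2.0], "other feature": [3.0, 4.0]})

# Stand-in for the model's feature_names_in_ after fitting,
# where the spaces have been replaced by underscores.
feature_names_in = ["my_feature", "other_feature"]

# Same call as the line quoted from sr.py: none of the requested labels
# exist in X, so every requested column is created and filled with NaN.
reindexed = X.reindex(columns=feature_names_in)

has_nan_before = bool(X.isna().any().any())
all_nan_after = bool(reindexed.isna().all().all())
print(has_nan_before, all_nan_after)
```

The original frame is NaN-free, while every value in the reindexed frame is NaN, which is what later triggers the "Input X contains NaN" error.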

Of course, all of this can be avoided once you are aware of the problem, by not using spaces in the column names from the beginning. However, it might be more consistent, and give a better user experience, if the .predict function also replaced spaces in the column names with underscores?
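
Until .predict handles this itself, one user-side workaround is to apply the same renaming before calling it. The sketch below is not PySR's implementation. `underscore_columns` is a hypothetical helper, and the expected feature names are assumed for illustration.

```python
import pandas as pd


def underscore_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X with spaces in string column names replaced by underscores."""
    return X.rename(
        columns=lambda name: name.replace(" ", "_") if isinstance(name, str) else name
    )


# Hypothetical prediction data in the same format as the training data.
X_new = pd.DataFrame({"my feature": [1.0], "other feature": [3.0]})
X_clean = underscore_columns(X_new)

# With the names cleaned, a reindex against the underscored feature names
# (assumed here) matches every column and introduces no NaN values.
aligned = X_clean.reindex(columns=["my_feature", "other_feature"])
print(list(X_clean.columns), bool(aligned.isna().any().any()))
```

With a fitted model in a variable called, say, `model`, the call would then be `model.predict(underscore_columns(X_new))`. The helper leaves the original frame unchanged.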

@MilesCranmer MilesCranmer added the bug Something isn't working label Aug 2, 2024