Conversation
Fixes issue #138: NA handling in text columns
- Add skrub>=0.3.0 dependency to handle mixed string/NA data
- Integrate TableVectorizer in TabPFNClassifier to properly process text columns with NA values
- Add test to verify the solution works as expected
Okay, we ran into a problem: skrub 0.3.0 requires scipy 1.9.3, which isn't compatible with TabPFN.
Does it fail without
I've simplified the implementation to rely only on TableVectorizer, without the extra function. I also bumped the minimum scikit-learn version to 1.2.1 for compatibility with skrub. scikit-learn 1.2.1 was released in January 2023, so it is more than 2 years old and should be a reasonable minimum. The same goes for pandas 1.5.3.
Could we use AutoGluon's AutoMLPipelineFeatureGenerator instead?
Fix #138: NA handling in text columns
Fix #163
Partially fixed by #242
Summary
Test plan