I think the code for reproducing the label encoders may be missing from the repository. Perhaps we discussed this before, but I've forgotten the reason.
I ran into the pickled LabelEncoder version-mismatch warning again, because the pinned scikit-learn version (0.22.1) wouldn't install on the newer Python version in a fresh environment I had created. Keeping the scikit-learn dependency pinned is a bit annoying maintenance-wise anyway; just using the newest available version is typically more convenient. However, if we don't pin, we keep getting that annoying warning.
How long does it take to generate a label_encoders.pkl file? And what data does it need?
If it isn't too costly, an alternative to shipping the file with the package would be to generate it on first use. That would also take care of the warning automatically, except when people later update scikit-learn.
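A minimal sketch of the generate-on-first-use idea, assuming the encoders can be rebuilt by some zero-argument function. `load_or_build` and `build_fn` are hypothetical names for illustration, not existing code in the repository.

```python
import os
import pickle
from typing import Any, Callable


def load_or_build(path: str, build_fn: Callable[[], Any]) -> Any:
    """Load the pickled object at `path`; on first use, build it and cache it there.

    `build_fn` is a hypothetical placeholder for whatever code fits the encoders.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    obj = build_fn()
    # Write to a temporary file, then rename it into place, so an interrupted
    # run never leaves a half-written pickle for the next call to load.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(obj, f)
    os.replace(tmp_path, path)
    return obj
```

Usage would look like `encoders = load_or_build(cache_path, build_label_encoders)`, where `build_label_encoders` is a hypothetical function that fits one `sklearn.preprocessing.LabelEncoder` per column. Since the installed package directory may be read-only, a per-user cache directory is probably a safer choice for `cache_path`.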
To make things totally airtight, we could regenerate the encoders in that case as well: escalate the warning to an exception, catch it, and then regenerate.
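A minimal sketch of the "warning as exception" check, assuming the stale-pickle warning is a `UserWarning`. As far as I know, older scikit-learn releases emit a plain `UserWarning` when unpickling an estimator pickled under a different version, and scikit-learn 1.3+ emits `sklearn.exceptions.InconsistentVersionWarning`, which subclasses `UserWarning`. `load_if_current` is a hypothetical name.

```python
import os
import pickle
import warnings
from typing import Any, Optional


def load_if_current(path: str) -> Optional[Any]:
    """Return the unpickled object, or None if the file is missing or looks stale.

    Assumption: any UserWarning raised while unpickling means "stale", which
    covers scikit-learn's version-mismatch warning.
    """
    if not os.path.exists(path):
        return None
    try:
        with warnings.catch_warnings():
            # Escalate UserWarnings to exceptions only inside this block;
            # the previous warning filters are restored on exit.
            warnings.simplefilter("error", category=UserWarning)
            with open(path, "rb") as f:
                return pickle.load(f)
    except UserWarning:
        # Pickled under a different version (or some other UserWarning fired).
        return None
```

When it returns `None`, the caller would rebuild and re-pickle the encoders (for example with a hypothetical `build_label_encoders()`), so the next load runs without the warning. Catching every `UserWarning` errs on the side of rebuilding too often; on scikit-learn 1.3+ one could narrow the filter to `InconsistentVersionWarning`.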
Overengineering? Probably :D It's not much effort, though, at least if the label encoder generation process isn't too complicated.