-
Notifications
You must be signed in to change notification settings - Fork 764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better error message when passing a df to docs #1589
Comments
Technically, there is not type checking on the input but only type hinting. The difference, in part, lies with some of the design philosophies of Python where typing is not enforced.
Sure, I would have to check for the best implementation though. I do not think stricter type checking is the solution here since if I were to start doing that here, shouldn't I do it for all other variables? However, allowing for a pandas series to be passed should be possible. An entire pandas dataframe is a different story though as that would also need to involve checking the columns to use which can open up some problems. |
Ah thanks, I hadn't actually realised that distinction.
I see your point. Maybe. But I think most other variables are pretty specific too BERTopic (i.e. a user would create them specifically to pass to a BERTopic function), so they are more likely to be in the right format from the beginning.
Certainly not suggesting you allow a dataframe. Agree, way too complicated. |
Many methods like
visualize_documents
take adocs
argument which should be a list, but most of the time my documents are stored in apd.dataframe
because there is other metadata associated with them and I often inadvertently end up passing the data frame to these methods rather than the list. Even though there is type checking on the input:https://github.com/MaartenGr/BERTopic/blob/62e97ddea6cdcf9e4da25f9eaed478b22a9f9e20/bertopic/plotting/_documents.py#L9C1-L11C50
It doesn't throw a helpful error (e.g. 'docs should be type List'), instead it throws a KeyError with a random number, which often takes an embarrassing amount of time trying to debug before I remember that I've been here before and know what I've done wrong.
Is there any chance that all of these methods could either do some stricter type checking or check for a dataframe on that input?
The text was updated successfully, but these errors were encountered: