Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert y to numpy array in _reduce_dimensionality #1326

Merged
merged 2 commits into from
Jul 11, 2023

Conversation

stevetracvc
Copy link
Contributor

When using cuml umap, y needs to be a numpy array instead of a list.

Here's another one I didn't check first before fixing 😆

this will fix #1241

I leave y as a list everywhere, since I saw at least a few type hints that specified y could be a list or a numpy array. I don't know the intention there, so I only convert it to a numpy array in the _reduce_dimensionality function instead.

When using cuml umap, y needs to be a numpy array instead of a list.
@MaartenGr
Copy link
Owner

Thanks for this PR! Could you make sure to cast y as a numpy only if y is not None? I believe that since this differs quite a bit from the fix in the issue you mention, it will also affect many other techniques, like (semi-)supervised, guided, and non-y topic modeling. So something like:

y = np.array(y) if y is not None else None
self.umap_model.fit(embeddings, y=y)

Or, we could just use the example as used in the issue you refer. That way, processing of y is contained to a single function only where it is relevant.

@stevetracvc
Copy link
Contributor Author

So I fixed it in my fork before I saw that discussion, and my thoughts were, _reduce_dimensionality accepts
y: Union[List[int], np.ndarray] = None
so if it's acceptable to pass a list here, then forcing y to an array should occur here. But I just started using this package so I don't know how everything is connected. If you think it's better to move it, let me know.

It passed all tests before I submitted, so I'll see if I can add a test for None here. I didn't actually look to see what tests already exist.

@MaartenGr
Copy link
Owner

Apologies for the late reply. LGTM, thanks!

@MaartenGr MaartenGr merged commit 68de971 into MaartenGr:master Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BERTopic with cuML UMAP: ValueError: Must specify dtype when data is passed as a <class 'list'> exception
2 participants