Passing kNN distances or graph as input? #6

Open
davisidarta opened this issue Jan 6, 2022 · 2 comments

Comments

@davisidarta

davisidarta commented Jan 6, 2022

Hi! Thank you for this fast and powerful package. Its approach to optimization is novel for dimensionality reduction (DR), and I really enjoyed your paper.

I have a question: is it possible to pass pre-computed kNN distances (or the affinity or adjacency graphs) as input to NCVis?

For now I'm testing it with a small dataset (it's indeed blazing fast) but will soon advance to one of around 1.3M samples x 5k observations for which I already have precomputed affinities. While I believe it will have no trouble computing distances rather rapidly, I can also foresee several situations where users may want to embed distance matrices, such as in chemistry, NLP, and bioinformatics, so the ability to obtain visualizations from these would be really great.

Edit: I'm aware this is a completely different question, but I feel I should not open an entirely new issue just for it: I just noticed the package does not seem to support user-provided initializations, and instead always runs some optimization starting from a random projection. Would that power iteration approach work on user-provided initializations?

@alartum
Member

alartum commented Jan 10, 2022

Hi, and thank you for posting the issue! :) TL;DR: I propose to add support for adjacency graphs as input, along with a default utility function that constructs one from a pre-computed distance matrix.

First of all, regarding the usage of pre-computed distances: it has actually been requested several times already, but you're the first to do it in written form. I guess this is the moment for me to finally implement it :)

Could you please provide your view on the most convenient user interface? Below are my thoughts on this topic:

  • NCVis uses hnswlib under the hood, which gives a massive speed-up in the distance computation. The problem is that hnswlib does not support index construction from pre-computed distances.
  • It is sufficient to provide the kNN matrix for NCVis to work fine (even the initialization will still behave nicely). I believe a more flexible solution would be to add support for adjacency graphs as input, plus a default utility function that constructs one from a pre-computed distance matrix (using nmslib, for example). That way, you can use the tool of your choice for the kNN part and still use pre-computed distances with the default index utility.
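
To make the second point concrete, here is a minimal sketch of what such a utility could look like. It is not part of NCVis: the function name `knn_from_distances` is hypothetical, plain NumPy stands in for nmslib, and it assumes a dense distance matrix, which is only feasible for small datasets.

```python
import numpy as np


def knn_from_distances(dist, k):
    """Return (indices, distances) of the k nearest neighbors of each row.

    `dist` is a dense (n, n) matrix of pairwise distances. Each point is
    excluded from its own neighbor list. Hypothetical helper, not NCVis API.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    if dist.shape != (n, n):
        raise ValueError("dist must be a square matrix")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")

    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)  # never pick a point as its own neighbor

    # argpartition finds the k smallest entries per row without a full sort.
    part = np.argpartition(masked, k - 1, axis=1)[:, :k]
    part_d = np.take_along_axis(masked, part, axis=1)

    # Sort only the k candidates so neighbors come out nearest-first.
    order = np.argsort(part_d, axis=1)
    idx = np.take_along_axis(part, order, axis=1)
    return idx, np.take_along_axis(part_d, order, axis=1)


# Toy example: four points on a line at positions 0, 1, 3, 7.
pos = np.array([0.0, 1.0, 3.0, 7.0])
dist = np.abs(pos[:, None] - pos[None, :])
knn_idx, knn_dist = knn_from_distances(dist, k=2)
```

The two returned arrays have shape (n, k), which is one common way to represent a kNN matrix. Whether NCVis would accept exactly this layout is an open design question in this thread.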

Second, regarding your question about the initialization: I believe I've implicitly answered it. It is OK to use the proposed initialization with user-provided distances :)

@jnboehm

jnboehm commented Dec 27, 2024

Hi, just to chime in (almost three years later): I think it would be nice to expose the optimize function to the Python side. That way it would be possible to pass in the kNN graph (or any list of edges, for that matter) as well as a custom initialization. I suppose it's not really possible to pass a distance matrix to hnsw, since it works with datasets that are too large to fit in memory. So I think it's easier to simply construct the kNN graph on the Python side and pass that to NCVis.
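
As a rough illustration of "construct the kNN graph on the Python side", the sketch below turns an (n, k) array of neighbor indices into a list of edges. The helper name `knn_to_edges` is hypothetical, and treating edges as undirected and deduplicated is an assumption. The format an exposed optimize function would actually expect is not defined anywhere in this thread.

```python
import numpy as np


def knn_to_edges(knn_idx):
    """Turn an (n, k) array of neighbor indices into unique undirected edges.

    Returns an (m, 2) integer array with the smaller node id first in each
    row. Hypothetical helper, not NCVis API.
    """
    knn_idx = np.asarray(knn_idx)
    n, k = knn_idx.shape
    src = np.repeat(np.arange(n), k)  # each point repeated once per neighbor
    dst = knn_idx.ravel()
    # Order each pair as (min, max) so (i, j) and (j, i) collapse to one edge.
    edges = np.stack([np.minimum(src, dst), np.maximum(src, dst)], axis=1)
    return np.unique(edges, axis=0)


# Neighbor indices for four points with k = 2.
knn_idx = np.array([[1, 2], [0, 2], [1, 0], [2, 1]])
edges = knn_to_edges(knn_idx)
```

Any kNN backend (hnswlib, nmslib, or an exact search) could produce the `knn_idx` array; only the final edge list would need to cross into NCVis.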

Potentially it would also be nice to be able to call the initialization function. Though since it behaves similarly to a spectral initialization, it would be more interesting to also compute the initial points on the Python side and pass them in the call to optimize.
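
For reference, a spectral-style initialization can be computed on the Python side along these lines. This is a generic sketch, not NCVis's own initialization: it runs orthogonal (subspace) power iteration on the normalized adjacency matrix of an edge list, the name `spectral_like_init` is hypothetical, and the dense n × n matrix limits it to small graphs.

```python
import numpy as np


def spectral_like_init(edges, n, dim=2, n_iter=200, seed=0):
    """Approximate the leading non-trivial eigenvectors of the normalized
    adjacency matrix by orthogonal power iteration.

    `edges` is an (m, 2) array of undirected edges over nodes 0..n-1.
    Returns an (n, dim) array with orthonormal columns.
    Hypothetical helper, not NCVis API.
    """
    edges = np.asarray(edges)
    adj = np.zeros((n, n))
    adj[edges[:, 0], edges[:, 1]] = 1.0
    adj[edges[:, 1], edges[:, 0]] = 1.0

    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("every node needs at least one edge")
    scale = 1.0 / np.sqrt(deg)

    # (I + D^-1/2 A D^-1/2) / 2 has eigenvalues in [0, 1], so power
    # iteration converges towards the largest ones.
    mat = 0.5 * (np.eye(n) + scale[:, None] * adj * scale[None, :])

    # sqrt(deg) is the trivial eigenvector with eigenvalue 1; project it out.
    trivial = np.sqrt(deg) / np.linalg.norm(np.sqrt(deg))

    rng = np.random.default_rng(seed)
    basis = rng.standard_normal((n, dim))
    for _ in range(n_iter):
        basis = mat @ basis
        basis -= np.outer(trivial, trivial @ basis)
        basis, _ = np.linalg.qr(basis)  # re-orthonormalize the columns
    return basis


# Small graph: four nodes, five undirected edges.
edges = np.array([[0, 1], [0, 2], [1, 2], [1, 3], [2, 3]])
init = spectral_like_init(edges, n=4, dim=2)
```

The result would typically be rescaled before being handed to an optimizer; what scale NCVis expects is not stated in this thread.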

The only thing I am not so sure about is how to handle the parameter Q, as it would be nice to have access to it after the optimization is done.
