Integrate latentscope into Vector-io #29
Comments
Oh no, I think I know what the issue is here. Could you go to the |
ah thank you, I see the issue. I'm working on a fix and will release a patch shortly |
I've released 0.1.2 which fixes the issue you encountered. Do you mind upgrading and trying again? |
As a first step towards supporting a more direct integration, I made a function in […]. It looks like the VDF file format would have all the parameters you'd need to call this function too. |
When I run the notebook, I'm getting:
even after updating to 0.1.2 |
When I re-run via UI, I run into #30 again |
Sorry, I didn't include the import_embeddings function in v0.1.2, so I just pushed v0.1.3, which should add it. |
That line works in the Python notebook now. When I try to load the file in the web UI, the UI crashes (without any error logs on the server side). |
Yeah, I'm planning to make the web UI much smarter, and checking for an existing dataset with the same name should be high on the list.
In order to use the Python line you need to have loaded the dataset already, either via the web UI or via the Python interface, so you shouldn't try to upload the dataset via the web UI after you've loaded the embeddings. The notebook example shows the Python ingest process. After that you can load the web UI and go to the setup page for the dataset to run UMAP and clusters.
Internally, latentscope puts everything related to a dataset in a single folder. There can be multiple embeddings for one dataset, multiple UMAPs for each embedding, etc.
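The folder layout described above (one folder per dataset, several embeddings per dataset, several UMAPs per embedding) and the same-name check could be sketched roughly as follows. This is only an illustration, not latentscope's actual code: the subfolder names `embeddings` and `umaps`, the ids, and the helpers `create_dataset_dir` and `add_umap_dir` are all hypothetical.

```python
from pathlib import Path


def create_dataset_dir(data_root: Path, dataset_name: str) -> Path:
    """Create a folder for a new dataset, refusing to reuse an existing name.

    Hypothetical helper: illustrates a same-name check, not latentscope's API.
    """
    dataset_dir = data_root / dataset_name
    if dataset_dir.exists():
        # Fail loudly instead of silently reopening an earlier import.
        raise FileExistsError(
            f"dataset {dataset_name!r} already exists at {dataset_dir}"
        )
    dataset_dir.mkdir(parents=True)
    return dataset_dir


def add_umap_dir(dataset_dir: Path, embedding_id: str, umap_id: str) -> Path:
    """Nest a UMAP under its embedding, which is nested under its dataset.

    The "embeddings" and "umaps" folder names are assumptions for this sketch.
    """
    umap_dir = dataset_dir / "embeddings" / embedding_id / "umaps" / umap_id
    umap_dir.mkdir(parents=True, exist_ok=True)
    return umap_dir
```

With a check like this, importing a second file under an already-used dataset name would raise an error up front rather than opening the existing scope.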
…On Tue, Mar 5, 2024, 12:44 PM Dhruv Anand ***@***.***> wrote:
There is some confusion in my setup, since I have imported files with the same name multiple times, and it seems to open the same scope for them. It would be good to create a new scope for each new parquet file loaded in via the UI.
|
Hi @enjalot,
I'm working on a project called Vector-io https://github.com/AI-Northstar-Tech/vector-io, which allows people to port over their vector datasets across various vector DBs and store snapshots on disk in a simple format called VDF (parquet files and a metadata json file).
I would love to integrate latentscope as a way to visualize the vectors that people have stored in their dataset.
I'm linking to the issue I have in my repo for the integration: AI-Northstar-Tech/vector-io#61.
I wanted to start by asking for help on a bug that I faced while using the web UI to load data from a parquet file in an example dataset I have: https://huggingface.co/datasets/aintech/vdf_20240125_130746_ac5a6_medium_articles/blob/main/medium_articles/medium_articles_2.parquet
I was able to complete the embedding step (though I plan to integrate with the new functionality you're planning that lets people use existing vectors), but for the clustering step I got this error: