-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text Embeddings Template #67
Conversation
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Sample usage: | ||
``` | ||
# Generate emeddings for a sample single-file parquet file containing 10 rows from the `falcon-refinedweb` HuggingFace dataset, using 1 GPU. | ||
python ray_embeddings.py \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the python file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in the docker image used for the cluster environment. We want to keep this code private, so it is not included in this public repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. will that file be in default
directory?
If not can you create a wrapper python file here which takes params and invokes proprietary script in docker image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the relevant files (python file, ipynb, sample files, etc) are included in the ~/default
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thanks for confirming
Adds README and compute config for a new workspace template for computing text embeddings.
With https://github.com/anyscale/product/pull/26521