docs: credit card fraud tutorial notebook update #555
"## 3. Compute Embeddings\n", | ||
"\n", | ||
"**NOTE: The use of GPUs is recommended for embedding generation. If you are running in Colab, we encourage upgrading to Colab Pro.** \n", | ||
"Run the cell below if you have a CUDA-enabled GPU and want to compute embeddings for your tabular data from scratch; otherwise, skip this step to use the pre-computed embeddings downloaded with the rest of your data in step 2.\n", | ||
"\n", | ||
"The large language models that Arize's embedding generators use have already been trained in such a huge amount of data that the embeddings can capture relevant structure in your data without being fine-tuned." | ||
"`EmbeddingGeneratorForTabularFeatures` represents each row of your DataFrame as a piece of text and computes an embedding for that text using a pre-trained large language model (in this case, \"distilbert-base-uncased\"). For example, if a row of your DataFrame represents a transaction in the state of California from a merchant named \"Leannon Ward\" with a FICO score of 616 and a merchant risk score of 23, `EmbeddingGeneratorForTabularFeatures` computes an embedding for the text: \"The state is CA. The merchant ID is Leannon Ward. The fico score is 616. The merchant risk score is 23...\"" |
I'm not sure many people will be familiar with generating embeddings for a tabular use case. Maybe add a TL;DR and link out to this? https://docs.arize.com/arize/embeddings/embeddings-for-tabular-data-multivariate-drift
"## 6. Load and View Exported Data\n", | ||
"\n", | ||
"View your most recently exported data as a DataFrame." |
Some context on why you would export data might help, e.g., situating it in the MLOps lifecycle. I see it's covered below, but it might be worth adding a concrete example (finding a cohort that is in production but not in training, an underperforming cluster, etc.).
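As one concrete version of the first example in the comment above, the sketch below flags clusters that contain production points but no training points, i.e. cohorts the model never saw during training. The `(dataset, cluster_id)` input format, the `"training"`/`"production"` labels, and the helper name are hypothetical stand-ins for whatever cluster assignments the notebook exports.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def production_only_clusters(points: Iterable[Tuple[str, int]]) -> List[int]:
    """Return sorted cluster IDs that have production points but no training points.

    `points` is a hypothetical iterable of (dataset, cluster_id) pairs, where
    dataset is "training" or "production"; other labels are ignored.
    """
    training: Counter = Counter()
    production: Counter = Counter()
    for dataset, cluster_id in points:
        if dataset == "training":
            training[cluster_id] += 1
        elif dataset == "production":
            production[cluster_id] += 1
    # Counter returns 0 for missing keys, so unseen clusters count as empty.
    return sorted(c for c in production if training[c] == 0)


points = [
    ("training", 0),
    ("production", 0),
    ("production", 1),
    ("production", 1),
    ("training", 2),
]
print(production_only_clusters(points))  # [1]
```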
* main:
  v0.0.13
  fix: don't compile js/html if exists - unblock conda (#597)
  docs: credit card fraud tutorial notebook update (#555)
  docs: update quickstart notebook (#564)
  don't raise error during dimension type inference (#596)
  fix: Update pyproject.toml (#595)
  chore: change https to http for downloading fixtures and example datasets (#589)
  chore: Use pre commit for prettier and eslint (#588)
  ci: Create .github/dependabot.yml (#587)
  chore: create SECURITY.md (#586)
  chore: legal info (#583)
  fix: ignore non-vectors for embeddings (#584)
  chore: bump to typescript 5 (#585)
  v0.0.12
  feat(embeddings): grid view improvements: sizes, multi-modal output (#565)
No description provided.