docs: credit card fraud tutorial notebook update #555

Merged
axiomofjoy merged 2 commits into main from credit-card-fraud-tutorial-update
Apr 14, 2023


axiomofjoy
Contributor

No description provided.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Comment on lines +100 to +104
"## 3. Compute Embeddings\n",
"\n",
"**NOTE: The use of GPUs is recommended for embedding generation. If you are running in Colab, we encourage upgrading to Colab Pro.** \n",
"Run the cell below if you have a CUDA-enabled GPU and want to compute embeddings for your tabular data from scratch; otherwise, skip this step to use the pre-computed embeddings downloaded with the rest of your data in step 2.\n",
"\n",
"The large language models that Arize's embedding generators use have already been trained in such a huge amount of data that the embeddings can capture relevant structure in your data without being fine-tuned."
"`EmbeddingGeneratorForTabularFeatures` represents each row of your DataFrame as a piece of text and computes an embedding for that text using a pre-trained large language model (in this case, \"distilbert-base-uncased\"). For example, if a row of your DataFrame represents a transaction in the state of California from a merchant named \"Leannon Ward\" with a FICO score of 616 and a merchant risk score of 23, `EmbeddingGeneratorForTabularFeatures` computes an embedding for the text: \"The state is CA. The merchant ID is Leannon Ward. The fico score is 616. The merchant risk score is 23...\""
Contributor

I'm not sure many people will be familiar with generating embeddings for a tabular use case. Maybe add a TL;DR and link out to this? https://docs.arize.com/arize/embeddings/embeddings-for-tabular-data-multivariate-drift
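
For readers new to the idea, the row-to-text step described in the quoted cell can be sketched roughly as follows. This is only an illustration of the concept, not Arize's implementation: the helper `row_to_prompt` and the column names are hypothetical, and the sketch assumes each column is rendered as one "The <name> is <value>." sentence, as in the example text from the notebook.

```python
def row_to_prompt(row: dict) -> str:
    """Render one tabular row as a string with one sentence per column.

    Hypothetical helper for illustration; not part of any library.
    """
    # Turn column names into words by replacing underscores with spaces.
    sentences = [
        f"The {name.replace('_', ' ')} is {value}."
        for name, value in row.items()
    ]
    return " ".join(sentences)


# Toy row with assumed column names, mirroring the notebook's example.
row = {
    "state": "CA",
    "merchant_ID": "Leannon Ward",
    "fico_score": 616,
    "merchant_risk_score": 23,
}
prompt = row_to_prompt(row)
print(prompt)
```

The resulting string is what a pre-trained language model (the notebook names "distilbert-base-uncased") would then embed; that step is left out here because it depends on the embedding generator's own API.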

Comment on lines +298 to +300
"## 6. Load and View Exported Data\n",
"\n",
"View your most recently exported data as a DataFrame."
Contributor

Some context on why you would export data might help, e.g., by placing it in the MLOps lifecycle. I see it's covered below, but it might be worth having a concrete example (finding a cohort that is in production but not in training, an under-performing cluster, etc.).
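
As one possible concrete example, the "in production but not in training" cohort could be found along these lines. This is a minimal sketch on made-up toy rows, using plain Python rather than the notebook's DataFrames; the helper `unseen_in_training` and the column names are hypothetical.

```python
def unseen_in_training(training_rows, production_rows, column):
    """Return production rows whose `column` value never occurs in training.

    Hypothetical helper for illustration; rows are plain dicts.
    """
    seen = {row[column] for row in training_rows}
    return [row for row in production_rows if row[column] not in seen]


# Toy data (invented for this sketch, not taken from the tutorial dataset).
training = [
    {"state": "CA", "fico_score": 616},
    {"state": "NY", "fico_score": 702},
]
production = [
    {"state": "CA", "fico_score": 640},
    {"state": "TX", "fico_score": 588},
    {"state": "TX", "fico_score": 655},
]

cohort = unseen_in_training(training, production, "state")
print(cohort)
```

A cohort like this is the kind of slice one might export for closer inspection, labeling, or retraining.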

@axiomofjoy axiomofjoy merged commit da2c6d6 into main Apr 14, 2023
@axiomofjoy axiomofjoy deleted the credit-card-fraud-tutorial-update branch April 14, 2023 04:54
fjcasti1 pushed a commit that referenced this pull request Apr 18, 2023
* main:
  v0.0.13
  fix: don't compile js/html if exists - unblock conda (#597)
  docs: credit card fraud tutorial notebook update (#555)
  docs: update quickstart notebook (#564)
  don't raise error during dimension type inference (#596)
  fix: Update pyproject.toml (#595)
  chore: change https to http for downloading fixtures and example datasets (#589)
  chore: Use pre commit for prettier and eslint (#588)
  ci: Create .github/dependabot.yml (#587)
  chore: create SECURITY.md (#586)
  chore: legal info (#583)
  fix: ignore non-vectors for embeddings (#584)
  chore: bump to typescript 5 (#585)
  v0.0.12
  feat(embeddings): grid view improvements: sizes, multi-modal output (#565)