docs: update quickstart notebook #564
"## 2. Download the Data\n", | ||
"\n", | ||
"Download the curated dataset." | ||
"Download production and training image data containing photographs of people performing various actions (sleeping, eating, running, etc.)." |
Might add a note on how users would do this on their own infra (e.g., pulling from an inference bucket, collecting the output of their models, or just having the data in a dataframe).
"\n",
"## 1. Install Dependencies and Import Libraries"
"Install Phoenix."
I would set the stage a bit: add a short intro describing what they are about to do. E.g., add a TL;DR that describes the lifecycle / troubleshooting journey they are embarking on.
Explain how comparing production vs training can identify areas where production has drifted off course.
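For illustration, one simple way to quantify this kind of drift is the Euclidean distance between the centroid of the training embeddings and the centroid of the production embeddings. The sketch below is a minimal, dependency-free example; the helper names `centroid` and `euclidean_drift` and the toy vectors are assumptions made for illustration, and this is not necessarily how Phoenix computes drift.

```python
import math


def centroid(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean_drift(training_vectors, production_vectors):
    """Euclidean distance between the centroids of two embedding sets."""
    return math.dist(centroid(training_vectors), centroid(production_vectors))


# Toy 2-D "embeddings" (made-up values, not from the notebook's dataset).
training = [[0.0, 0.0], [2.0, 0.0]]    # centroid is (1.0, 0.0)
production = [[4.0, 4.0], [4.0, 4.0]]  # centroid is (4.0, 4.0)

drift = euclidean_drift(training, production)
no_drift = euclidean_drift(training, training)
```

A larger distance suggests production data has moved away from what the model saw in training; identical sets give a distance of zero.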
"- **prediction_id:** a unique identifier for each data point\n", | ||
"- **prediction_ts:** the Unix timestamps of your predictions\n", | ||
"- **url:** a link to the image data\n", | ||
"- **image_vector:** the embedding vectors representing each image\n", |
Maybe add a brief note on how users would typically obtain these.
"metadata": {},
"source": [
"The columns of the DataFrame are:\n",
"- **prediction_id:** a unique identifier for each data point\n",
Explain why this is important: it lets users correlate data back to other data stores.
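As a rough sketch of what correlating back to another data store could look like, the example below joins exported rows to a second set of records on `prediction_id`. Everything here is hypothetical: the `exported_cluster` rows, the `label_store` mapping, its `label` field, and the helper `join_on_prediction_id` are made up for illustration and do not come from the notebook.

```python
# Hypothetical rows exported from a drifted cluster (values are made up).
exported_cluster = [
    {"prediction_id": "a1", "url": "http://example.com/img1.jpg"},
    {"prediction_id": "b2", "url": "http://example.com/img2.jpg"},
]

# Hypothetical records from another data store, keyed by the same identifier.
label_store = {
    "a1": {"label": "sleeping"},
    "c3": {"label": "running"},
}


def join_on_prediction_id(rows, store):
    """Attach fields from `store` to each row whose prediction_id appears in it."""
    joined = []
    for row in rows:
        extra = store.get(row["prediction_id"])
        if extra is not None:
            joined.append({**row, **extra})
    return joined


matched = join_on_prediction_id(exported_cluster, label_store)
```

Only rows whose identifier exists in both places survive the join, which is why a stable, unique `prediction_id` matters.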
"id": "jFYdi3vktf4L" | ||
}, | ||
"source": [ | ||
"Navigate to the embeddings page. Select a period of high drift. Select a drifted cluster. Color your data by the `merchant_ID` feature. Select a cluster of drifted production data. Notice that much of this data consists of fraudulent transactions from the Scammeds merchant. Export the cluster." |
Might help to rephrase this a bit into "how" they should be seeing the drift rather than telling them "what" they should be seeing.
* main:
  v0.0.13
  fix: don't compile js/html if exists - unblock conda (#597)
  docs: credit card fraud tutorial notebook update (#555)
  docs: update quickstart notebook (#564)
  don't raise error during dimension type inference (#596)
  fix: Update pyproject.toml (#595)
  chore: change https to http for downloading fixtures and example datasets (#589)
  chore: Use pre commit for prettier and eslint (#588)
  ci: Create .github/dependabot.yml (#587)
  chore: create SECURITY.md (#586)
  chore: legal info (#583)
  fix: ignore non-vectors for embeddings (#584)
  chore: bump to typescript 5 (#585)
  v0.0.12
  feat(embeddings): grid view improvements: sizes, multi-modal output (#565)