[Hold] Platform UI: Embed and chunk recommender #408
Draft
Paul-Cornell wants to merge 3 commits into main from DOCS-82-recommend-2.0
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -508,6 +508,7 @@ | |
}, | ||
"platform/workflows", | ||
"platform/jobs", | ||
"platform/recommend", | ||
{ | ||
"group": "API", | ||
"pages": [ | ||
|
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -2,6 +2,12 @@ | |||||||
title: Embedding | ||||||||
--- | ||||||||
|
||||||||
<Tip> | ||||||||
To get help choosing and setting an embedding provider and model for your | ||||||||
[Custom](/platform/workflows#create-a-custom-workflow) workflows, | ||||||||
[request an embed recommendation from Unstructured](/platform/recommend). | ||||||||
</Tip> | ||||||||
|
||||||||
After partitioning, chunking, and summarizing, the _embedding_ step creates arrays of numbers | ||||||||
known as _vectors_, representing the text that is extracted by Unstructured. | ||||||||
These vectors are stored or _embedded_ next to the text itself. These vector embeddings are generated by an | ||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
--- | ||
title: Recommend | ||
--- | ||
|
||
The Unstructured Platform can offer you recommendations for an ideal [embedding provider and model](/platform/embedding) | ||
and [chunking strategy and settings](/platform/chunking) for your source files. | ||
These recommendations are optimized to work well across a variety of | ||
vector stores, RAG applications, and model fine-tuning scenarios. | ||
|
||
Unstructured's embedding and chunking recommendations are especially useful if you are not familiar with how the | ||
various embedding and chunking strategies and settings can be applied for optimal results. If you are already comfortable with embedding and chunking, | ||
these recommendations can still help inform your current strategies. | ||
|
||
Unstructured's recommendations can be implemented only in **Build it with me > Custom** and **Build it myself** [workflows](/platform/workflows). | ||
You cannot implement these recommendations in **Build it with me** > **Basic**, **Advanced**, or **Platinum** workflows, because those workflow types have | ||
preset embedding and chunking settings that cannot be changed. | ||
|
||
Unstructured makes its recommendations by using the specified [source connector](/platform/sources/overview) to access, process, and analyze a | ||
sampling of files from the source location. Unstructured then recommends | ||
an embedding provider and model and a chunking strategy and settings based on this analysis. | ||
|
||
Unstructured's embedding and chunking recommendations can be requested for the following file-based source connector types: | ||
|
||
- [Azure](/platform/sources/azure-blob-storage) | ||
- [Dropbox](/platform/sources/dropbox) | ||
- [Google Cloud Storage](/platform/sources/google-cloud) | ||
- [S3](/platform/sources/s3) | ||
|
||
import SharedPagesBilling from '/snippets/general-shared-text/pages-billing.mdx'; | ||
|
||
<Note> | ||
Performing a recommendation will result in billing to your Unstructured account. To make its recommendation, Unstructured | ||
must process and analyze a sampling of up to 50 files from the source location. | ||
Your Unstructured account is billed for the equivalent number of pages. | ||
|
||
<SharedPagesBilling /> | ||
</Note> | ||
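
As a rough illustration of the sampling cap described in the note above, the following sketch estimates how many files and pages a single recommendation run might process. The function name, the per-file page counts, and the choice to take the first 50 files are all assumptions made for illustration; the note does not say how Unstructured selects its sample, and actual billing follows the shared billing text, not this sketch.

```python
# Hypothetical estimate only: the 50-file cap comes from the note above;
# everything else (names, sample selection, inputs) is assumed.
MAX_SAMPLED_FILES = 50


def estimate_sampled_pages(page_counts, max_files=MAX_SAMPLED_FILES):
    """Return (files_sampled, pages_processed) for a list of per-file page counts.

    Takes the first `max_files` entries purely for illustration; the real
    sampling method is not documented here.
    """
    sampled = page_counts[:max_files]
    return len(sampled), sum(sampled)


if __name__ == "__main__":
    # 60 hypothetical files of 2 pages each: only 50 files would be sampled.
    files, pages = estimate_sampled_pages([2] * 60)
    print(f"files sampled: {files}, pages processed: {pages}")
```

A source location with fewer than 50 files is sampled in full under this assumption, so the estimate is simply the total page count.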
|
||
## Request a recommendation | ||
|
||
1. In the Unstructured Platform, on the sidebar, click **Connectors**. | ||
2. Click **Sources**. | ||
3. Click the name of the source connector that you want to use. If you do not have a source connector, | ||
[create one](/platform/sources/overview). | ||
4. If you're requesting a recommendation for the first time for this connector, click the **Run Recommender** button. | ||
|
||
If you have previously requested a recommendation for this connector, you can make another request by clicking the **Run Again** button. | ||
This is useful if you significantly changed the files in the source location since you previously | ||
requested a recommendation. | ||
|
||
If the **Run Recommender** or **Run Again** button is not visible, or is visible but not enabled, check for the following: | ||
|
||
- The selected connector must be a file-based source connector. See the preceding list for supported file-based source connector types. | ||
- The selected connector must have successfully passed a connectivity test. If the connector's details pane does not show a | ||
**Successful** icon, then click the pencil icon, make any necessary changes to the connector's previous settings, | ||
and then click **Save and Test**. | ||
|
||
5. Two **Scheduled** statuses appear, one for **Embed** and another for **Chunk**. | ||
6. After several minutes, the **Scheduled** statuses are replaced by **Running**. | ||
7. After several more minutes, the **Running** statuses are replaced by **Finished**. | ||
8. After **Finished** appears, to view the recommendation, click **View**. | ||
|
||
The **Auto Recommender Results** pane shows Unstructured's recommended embedding provider and model and chunking strategy and | ||
settings for the source files that it analyzed. | ||
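
The conditions and status sequence in the steps above can be summarized in a small model. This is only a sketch of the behavior described on this page, not Unstructured's implementation or API: the `SourceConnector` class, the connector type labels, and the function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the four supported file-based source connector types.
SUPPORTED_SOURCE_TYPES = {"azure", "dropbox", "google-cloud", "s3"}

# Status order described in steps 5 through 7, for both Embed and Chunk.
STATUS_ORDER = ["Scheduled", "Running", "Finished"]


@dataclass
class SourceConnector:
    connector_type: str             # hypothetical type label
    connectivity_test_passed: bool  # True if the details pane shows Successful


def can_run_recommender(connector: SourceConnector) -> bool:
    """Both documented conditions must hold for the button to be usable."""
    return (
        connector.connector_type in SUPPORTED_SOURCE_TYPES
        and connector.connectivity_test_passed
    )


def next_status(current: str) -> str:
    """Advance one step along Scheduled -> Running -> Finished."""
    index = STATUS_ORDER.index(current)
    return STATUS_ORDER[min(index + 1, len(STATUS_ORDER) - 1)]


def can_view(status: str) -> bool:
    """The recommendation can be viewed only after Finished appears."""
    return status == "Finished"
```

For example, `can_run_recommender(SourceConnector("s3", False))` is false until the connector passes **Save and Test**, which matches the troubleshooting checklist in step 4.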
|
||
## Implement an embed recommendation | ||
|
||
1. In the **Auto Recommender Results** pane, in the **Embed Recommendation** area, note the recommended embedding provider and model. | ||
2. To implement the recommendation, expand the **Next Steps** section and follow the on-screen instructions for your target workflow. | ||
|
||
## Implement a chunking recommendation | ||
|
||
1. In the **Auto Recommender Results** pane, in the **Chunk Recommendation** area, note the recommended chunking strategy and settings. | ||
2. To implement the recommendation, expand the **Next Steps** section and follow the on-screen instructions for your target workflow. |