Commit

LLM Documentation (#660)
* adding the new doc file

* LLM extraction documentation
mkaramlou authored Aug 13, 2024
1 parent 814e20c commit 038412c
Showing 5 changed files with 87 additions and 0 deletions.
3 changes: 3 additions & 0 deletions docs/assets/images/prompt-extraction-access-dark.gif
3 changes: 3 additions & 0 deletions docs/assets/images/prompt-extraction-access-light.gif
5 changes: 5 additions & 0 deletions docs/dataset/advanced-usage/index.md
@@ -50,4 +50,9 @@ This section contains tutorial documentation for advanced features.
---
Run model comparisons programmatically and add model improvements as requirements into your CI pipelines.

- [:kolena-diarization-workflow-16: LLM Powered Data Processing](./llm-prompt-extraction.md)

---
Leverage the power of LLMs to extract custom properties from your data or set up evaluations on your model results.

</div>
75 changes: 75 additions & 0 deletions docs/dataset/advanced-usage/llm-prompt-extraction.md
@@ -0,0 +1,75 @@
---
icon: kolena/diarization-workflow-16
---
# :kolena-diarization-workflow-20: LLM Powered Data Processing

Prompt extractions using LLMs enable you to perform powerful data processing on Kolena and
use the LLM outputs in a variety of ways across the platform. This document goes over the details.

!!! Info
    Kolena uses on-prem models, so no customer data is exported outside of the data
    services used to run the application.

## Configuration

LLM-powered property extraction can be configured from the Dataset Details page. Scroll to the bottom to create new
extractions or to see existing ones.

<figure markdown>
![Access LLM Prompt Extraction](../../assets/images/prompt-extraction-access-dark.gif#only-dark)
![Access LLM Prompt Extraction](../../assets/images/prompt-extraction-access-light.gif#only-light)
<figcaption>Access LLM Prompt Extraction</figcaption>
</figure>

Prompt-based extractions consist of three main components:

**Field Name**: a unique name that will be used to reference the extraction across the platform.

**Model**: the model you wish to use to execute the prompt. Currently Kolena supports
`Llama 3.1-8b`, `Llama 3.1-70b`, `Llama 3.1-405b`, `Llama 3-8b`, `Llama 3-70b` and `Mixtral-8x7B`.

**Prompt**: the instructions you wish the LLM to execute.
Using the `@` sign, you can reference dynamic fields from the dataset and model results in your prompt.
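
For illustration, a filled-in configuration using these three components might look like the following; the field name and the referenced dataset column are hypothetical:

```
Field Name: review_sentiment
Model: Llama 3.1-8b
Prompt: Classify the sentiment of @datapoint.review_text as "positive", "negative" or "neutral".
        Respond with the label only.
```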

!!! note
    If a prompt references only dataset fields, it is represented as a property of your dataset and can be
    used to create test cases.

    If a prompt references results (with or without dataset fields), it is considered a property of the results and can be used to
    create metrics.
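
For example (with hypothetical field names), the first prompt below references only dataset fields and would become a dataset property, while the second references a model result and would become a result property:

```
Is @datapoint.customer_question written in English? Respond with "yes" or "no".

Does @result.generated_reply directly answer @datapoint.customer_question? Respond with "yes" or "no".
```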

!!! note
    When working on a new prompt, use the `Try it out` button in the prompt configurator to see a sample of 50
    extractions and tune your prompt accordingly.

!!! Example
    **1 - Translations**

    Imagine working on a dataset with text fields in a language that you are not familiar with. Using LLM-based
    extractions, you can translate those text fields into a language you are familiar with. An example prompt would be:

    `Please translate the following text to English: @datapoint.mandarin-text`

    **2 - Categories**

    You can use prompt-based extractions to create categories of large texts that can be used in your analysis
    of model performance. For example, the following prompt generates categories based on article summaries that can
    be used to set up test cases on Kolena:

    `Provide a category name for the following article summary:
    @datapoint.text_summary
    If you are not sure what the category is, respond with "Unknown". Do NOT include additional information in your response.
    Do not use upper case in your response.`

    **3 - Evaluations**

    You can use this feature to evaluate your text-based model results. For example, if you are using an LLM for
    summarization tasks and want to evaluate multiple models, you can use the following prompt to assign a score
    to each model and use that score to create a metric for model evaluation:

    `Given the following article details and article summary, provide a score from 1 to 5 on the completeness of the summary.
    Do NOT provide the reasoning in your response. Do not include any additional information besides the score. If the
    article summary is missing, respond with "unknown".
    Article details: @datapoint.article-details
    Article summary: @result.article-details`
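
As a downstream sanity check, you could also aggregate the extracted scores yourself. The sketch below is a minimal example, assuming the extraction results have been exported to a CSV with hypothetical `model_name` and `summary_completeness` columns; it is not part of the Kolena API.

```python
import pandas as pd

# Hypothetical export of the extraction results; the file and column names
# are assumptions for illustration, not Kolena's actual export schema.
df = pd.read_csv("summary_completeness_extractions.csv")

# The prompt returns "1" through "5" or "unknown"; treat non-numeric answers as missing.
df["summary_completeness"] = pd.to_numeric(df["summary_completeness"], errors="coerce")

# Mean completeness per model, ignoring "unknown" responses.
mean_scores = df.groupby("model_name")["summary_completeness"].mean()
print(mean_scores.sort_values(ascending=False))
```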
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -23,6 +23,7 @@ nav:
- Automatically Extract Text Properties: automations/extract-text-metadata.md
- Automatically Extract Image Properties: automations/extract-image-metadata.md
- Setting Up Natural Language Search: automations/set-up-natural-language-search.md
- LLM Powered Data Processing: dataset/advanced-usage/llm-prompt-extraction.md
- Connecting Cloud Storage:
- Connecting Cloud Storage: connecting-cloud-storage/index.md
- Amazon S3: connecting-cloud-storage/amazon-s3.md
