diff --git a/docs/assets/dashboard/dashboard_home.png b/docs/assets/dashboard/dashboard_home.png
new file mode 100644
index 00000000..8bce7540
Binary files /dev/null and b/docs/assets/dashboard/dashboard_home.png differ
diff --git a/docs/assets/dashboard/dashboard_project1.png b/docs/assets/dashboard/dashboard_project1.png
new file mode 100644
index 00000000..db4705a6
Binary files /dev/null and b/docs/assets/dashboard/dashboard_project1.png differ
diff --git a/docs/assets/dashboard/eval.png b/docs/assets/dashboard/eval.png
new file mode 100644
index 00000000..4079251e
Binary files /dev/null and b/docs/assets/dashboard/eval.png differ
diff --git a/docs/assets/dashboard/eval_logs.png b/docs/assets/dashboard/eval_logs.png
new file mode 100644
index 00000000..bdeddd83
Binary files /dev/null and b/docs/assets/dashboard/eval_logs.png differ
diff --git a/docs/assets/dashboard/eval_select_metrics.png b/docs/assets/dashboard/eval_select_metrics.png
new file mode 100644
index 00000000..9e47a266
Binary files /dev/null and b/docs/assets/dashboard/eval_select_metrics.png differ
diff --git a/docs/assets/dashboard/prompt.png b/docs/assets/dashboard/prompt.png
new file mode 100644
index 00000000..5ae0bd85
Binary files /dev/null and b/docs/assets/dashboard/prompt.png differ
diff --git a/docs/assets/dashboard/prompt_select.png b/docs/assets/dashboard/prompt_select.png
new file mode 100644
index 00000000..2b9141f8
Binary files /dev/null and b/docs/assets/dashboard/prompt_select.png differ
diff --git a/docs/dashboard/evaluations.mdx b/docs/dashboard/evaluations.mdx
new file mode 100644
index 00000000..8dddf580
--- /dev/null
+++ b/docs/dashboard/evaluations.mdx
@@ -0,0 +1,69 @@
+---
+title: Evaluations
+---
+
+### What are Evaluations?
+
+Using UpTrain, you can run evaluations on 20+ pre-configured metrics such as:
+1. [Context Relevance](/predefined-evaluations/context-awareness/context-relevance): Evaluates how relevant the retrieved context is to the question specified.
+
+2. [Factual Accuracy](/predefined-evaluations/context-awareness/factual-accuracy): Evaluates whether the response generated is factually correct and grounded in the provided context.
+
+3. [Response Completeness](/predefined-evaluations/response-quality/response-completeness): Evaluates whether the response has answered all the aspects of the question specified.
+
+You can look at the complete list of UpTrain's supported metrics [here](/predefined-evaluations/overview).
+
+### How does it work?
+
+  Click on `Create New Project` from Home
+
+  * `Project name:` Create a name for your project
+  * `Dataset name:` Create a name for your dataset
+  * `Project Type:` Select project type: `Evaluations`
+  * `Choose File:` Upload your dataset
+    Sample Dataset:
+    ```jsonl
+    {"question":"","response":"","context":""}
+    {"question":"","response":"","context":""}
+    ```
+  * `Evaluation LLM:` Select an LLM to run evaluations
+
+  You can see all the evaluations run using UpTrain.
+
+  You can also see individual logs.
+
+  Join our community for any questions or requests
+
diff --git a/docs/dashboard/getting_started.mdx b/docs/dashboard/getting_started.mdx
new file mode 100644
index 00000000..72efdee7
--- /dev/null
+++ b/docs/dashboard/getting_started.mdx
@@ -0,0 +1,38 @@
+---
+title: Getting Started
+---
+
+### What is UpTrain Dashboard?
+
+The UpTrain dashboard is a web-based interface that allows you to evaluate your LLM applications.
+
+It is a self-hosted dashboard that runs on your local machine.
+You don't need to write any code to use the dashboard.
+
+You can use the dashboard to evaluate your LLM applications, view the results, manage prompts, run experiments, and perform root cause analysis.
+
+Before you start, ensure you have Docker installed on your machine. If not, you can install it from [here](https://docs.docker.com/get-docker/).
+
+### How to install?
+
+The following commands will download the UpTrain dashboard and start it on your local machine:
+```bash
+# Clone the repository
+git clone https://github.com/uptrain-ai/uptrain
+cd uptrain
+
+# Run UpTrain
+bash run_uptrain.sh
+```
+
+  Join our community for any questions or requests
+
diff --git a/docs/dashboard/project.mdx b/docs/dashboard/project.mdx
new file mode 100644
index 00000000..b4c89eb5
--- /dev/null
+++ b/docs/dashboard/project.mdx
@@ -0,0 +1,52 @@
+---
+title: Create a Project
+---
+
+### What is a Project?
+
+Using the UpTrain Dashboard, you can manage all your projects.
+
+There are 2 types of projects we support:
+ * **[Evaluations](/dashboard/evaluations):** Run evaluations on your queries, documents and LLM responses
+ * **[Prompts](/dashboard/prompts):** Find the best way to ask questions to your LLM using prompt iteration, experimentation and evaluations
+
+### How does it work?
+
+  Click on `Create New Project` from Home
+
+  * `Project name:` Create a name for your project
+  * `Dataset name:` Create a name for your dataset
+  * `Project Type:` Select a project type: `Evaluations` or `Prompts`
+  * `Choose File:` Upload your dataset
+    Sample Dataset:
+    ```jsonl
+    {"question":"", "response":"", "context":""}
+    {"question":"", "response":"", "context":""}
+    ```
+  * `Evaluation LLM:` Select an LLM to run evaluations
+
+Now that you have created a project, you can run evaluations or experiment with prompts.
+
+  Join our community for any questions or requests
+
diff --git a/docs/dashboard/prompts.mdx b/docs/dashboard/prompts.mdx
new file mode 100644
index 00000000..e8baacf6
--- /dev/null
+++ b/docs/dashboard/prompts.mdx
@@ -0,0 +1,69 @@
+---
+title: Prompts
+---
+
+### What are Prompts?
+
+You can manage your prompt iterations and experiment with them using UpTrain on 20+ pre-configured evaluation metrics such as:
+1. [Context Relevance](/predefined-evaluations/context-awareness/context-relevance): Evaluates how relevant the retrieved context is to the question specified.
+
+2. [Factual Accuracy](/predefined-evaluations/context-awareness/factual-accuracy): Evaluates whether the response generated is factually correct and grounded in the provided context.
+
+3. [Response Completeness](/predefined-evaluations/response-quality/response-completeness): Evaluates whether the response has answered all the aspects of the question specified.
+
+You can look at the complete list of UpTrain's supported metrics [here](/predefined-evaluations/overview).
+
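+The dashboard itself is no-code, but the same pre-configured metrics are also available through the UpTrain Python SDK. The snippet below is a minimal, illustrative sketch (separate from the dashboard flow) that scores a single prompt's response on the three metrics listed above; the sample data, model choice, and API key are placeholder assumptions.
+
+```python
+from uptrain import EvalLLM, Evals, Settings
+
+# Illustrative settings: any supported LLM can act as the evaluation LLM.
+settings = Settings(model="gpt-3.5-turbo", openai_api_key="sk-**********")
+eval_llm = EvalLLM(settings)
+
+# One row per (question, retrieved context, LLM response) triple,
+# matching the sample dataset format used by the dashboard.
+data = [{
+    "question": "Which is the longest river in the world?",
+    "context": "The Nile is a major north-flowing river in northeastern Africa, about 6,650 km long.",
+    "response": "The Nile, at roughly 6,650 km, is generally cited as the longest river in the world."
+}]
+
+results = eval_llm.evaluate(
+    data=data,
+    checks=[Evals.CONTEXT_RELEVANCE, Evals.FACTUAL_ACCURACY, Evals.RESPONSE_COMPLETENESS],
+)
+print(results)  # each row gains score_* fields for the selected checks
+```
+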
+### How does it work?
+
+  Click on `Create New Project` from Home
+
+  * `Project name:` Create a name for your project
+  * `Dataset name:` Create a name for your dataset
+  * `Project Type:` Select project type: `Prompts`
+  * `Choose File:` Upload your dataset
+    Sample Dataset:
+    ```jsonl
+    {"question":"","response":"","context":""}
+    {"question":"","response":"","context":""}
+    ```
+  * `Evaluation LLM:` Select an LLM to run evaluations
+
+  You can see all the evaluations run on your prompts using UpTrain.
+
+  Join our community for any questions or requests
+
diff --git a/docs/llms/anyscale.mdx b/docs/llms/anyscale.mdx
index 9866c0fc..44c05485 100644
--- a/docs/llms/anyscale.mdx
+++ b/docs/llms/anyscale.mdx
@@ -23,6 +23,11 @@
 ANYSCALE_API_KEY = "esecret_***********************"
 settings = Settings(model='anyscale/mistralai/Mistral-7B-Instruct-v0.1', anyscale_api_key=ANYSCALE_API_KEY)
 ```
+
+  The model name should start with `anyscale/` for UpTrain to recognize you are using models hosted on Anyscale.
+
+  For example, if you are using `mistralai/Mistral-7B-Instruct-v0.1` via Anyscale, the model name should be `anyscale/mistralai/Mistral-7B-Instruct-v0.1`.
+
 We have used Mistral-7B-Instruct-v0.1 for this example. You can find a full list of available models [here](https://docs.endpoints.anyscale.com/category/supported-models).
diff --git a/docs/llms/azure.mdx b/docs/llms/azure.mdx
index 073894b8..fefd2de9 100644
--- a/docs/llms/azure.mdx
+++ b/docs/llms/azure.mdx
@@ -43,6 +43,11 @@ You can use your Azure API key to run LLM evaluations using UpTrain.
 settings = Settings(model = 'azure/*', azure_api_key=AZURE_API_KEY, azure_api_version=AZURE_API_VERSION, azure_api_base=AZURE_API_BASE)
 eval_llm = EvalLLM(settings)
 ```
+
+  The model name should start with `azure/` for UpTrain to recognize you are using models hosted on Azure.
+
+  For example, if you are using `gpt-35-turbo` via Azure, the model name should be `azure/gpt-35-turbo`.
+
diff --git a/docs/llms/mistral.mdx b/docs/llms/mistral.mdx
index 310fdd50..8c33517d 100644
--- a/docs/llms/mistral.mdx
+++ b/docs/llms/mistral.mdx
@@ -41,7 +41,11 @@ You can use your Mistral API key to run LLM evaluations using UpTrain.
 settings = Settings(model = 'mistral/mistral-tiny', mistral_api_key=MISTRAL_API_KEY)
 eval_llm = EvalLLM(settings)
 ```
-  We use GPT 3.5 Turbo be default, you can use any other OpenAI models as well
+
+  The model name should start with `mistral/` for UpTrain to recognize you are using Mistral.
+
+  For example, if you are using `mistral-tiny`, the model name should be `mistral/mistral-tiny`.
+
diff --git a/docs/mint.json b/docs/mint.json
index 3991b7bc..a23cd746 100644
--- a/docs/mint.json
+++ b/docs/mint.json
@@ -156,6 +156,16 @@
         }
       ]
     },
+    {
+      "group": "Dashboard",
+      "version": "v1",
+      "pages": [
+        "dashboard/getting_started",
+        "dashboard/project",
+        "dashboard/evaluations",
+        "dashboard/prompts"
+      ]
+    },
     {
       "group": "Supported LLMs",
       "version": "v1",