From c0c1b4de9548bc694f76f35bebce24d14bee81f4 Mon Sep 17 00:00:00 2001
From: Saikat Mitra
Date: Mon, 6 May 2024 23:43:28 +0530
Subject: [PATCH 1/2] docs: add support for JSON dataset

---
 docs/dataset/basics.mdx | 103 +++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 50 deletions(-)

diff --git a/docs/dataset/basics.mdx b/docs/dataset/basics.mdx
index 891689d6..00a6f985 100644
--- a/docs/dataset/basics.mdx
+++ b/docs/dataset/basics.mdx
@@ -42,35 +42,26 @@ The LLM prompt can use this value in the prompt through the `{{user_message}}` p
 }
 ```
 
-## Import from JSONL file
-
-Specify a path to the JSONL file. Each line of the file should be a valid JSON object.
-On import, the keys of this JSON will be converted into inputs of the sample.
-
-If using relative paths, the path is treated relative to the configuration file.
+## Import from Google Sheets
+Specify a path to the Google sheet in the `empiricalrc.json` file.
 
-```json
+```json empiricalrc.json
 "dataset": {
-  "path": "HumanEval.jsonl"
+  "path": "https://docs.google.com/spreadsheets/d/1AsMekKCG74m1PbBZQN_sEJgaW0b9Xarg4ms4mhG3i5k"
 }
 ```
+Refer to our [chatbot example](https://github.com/empirical-run/empirical/tree/main/examples/chatbot) which uses this dataset.
 
-## Import from CSV
-Specify a path to the CSV file in the `empiricalrc.json`. If using relative paths, the path is treated relative to the configuration file.
+The sheet should contain column headers.
+The rows of the file are converted into dataset inputs with column header names as the name of the parameter. For example:
 
-```json
-"dataset": {
-  "path": "foo.csv"
-}
+```md
+| name | age |
+| ---- | --- |
+| John | 25 |
 ```
 
-The CSV file should contain headers.
-The lines of the file are converted into dataset inputs with column header names as the name of the parameter. For example:
-```csv foo.csv
-name,age
-John,25
-```
-The above CSV gets converted into the following dataset object:
+The above table in the sheet gets converted into the following dataset object:
 ```json
 "dataset": {
   "samples": [
@@ -84,33 +75,58 @@ The above CSV gets converted into the following dataset object:
 }
 ```
 
-The above conversion enables you to create a prompt with placeholders. For example:
-```json
+The above conversion enables you to create a prompt with placeholders. For example:
+```json empiricalrc.json
 {
   "prompt": "Your name is {{name}} and you are a helpful assistant..."
 }
 ```
 
-## Import from Google Sheets
-Specify a path to the Google sheet in the `empiricalrc.json` file.
+> If you wish to extract data from a specific sheet of a Google Sheet, make sure to navigate to the desired sheet and copy the browser URL into `empiricalrc.json`.
 
-```json empiricalrc.json
+## Import from JSONL file
+
+Specify a path to the JSONL file. Each line of the file should be a valid JSON object.
+On import, the keys of this JSON will be converted into inputs of the sample.
+
+If using relative paths, the path is treated relative to the configuration file.
+
+```json
 "dataset": {
-  "path": "https://docs.google.com/spreadsheets/d/1AsMekKCG74m1PbBZQN_sEJgaW0b9Xarg4ms4mhG3i5k"
+  "path": "HumanEval.jsonl"
 }
 ```
-Refer to our [chatbot example](https://github.com/empirical-run/empirical/tree/main/examples/chatbot) which uses this dataset.
 
-The sheet should contain column headers.
-The rows of the file are converted into dataset inputs with column header names as the name of the parameter. For example:
+## Import from JSON
 
-```md
-| name | age |
-| ---- | --- |
-| John | 25 |
+Specify a path to the JSON file. The file should contain an array of objects.
+On import, the object keys will be converted into inputs of the sample.
+
+If using relative paths, the path is treated relative to the configuration file.
+
+```json
+"dataset": {
+  "path": "dataset.json"
+}
 ```
+Refer to the [tool call example](https://github.com/empirical-run/empirical/tree/main/examples/tool_calls) which uses this dataset.
+
+## Import from CSV
+Specify a path to the CSV file in the `empiricalrc.json`. If using relative paths, the path is treated relative to the configuration file.
 
-The above table in the sheet gets converted into the following dataset object:
+```json
+"dataset": {
+  "path": "foo.csv"
+}
+```
+
+The CSV file should contain headers.
+The lines of the file are converted into dataset inputs with column header names as the name of the parameter. For example:
+```csv foo.csv
+name,age
+John,25
+```
+The above CSV gets converted into the following dataset object:
 ```json
 "dataset": {
   "samples": [
@@ -124,27 +140,14 @@ The above table in the sheet gets converted into the following dataset object:
 }
 ```
 
-The above conversion enables you to create prompt with placeholders. For example:
-```json empiricalrc.json
+The above conversion enables you to create a prompt with placeholders. For example:
+```json
 {
   "prompt": "Your name is {{name}} and you are a helpful assistant..."
 }
 ```
 
-> If you wish to extract data from a specific sheet of Google Sheet, make sure to navigate to the desired sheet and copy the browser URL into `empiricalrc.json`.
-
-## Import Empirical JSON format
-
-If your dataset follows the Empirical JSON format, you can import that from
-a file or HTTP endpoint.
-
-```json
-"dataset": {
-  "path": "https://assets.empirical.run/datasets/json/spider-tiny.json"
-}
-```
-

From 13a7f88d6aedad994b24b97ec19bd8f3a4c9faea Mon Sep 17 00:00:00 2001
From: Saikat Mitra
Date: Mon, 6 May 2024 23:45:05 +0530
Subject: [PATCH 2/2] chore: add changeset

---
 .changeset/clever-guests-appear.md | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 .changeset/clever-guests-appear.md

diff --git a/.changeset/clever-guests-appear.md b/.changeset/clever-guests-appear.md
new file mode 100644
index 00000000..0a0e9705
--- /dev/null
+++ b/.changeset/clever-guests-appear.md
@@ -0,0 +1,5 @@
+---
+
+---
+
+docs: add support for JSON dataset