Commit d218954

JoseCSantos, pdhotems, azure-sdk, and weshaggard authored
Improve task adherence prompt and add sample notebook for bugbash (#40146)
* For ComputeInstance and AmlCompute update disableLocalAuth property based on ssh_public_access (#39934)
* add disableLocalAuth for computeInstance
* fix disableLocalAuthAuth issue for amlCompute
* update compute instance
* update recordings
* temp changes
* Revert "temp changes"
  This reverts commit 64e3c38.
* update recordings
* fix tests
* Add resource prefix for safe secret standard alerts (#40028)
  Add the prefix to identify RGs that we are creating in our TME tenant to identify them as potentially using local auth and violating our safe secret standards.
  Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>
* Add examples to task_adherence prompt. Add Task Adherence sample notebook
* Undo changes to New-TestResources.ps1
* Add sample .env file

---------

Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com>
Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com>
Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>
1 parent 0a4d507 commit d218954

4 files changed: +381 -4 lines changed

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_adherence/task_adherence.prompty

+45 -1
@@ -38,26 +38,70 @@ system:
 **Definition:**
 Response completely ignores instructions or deviates significantly
 
+**Example:**
+**Query:** What is a recommended weekend itinerary in Paris?
+**Response:** Paris is a lovely city with a rich history.
+
+Explanation: This response completely misses the task by not providing any itinerary details. It offers a generic statement about Paris rather than a structured travel plan.
+
+
 **Level 2: Barely adherent**
 
 **Definition:**
 Response partially aligns with instructions but has critical gaps.
 
+**Example:**
+**Query:** What is a recommended weekend itinerary in Paris?
+**Response:** Spend your weekend visiting famous places in Paris.
+
+Explanation: While the response hints at visiting well-known sites, it is extremely vague and lacks specific details, such as which sites to visit or any order of activities, leaving major gaps in the instructions.
+
+
 **Level 3: Moderately adherent**
 
 **Definition:**
 Response meets the core requirements but lacks precision or clarity.
 
+**Example:**
+**Query:** What is a recommended weekend itinerary in Paris?
+**Response:** Visit the Eiffel Tower and the Louvre on Saturday, and stroll through Montmartre on Sunday.
+
+Explanation: This answer meets the basic requirement by naming a few key attractions and assigning them to specific days. However, it lacks additional context, such as timings, additional activities, or details to make the itinerary practical and clear.
+
+
 **Level 4: Mostly adherent**
 
 **Definition:**
 Response is clear, accurate, and aligns with instructions with minor issues.
 
-**Level 5: Fully adnerent**
+**Example:**
+**Query:** What is a recommended weekend itinerary in Paris?
+**Response:** For a weekend in Paris, start Saturday with a morning visit to the Eiffel Tower, then head to the Louvre in the early afternoon. In the evening, enjoy a leisurely walk along the Seine. On Sunday, begin with a visit to Notre-Dame Cathedral, followed by exploring the art and cafés in Montmartre. This plan offers a mix of cultural visits and relaxing experiences.
+
+Explanation: This response is clear, structured, and provides a concrete itinerary with specific attractions and a suggested order of activities. It is accurate and useful, though it might benefit from a few more details like exact timings or restaurant suggestions to be perfect.
+
+
+**Level 5: Fully adherent**
 
 **Definition:**
 Response is flawless, accurate, and follows instructions to the letter.
 
+**Example:**
+**Query:** What is a recommended weekend itinerary in Paris?
+**Response:** Here is a detailed weekend itinerary in Paris:
+Saturday:
+Morning: Begin your day with a visit to the Eiffel Tower to admire the views from the top.
+Early Afternoon: Head to the Louvre for a guided tour of its most famous exhibits.
+Late Afternoon: Take a relaxing walk along the Seine, stopping at local boutiques.
+Evening: Enjoy dinner at a classic Parisian bistro near the river.
+Sunday:
+Morning: Visit the Notre-Dame Cathedral to explore its architecture and history.
+Midday: Wander the charming streets of Montmartre, stopping by art galleries and cafés.
+Afternoon: Finish your trip with a scenic boat tour on the Seine.
+This itinerary balances cultural immersion, leisure, and local dining experiences, ensuring a well-rounded visit.
+
+Explanation: This response is comprehensive and meticulously follows the instructions. It provides detailed steps, timings, and a variety of activities that fully address the query, leaving no critical gaps.
+
 # Data
 Query: {{query}}
 Response: {{response}}
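
For readers who want to turn a numeric adherence score back into the rubric wording shown in this hunk, a minimal Python sketch follows. It assumes the evaluator yields an integer score from 1 to 5, which this diff does not confirm. `Level`, `RUBRIC`, and `describe` are hypothetical names for illustration and are not part of azure-ai-evaluation. The Level 1 heading is not visible in the hunk, so it is left unset rather than guessed.

```python
from typing import NamedTuple, Optional


class Level(NamedTuple):
    label: Optional[str]  # None where the hunk does not show the heading
    definition: str


# Labels and definitions copied from the prompt text in the hunk above.
# Assumption: scores are the integers 1..5.
RUBRIC = {
    1: Level(None, "Response completely ignores instructions or deviates significantly"),
    2: Level("Barely adherent", "Response partially aligns with instructions but has critical gaps."),
    3: Level("Moderately adherent", "Response meets the core requirements but lacks precision or clarity."),
    4: Level("Mostly adherent", "Response is clear, accurate, and aligns with instructions with minor issues."),
    5: Level("Fully adherent", "Response is flawless, accurate, and follows instructions to the letter."),
}


def describe(score: int) -> str:
    """Return a one-line description for a rubric score, or raise ValueError."""
    if score not in RUBRIC:
        raise ValueError(f"score must be one of {sorted(RUBRIC)}, got {score!r}")
    level = RUBRIC[score]
    heading = f"Level {score}" if level.label is None else f"Level {score}: {level.label}"
    return f"{heading} ({level.definition})"
```

Calling `describe(3)` yields the Level 3 heading followed by its definition in parentheses.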
@@ -0,0 +1,3 @@
+MODEL_DEPLOYMENT_NAME=enter_model_deployment_name_eg_gpt-4o-mini
+MODEL_DEPLOYMENT_API_VERSION=enter_deployment_api_version_eg_2024-10-21
+PROJECT_CONNECTION_STRING=enter_project_connection_string
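
The sample file ships with placeholder values, so a notebook could check that they were replaced before making any calls. The sketch below is one way to do that with the standard library only. It assumes the three variables have already been loaded into the process environment; how the notebook loads the `.env` file is not shown in this diff. `REQUIRED_VARS` and `missing_settings` are hypothetical names.

```python
import os

# Variable names taken from the sample .env file in this commit.
REQUIRED_VARS = (
    "MODEL_DEPLOYMENT_NAME",
    "MODEL_DEPLOYMENT_API_VERSION",
    "PROJECT_CONNECTION_STRING",
)


def missing_settings(env=None):
    """Return the required variables that are unset, empty, or still placeholders.

    Assumption: every placeholder in the sample file starts with "enter_".
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value or value.startswith("enter_"):
            missing.append(name)
    return missing
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to exercise in isolation.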

sdk/evaluation/azure-ai-evaluation/samples/bug_bash/response_completeness.ipynb

+49 -3
@@ -123,17 +123,63 @@
 "result"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Batch run for response completeness"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
-"source": []
+"source": [
+"import json\n",
+"\n",
+"data = [\n",
+" {\n",
+" \"response\": \"The temperature of Seattle now is 70 degrees. Based on the temperature, having an outdoor office party is recommended.\",\n",
+" \"ground_truth\": \"The temperature of Seattle now is 50 degrees. It will be recommended to bring a jacket in the evening.\",\n",
+" },\n",
+" {\n",
+" \"response\": \"The email draft \\\"Project Plan\\\" is attached. Please review and provide feedback.\",\n",
+" \"ground_truth\": \"The email draft \\\"Project Plan\\\" is attached. Please review and provide feedback by EOD.\",\n",
+" },\n",
+" {\n",
+" \"response\": \"Based on the retrieved documents, the shareholder meeting discussed the operational efficiency of the company and financing options.\",\n",
+" \"ground_truth\": \"The shareholder meeting discussed the compensation package of the company CEO.\",\n",
+" }\n",
+"]\n",
+"\n",
+"file_path = \"response_completeness_data.json\"\n",
+"\n",
+"with open(file_path, \"w\") as file:\n",
+" json.dump(data, file)\n",
+"\n",
+"from azure.ai.evaluation import evaluate\n",
+"\n",
+"response = evaluate(\n",
+" data=file_path,\n",
+" evaluators={\n",
+" \"completeness\": completeness_evaluator,\n",
+" },\n",
+"\n",
+" azure_ai_project={\n",
+" \"subscription_id\": \"<subscription_id>\",\n",
+" \"project_name\": \"<project_name>\",\n",
+" \"resource_group_name\": \"<resource_group_name>\",\n",
+" }\n",
+")\n",
+"\n",
+"pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')"
+]
 }
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "agent-evals-bug-bash",
+"display_name": "Python 3",
 "language": "python",
 "name": "python3"
 },
@@ -147,7 +193,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.11"
+"version": "3.11.9"
 }
 },
 "nbformat": 4,
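
The batch cell writes the records as a single JSON array with `json.dump`. If the `evaluate` call expects JSON Lines input instead (an assumption worth checking against the library documentation; this diff does not say which formats are accepted), the records could be written one object per line. The sketch below uses only the standard library. `write_jsonl` and `read_jsonl` are hypothetical helpers, and the records are shortened stand-ins for the notebook data.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write each record as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Shortened stand-ins for the notebook's response / ground_truth pairs.
records = [
    {"response": "The email draft is attached.", "ground_truth": "The email draft is attached. Reply by EOD."},
    {"response": "The meeting covered financing.", "ground_truth": "The meeting covered CEO compensation."},
]

# Round-trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    jsonl_path = os.path.join(tmp_dir, "response_completeness_data.jsonl")
    write_jsonl(records, jsonl_path)
    roundtrip = read_jsonl(jsonl_path)
```

The round trip at the end only confirms that the file format preserves the records; it does not call the evaluation service.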
