[Feature]: Add Copilot Provider #518

Open
Grant-Archibald-MS opened this issue Jan 8, 2025 · 5 comments
Labels: enhancement (New feature or request)

Comments

Grant-Archibald-MS commented Jan 8, 2025

Is your feature request related to a problem? Please describe.

Testing is often an afterthought and is not treated as an integral part of the operations work needed to ensure the deployed solution runs successfully.

Given the existing functionality and foundations of Test Engine for Power Apps, provide an integrated solution for generating tests for Copilot Studio agents.

This feature should allow integration into a pipeline deployment model to validate the expected operation of the Copilot. The tests should give different stakeholders an objective view of the health of the system, so that they can deploy new agents or update existing agents with confidence in the continued operation of the Copilot.

Describe the solution you'd like

Extend the Test Engine provider model so that tests can be executed against a Microsoft Copilot Studio agent.

Learning Path

Provide a structured learning path with interactive learning modules that help makers understand Power Fx so that they can build test cases.

This could build on the early example of browser-based execution of Power Fx test steps, extended into a learning module format with an interactive learning path.

For example, the Power Apps Test Engine technical learning path includes a module on Asserting Results. This module demonstrates the ability to execute interactive test steps in Power Fx directly inside the learning module, which helps readers master concepts without switching context to a different web page.

Record and Replay

Getting started can be hard. To address this blocker, this feature could extend the existing Record and Replay capability so that the user can interact with a test Copilot and automatically generate Power Fx specific to interacting with the agent in Copilot Studio.

This approach could extend examples like Recording Your First Test and enable easy Test Authoring.

Test Authoring

To make authoring more scalable, whether the test steps come from Record and Replay or are written from the learning concepts, add a web-based option. This gives authors more choices and reduces the number of components that need to be installed.

This could extend the current Visual Studio Code editor experience by offering an interactive web-based authoring experience as a deployed Power App.

The application should allow editing and syntax validation of test cases, and saving the test cases to a location that can be integrated with the Power Platform Deployment Pipeline.

Deployment and Test Execution

Provide examples of how to extend Power Platform Pipeline so that test cases are executed and included in the deployment results. The process should end with a gated approval step in which changes and test results are reviewed before the Copilot is deployed.

Test Case Features

The provider should:

  1. Support login using user credentials
  2. Test published and unpublished Copilots
  3. Monitor API calls to create Power Fx variables that can be asserted using simple low code. This could include:
    • Messages Sent / Received
    • Attachments
    • Knowledge sources used
    • Steps triggered
    • AI Response (Text Summary, Speech Summary, Text Citations (Collection - Title, Type, Position))
  4. Provide new Power Fx functions to interact with Copilot using Test Engine Power Fx extensions specific to Copilot concepts. For example:
Send("Some user text");
AttachFile("file name");
WaitUntilResponseComplete();
Assert(CountRows(Filter(Steps, Name = "UniversalSearchTool"))>0,"Universal search tool triggered");
Assert(CountRows(Filter(KnowledgeSources, "docx" in Name)) > 0, "Microsoft Word Data Sources Found"); 
Set(Greeting, First(Response));
AIBuilderMatch(Greeting.Text,"The greeting should be fun and engaging and provide helpful tips to get started");
Select(Last(Response).Action1); // Select Action 1 button from Adaptive Card
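
To help discuss the data shape behind the proposed asserts, here is a minimal Python sketch (not Test Engine code) that models the monitored conversation as plain collections. The class `ConversationState`, its fields, and the helper `count_rows` are hypothetical names chosen for illustration; they only loosely mirror the `Steps`, `KnowledgeSources`, and `Response` variables proposed above, and the sample data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """Hypothetical snapshot of what a provider might capture from monitored API calls."""
    steps: list = field(default_factory=list)
    knowledge_sources: list = field(default_factory=list)
    responses: list = field(default_factory=list)


def count_rows(rows, predicate):
    """Rough analogue of CountRows(Filter(rows, predicate)) from the Power Fx example."""
    return sum(1 for row in rows if predicate(row))


# Illustrative data only; a real provider would fill this from observed traffic.
state = ConversationState(
    steps=[{"Name": "UniversalSearchTool"}],
    knowledge_sources=[{"Name": "travel-policy.docx"}],
    responses=[{"Text": "Hello! Ask me about flights."}],
)

# Same intent as the two Assert(CountRows(Filter(...)) > 0, ...) lines above.
assert count_rows(state.steps, lambda s: s["Name"] == "UniversalSearchTool") > 0
assert count_rows(state.knowledge_sources, lambda k: "docx" in k["Name"]) > 0

greeting = state.responses[0]  # analogue of First(Response)
```

Keeping the captured state as simple named collections is what would let low-code asserts stay short; how the provider actually exposes these variables is still open.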

Describe alternatives you've considered

The Power CAT Copilot Studio Kit includes functionality to create and run test cases, including:

  • Define Test Sets and Test Cases
  • Support Different Test Types
    • Response Match: Test Utterance, Expected Response
    • Topic Match: Test Utterance, Expected Topic Match, Expected Response
    • Attachments (Adaptive Cards): Test Utterance, Expected Json, Expected Response
    • Generative Answer: Test Utterance, Expected Generative Answer Outcome
  • Enrich tests with Application Insights Queries
  • Enrich tests with Conversation Transcripts (Activity, Channels, Ambiguous Utterances, Attachment, Intent (Candidates, Scores), Session Details, Citations, Results...)
  • Analyze Generated answers by running AI Builder Analysis
  • Expected Values (Attachments, Responses, Topics...)
  • Interact with Direct Line API
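
As a rough way to compare the two approaches, the test types listed above can be pictured as a small data model. The Python sketch below is an assumption made for discussion only: `KitTestType`, `KitTestCase`, and `evaluate` are hypothetical names, and the exact-equality comparison is a placeholder, not how the kit evaluates results.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class KitTestType(Enum):
    RESPONSE_MATCH = "Response Match"
    TOPIC_MATCH = "Topic Match"
    ATTACHMENTS = "Attachments (Adaptive Cards)"
    GENERATIVE_ANSWER = "Generative Answer"


@dataclass
class KitTestCase:
    test_type: KitTestType
    utterance: str
    expected_response: Optional[str] = None
    expected_topic: Optional[str] = None


def evaluate(case, actual_response, actual_topic=None):
    """Return True when the observed turn satisfies the case (two types only)."""
    if case.test_type is KitTestType.RESPONSE_MATCH:
        return actual_response == case.expected_response
    if case.test_type is KitTestType.TOPIC_MATCH:
        return actual_topic == case.expected_topic
    # Attachments and generative answers need richer comparison than this sketch offers.
    raise NotImplementedError(f"{case.test_type.value} is not covered by this sketch")


case = KitTestCase(KitTestType.RESPONSE_MATCH, "Hi", expected_response="Hello")
matched = evaluate(case, "Hello")
```

A declarative model like this is simple to author, while the Power Fx approach trades some of that simplicity for arbitrary assertions over steps, knowledge sources, and responses.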

Additional context?

The simplicity of the Power CAT Copilot Toolkit Test Sets and Test Cases provides a great method to handle test execution. Key discussion points:

  • Path to first party execution of Copilot tests as part of pipeline deployments?
  • Can Record and Replay mitigate the getting started problem?
  • By using interactive learning modules directly in the browser, together with simulated Copilot conversations, does it become easier to demonstrate how testing can be done early in the process?
  • By leveraging the rich feature set of Power Fx that already exists today, does the process of creating E2E tests become much easier?
  • What specific extensions would be needed to effectively support Copilot test scenarios?
  • Overlap with the testing features of the Copilot Toolkit, and how to handle the complementary nature of the test cases

Grant-Archibald-MS commented

Combining this with interactive simulation and test data setup could also help in the authoring process.

Simulation Editor
Message from chat: WaitForMessage();
Response from user: Send("Response from User");
From follow text: AICheck(First(Message).Text, "The tone of the message should be happy and upbeat...");

Grant-Archibald-MS commented Jan 21, 2025

To aid in the design of the Copilot test steps for the provider, started work on [Feature]: Getting Started Guide #513.

Converted the content from the PowerfulDev Testing site to docs as code and published it as the Power Apps Test Engine GitHub Pages hosted documentation.

The aim is to first look at different Power Fx language functions using the Power Fx Playground, with sample Power Fx that could be implemented to cover different testing scenarios including:

Grant-Archibald-MS commented

Create a first sample test case for Microsoft Copilot Studio, using the Experimental namespace provided for the Microsoft Copilot Studio test component, against a created agent.

The sample will:

  1. Log in with a user account to the configured environment and agent
  2. Use the test panel of the Portal
  3. Allow a message to be sent
  4. Wait for a message to be received that matches an expected response text

The initial sample Power Fx:

Experimental.WaitUntilMessage("Safe Travels Agent");
Assert(CountRows(Copilot.Messages) > 1);
Experimental.SendText("Where can I fly today?");
// Wait for response
Experimental.WaitUntilMessage("Is there anything else I can help you with");

This example waits for the greeting, sends a message, and then waits for the final message.
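
The wait-send-wait flow can be illustrated outside the portal with a small polling loop. This Python sketch is a stand-in for discussion, not the provider implementation: `wait_until_message`, `FakeChat`, the timeout values, and the scripted reply are all assumptions, and the fake simply appends a canned response when a message is sent.

```python
import time


def wait_until_message(get_messages, expected_text, timeout_s=5.0, poll_s=0.1):
    """Poll get_messages() until a message contains expected_text or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        messages = get_messages()
        if any(expected_text in m for m in messages):
            return messages
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No message containing {expected_text!r}")
        time.sleep(poll_s)


class FakeChat:
    """In-memory stand-in for the test panel; replies are scripted, not generated."""

    def __init__(self):
        self.messages = ["Hello, I am Safe Travels Agent"]

    def send_text(self, text):
        self.messages.append(text)
        self.messages.append("Is there anything else I can help you with?")


chat = FakeChat()
wait_until_message(lambda: chat.messages, "Safe Travels Agent")  # wait for greeting
chat.send_text("Where can I fly today?")                          # send message
transcript = wait_until_message(
    lambda: chat.messages, "Is there anything else I can help you with"
)                                                                 # wait for final message
assert len(transcript) > 1
```

Raising on timeout rather than returning a flag keeps a failed wait visible as a failed test step.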

Next Steps to consider:

  1. Playground examples. Create Learning Playground samples to discuss the approach and concepts.

  2. Extensions to WaitForMessage. For example:

    • Literal wait for an exact text match
    • Fuzzy wait that evaluates responses and looks for a high-probability match
    • AI-assisted wait that evaluates a prompt to determine whether a match is found
    • Meaning match that compares the responses with the expected response and determines whether the meaning matches
  3. Global asserts. How to test for common patterns across all responses:

    • Ability to handle questions it does not know about
    • Tone of the message and how well it aligns with the "personality" of the agent
  4. How to provide a range of different inputs / synonyms and ensure matching intent
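
To make the difference between the literal and fuzzy options concrete, here is a small Python sketch using the standard library `difflib.SequenceMatcher`. The function names and the 0.8 threshold are assumptions picked for illustration; character-level similarity is only one possible reading of "fuzzy", and the AI-assisted and meaning-match options would need a model call that is not sketched here.

```python
from difflib import SequenceMatcher


def literal_match(expected, actual):
    """Literal match: the expected text appears verbatim in the response."""
    return expected in actual


def fuzzy_match(expected, actual, threshold=0.8):
    """Fuzzy match: character-level similarity ratio meets the threshold."""
    ratio = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
    return ratio >= threshold


def first_match(responses, expected, matcher):
    """Return the first response accepted by matcher, or None if none match."""
    for response in responses:
        if matcher(expected, response):
            return response
    return None


responses = ["Thinking...", "Is there anything else I can help you with today?"]
expected = "Is there anything else I can help you with?"

literal_hit = first_match(responses, expected, literal_match)
fuzzy_hit = first_match(responses, expected, fuzzy_match)
```

In this made-up transcript the literal matcher finds nothing because of the extra word "today", while the fuzzy matcher accepts the second response. That is the kind of wording drift a fuzzy wait would be meant to tolerate.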

Grant-Archibald-MS commented Jan 30, 2025

Extend the Copilot provider to also support Direct Line as an alternative provider to the Copilot. Consider the following:

  1. Method of obtaining the Direct Line secret
  2. Ability to make use of the Direct Line SDK
  3. The differences in enrichment of other context data, as the Direct Line API just returns the conversation and not the agent state. For example:
    • Knowledge sources used
    • Actions used
    • Variables
  4. Which enrichment services would be required for data stored in Dataverse and Application Insights
  5. Validation of responses using AI Builder
  6. Testable interface using xUnit tests
  7. Testability of Direct Line components with an interface similar to:
public interface IDirectLineClientWrapper
{
    Task<TokenResponse> GenerateTokenForNewConversationAsync();
    Task<Conversation> StartConversationAsync();
    Task<ResourceResponse> PostActivityAsync(string conversationId, Activity activity);
    Task<ActivitySet> GetActivitiesAsync(string conversationId);
}
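
The testability idea behind such a wrapper can be shown with an in-memory fake. The sketch below is in Python so that it runs standalone; it is not a port of the C# interface, and `DirectLineClient`, `FakeDirectLineClient`, `ask`, and the activity dictionary shape are all hypothetical. It makes no network calls and does not use the Direct Line SDK.

```python
from typing import Protocol


class DirectLineClient(Protocol):
    """Hypothetical, simplified counterpart of the wrapper interface above."""

    def start_conversation(self) -> str: ...
    def post_activity(self, conversation_id: str, text: str) -> str: ...
    def get_activities(self, conversation_id: str) -> list: ...


class FakeDirectLineClient:
    """In-memory fake so provider logic can be unit tested without a network call."""

    def __init__(self, scripted_reply):
        self._reply = scripted_reply
        self._conversations = {}

    def start_conversation(self):
        conversation_id = f"conv-{len(self._conversations) + 1}"
        self._conversations[conversation_id] = []
        return conversation_id

    def post_activity(self, conversation_id, text):
        activities = self._conversations[conversation_id]
        activities.append({"from": "user", "text": text})
        activities.append({"from": "bot", "text": self._reply})  # scripted reply
        return f"{conversation_id}|{len(activities) - 2}"

    def get_activities(self, conversation_id):
        return list(self._conversations[conversation_id])


def ask(client: DirectLineClient, question: str) -> str:
    """Send one question and return the last bot reply."""
    conversation_id = client.start_conversation()
    client.post_activity(conversation_id, question)
    activities = client.get_activities(conversation_id)
    bot_texts = [a["text"] for a in activities if a["from"] == "bot"]
    return bot_texts[-1]


fake = FakeDirectLineClient("You can fly to Seattle today.")
answer = ask(fake, "Where can I fly today?")
```

Because `ask` depends only on the protocol, the same logic could later run against a real client while unit tests keep using the fake.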
