microsoft · vyokky · Apr 2, 2024 · Mar 13, 2024 · Mar 13, 2024 · Mar 25, 2024
diff --git a/.gitignore b/.gitignore
@@ -21,8 +21,10 @@ ufo/rag/app_docs/*
 learner/records.json
 vectordb/docs/*
 vectordb/experience/*
+vectordb/demonstration/*
 
 # Don't ignore the example files
 !vectordb/docs/example/
+!vectordb/demonstration/example.yaml
 
 .vscode
diff --git a/README.md b/README.md
@@ -55,6 +55,7 @@ UFO sightings have garnered attention from various media outlets, including:
 
 These sources provide insights into the evolving landscape of technology and the implications of UFO phenomena on various platforms.
 
+
 ## 💥 Highlights
 
 - [x] **First Windows Agent** - UFO is the pioneering agent framework capable of translating user requests in natural language into actionable operations on Windows OS.
@@ -156,6 +157,15 @@ RAG_EXPERIENCE: True  # Whether to use the RAG from its self-experience.
 RAG_EXPERIENCE_RETRIEVED_TOPK: 5  # The topk for the offline retrieved documents
 ```
 
+#### RAG from User-Demonstration
+Boost UFO's capabilities through user demonstration! Utilize Microsoft Steps Recorder to record step-by-step processes for achieving specific tasks. With a simple command processed by the record_processor (refer to the [README](./record_processor/README.md)), UFO can store these trajectories in its memory for future reference, enhancing its learning from user interactions.
+
+You can enable this function by setting the following configuration:
+```bash
+## RAG Configuration for demonstration
+RAG_DEMONSTRATION: True  # Whether to use the RAG from its user demonstration.
+RAG_DEMONSTRATION_RETRIEVED_TOPK: 5  # The topk for the offline retrieved documents
+```
 
 
 ### 🎉 Step 4: Start UFO

diff --git a/assets/record_processor/add_comment.png b/assets/record_processor/add_comment.png
diff --git a/record_processor/README.md b/record_processor/README.md
@@ -0,0 +1,69 @@
+
+# Enhancing UFO with RAG using User Demonstration
+
+UFO can learn from user-provided demonstrations for specific requests and use them as references in the future when encountering similar tasks. Providing clear demonstrations along with precise requests can significantly enhance UFO's performance.
+
+## How to Enable and Configure This Function ❓
+You can enable this function by setting the following configuration:
+```bash
+## RAG Configuration for demonstration
+RAG_DEMONSTRATION: True  # Whether to use the RAG from its user demonstration.
+RAG_DEMONSTRATION_RETRIEVED_TOPK: 5  # The topk for the offline retrieved documents
+RAG_DEMONSTRATION_COMPLETION_N: 3  # The number of completion choices for the demonstration result
+```
+
+## How to Prepare Your Demonstration ❓
+
+### Record your steps with Microsoft Steps Recorder
+
+UFO currently supports learning from user trajectories recorded by the Steps Recorder app built into Windows. More tools will be supported in the future.
+
+**Step 1: Record your steps**
+
+You can follow this [official guidance](https://support.microsoft.com/en-us/windows/record-steps-to-reproduce-a-problem-46582a9b-620f-2e36-00c9-04e25d784e47) to record your steps for a specific request.
+
+
+**Step 2: Add comments in each step if needed**
+
+Feel free to add any specific details or instructions for UFO to notice by including them in comments. Since Steps Recorder doesn't capture typed text, include any typed content you need to convey to UFO in the comment as well.
+<h1 align="center">
+    <img src="../assets/record_processor/add_comment.png"/> 
+</h1>
+
+
+**Step 3: Review and save**
+
+Examine the steps and save them to a ZIP file. You can refer to the [sample_record.zip](./example/sample_record.zip) as an illustration of the recorded steps for a specific request: "sending an email to example@gmail.com to say hi."
+
+
+## How to Let UFO Study the User Demonstration ❓
+
+
+Once you have your demonstration record ZIP file ready, you can easily parse it as an example to support RAG for UFO. Follow these steps:
+
+```console
+# assume you are in the cloned UFO folder
+ python -m record_processor -r <your request for the demonstration> -p <record ZIP file path>
+```
+Replace `your request for the demonstration` with the specific request, such as "sending an email to example@gmail.com to say hi."
+Replace `record ZIP file path` with the full path to the ZIP file you just created.
+
+This command parses the record and summarizes it into an execution plan. You'll see a confirmation message as follows:
+```
+Here are the plans summarized from your demonstration:
+Plan [1]
+(1) Input the email address 'example@gmail.com' in the 'To' field.
+(2) Input the subject of the email. I need to input 'Greetings'.
+(3) Input the content of the email. I need to input 'Hello,\nI hope this message finds you well. I am writing to send you a warm greeting and to wish you a great day.\nBest regards.'
+(4) Click the Send button to send the email.
+Plan [2]
+(1) ***
+(2) ***
+(3) ***
+Plan [3]
+(1) ***
+(2) ***
+(3) ***
+Would you like to save any one of them as future reference by the agent? press [1] [2] [3] to save the corresponding plan, or press any other key to skip.
+```
+Press `1` to save the first plan into UFO's memory for future reference. A sample can be found [here](../vectordb/demonstration/example.yaml).
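Review note: the README above says the recorded steps are saved to a ZIP file that the processor then parses. Below is a minimal sketch of that first step, assuming the archive holds one `.mht` file and that it decodes as UTF-8. `read_mht_from_zip` is a hypothetical helper for illustration, not a function from this PR.

```python
import io
import zipfile


def read_mht_from_zip(zip_source) -> str:
    """Return the text of the first .mht member found in a ZIP archive.

    zip_source may be a file path or a binary file-like object.
    Hypothetical helper for illustration; not part of this PR.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".mht"):
                # Assumption: the file is UTF-8 text; undecodable bytes are replaced.
                return archive.read(name).decode("utf-8", errors="replace")
    raise FileNotFoundError("No .mht file found in the archive.")


# Build a tiny in-memory archive to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "Recording_1.mht",
        'Content-Type: multipart/related; boundary="XYZ"\n',
    )
buffer.seek(0)

mht_text = read_mht_from_zip(buffer)
print(mht_text.strip())
```

The returned string is the kind of input `PSRRecordParser(content)` expects as its `content` argument.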
diff --git a/record_processor/__init__.py b/record_processor/__init__.py
@@ -0,0 +1,2 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
diff --git a/record_processor/__main__.py b/record_processor/__main__.py
@@ -0,0 +1,7 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+from . import record_processor
+
+if __name__ == "__main__":
+    # Execute the main script
+    record_processor.main()
diff --git a/record_processor/example/sample_record.zip b/record_processor/example/sample_record.zip
diff --git a/record_processor/parser/demonstration_record.py b/record_processor/parser/demonstration_record.py
@@ -0,0 +1,60 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+class DemonstrationStep:
+    """
+    Class for the single step information in the user demonstration record.
+    Multiple steps will be recorded to achieve a specific request.
+    """
+
+    def __init__(self, application: str, description: str, action: str, screenshot: str, comment: str):
+        """
+        Create a new step.
+        """
+        self.application = application
+        self.description = description
+        self.action = action
+        self.comment = comment
+        self.screenshot = screenshot
+
+class DemonstrationRecord:
+    """
+    Class for the user demonstration record.
+    A series of steps the user performed to achieve a specific request is recorded in this class.
+    """
+
+    def __init__(self, applications: list, step_num: int, **steps: DemonstrationStep):
+        """
+        Create a new Record.
+        """
+        self.__request = ""
+        self.__round = 0
+        self.__applications = applications
+        self.__step_num = step_num
+        # adding each key-value pair in steps to the record
+        for index, step in steps.items():
+            setattr(self, index, step.__dict__)
+
+    def set_request(self, request: str):
+        """
+        Set the request.
+        """
+        self.__request = request
+
+    def get_request(self) -> str:
+        """
+        Get the request.
+        """
+        return self.__request
+
+    def get_applications(self) -> list:
+        """
+        Get the applications.
+        """
+        return self.__applications
+
+    def get_step_num(self) -> int:
+        """
+        Get the number of steps.
+        """
+        return self.__step_num
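Review note: `DemonstrationRecord` stores each step through `setattr(self, index, step.__dict__)` and keeps its other fields in double-underscore attributes. Below is a minimal sketch of what that implies for the instance's `__dict__`, using simplified stand-in classes. `Step` and `Record` are hypothetical and mirror only the constructor pattern above; the sample values are made up.

```python
class Step:
    """Stand-in for DemonstrationStep with just two fields."""

    def __init__(self, application: str, action: str):
        self.application = application
        self.action = action


class Record:
    """Stand-in for DemonstrationRecord: same constructor pattern, fewer fields."""

    def __init__(self, applications: list, step_num: int, **steps: Step):
        self.__applications = applications  # name-mangled to _Record__applications
        self.__step_num = step_num          # name-mangled to _Record__step_num
        for index, step in steps.items():
            # Each step is stored as a plain dict under its keyword name.
            setattr(self, index, step.__dict__)


steps = {
    "step_0": Step("outlook.exe", "Mouse Left Click"),
    "step_1": Step("outlook.exe", "Keyboard Input"),
}
record = Record(["outlook.exe"], len(steps), **steps)

# Private attributes appear under mangled names; steps appear as nested dicts.
attribute_names = sorted(record.__dict__)
print(attribute_names)
print(record.step_0["action"])
```

Any code that serializes `record.__dict__` directly will therefore see the mangled keys alongside the `step_N` entries.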
diff --git a/record_processor/parser/psr_record_parser.py b/record_processor/parser/psr_record_parser.py
@@ -0,0 +1,171 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+import re
+import xml.etree.ElementTree as ET
+from bs4 import BeautifulSoup
+from .demonstration_record import DemonstrationStep, DemonstrationRecord
+
+
+class PSRRecordParser:
+    """
+    Class for parsing the steps recorder .mht file content to user demonstration record.
+    """
+
+    def __init__(self, content: str):
+        """
+        Constructor for the PSRRecordParser class.
+        """
+        self.content = content
+        self.parts_dict = {}
+        self.applications = []
+        self.comments = []
+        self.steps = []
+
+    def parse_to_record(self) -> DemonstrationRecord:
+        """
+        Parse the Steps Recorder .mht file content into a record in the following steps:
+        1. Find the boundary in the .mht file.
+        2. Split the file by the boundary into parts.
+        3. Get the comments for each step.
+        4. Get the steps from the content.
+        5. Construct the record object and return it.
+        return: A record object.
+        """
+        boundary = self.find_boundary()
+        self.parts_dict = self.split_file_by_boundary(boundary)
+        self.comments = self.get_comments(
+            self.parts_dict['main.htm']['Content'])
+        self.steps = self.get_steps(self.parts_dict['main.htm']['Content'])
+        record = DemonstrationRecord(
+            list(set(self.applications)), len(self.steps), **self.steps)
+
+        return record
+
+    def find_boundary(self) -> str:
+        """
+        Find the boundary in the .mht file.
+        """
+
+        boundary_start = self.content.find("boundary=")
+
+        if boundary_start != -1:
+            boundary_start += len("boundary=")
+            boundary_end = self.content.find("\n", boundary_start)
+            boundary = self.content[boundary_start:boundary_end].strip('\"')
+            return boundary
+        else:
+            raise ValueError("Boundary not found in the .mht file.")
+
+    def split_file_by_boundary(self, boundary: str) -> dict:
+        """
+        Split the file by the boundary into parts and
+        store the parts in a dictionary, including the content type,
+        content location and content transfer encoding.
+        boundary: The boundary of the file.
+        return: A dictionary of parts in the file.
+        """
+        parts = self.content.split("--" + boundary)
+        part_dict = {}
+        for part in parts:
+            content_type_start = part.find("Content-Type:")
+            content_location_start = part.find("Content-Location:")
+            content_transfer_encoding_start = part.find(
+                "Content-Transfer-Encoding:")
+            part_info = {}
+            if content_location_start != -1:
+                content_location_end = part.find("\n", content_location_start)
+                content_location = part[content_location_start:content_location_end].split(":", 1)[
+                    1].strip()
+
+                # add the content type
+                if content_type_start != -1:
+                    content_type_end = part.find("\n", content_type_start)
+                    content_type = part[content_type_start:content_type_end].split(":")[
+                        1].strip()
+                    part_info["Content-Type"] = content_type
+
+                # add the content transfer encoding
+                if content_transfer_encoding_start != -1:
+                    content_transfer_encoding_end = part.find(
+                        "\n", content_transfer_encoding_start)
+                    content_transfer_encoding = part[content_transfer_encoding_start:content_transfer_encoding_end].split(":")[
+                        1].strip()
+                    part_info["Content-Transfer-Encoding"] = content_transfer_encoding
+
+                content = part[content_location_end:].strip()
+                part_info["Content"] = content
+                part_dict[content_location] = part_info
+        return part_dict
+
+    def get_steps(self, content: str) -> dict:
+        """
+        Get the steps from the content in the following steps:
+        1. Find the UserActionData tag in the content.
+        2. Parse the UserActionData tag to get the steps.
+        3. Get the screenshot for each step.
+        4. Get the comments for each step.
+        content: The content of the main.htm file.
+        return: A dictionary of steps.
+        """
+
+        user_action_data = re.search(
+            r'<UserActionData>(.*?)</UserActionData>', content, re.DOTALL)
+        if user_action_data:
+
+            root = ET.fromstring(user_action_data.group(1))
+            steps = {}
+
+            for each_action in root.findall('EachAction'):
+
+                action_number = each_action.get('ActionNumber')
+                application = each_action.get('FileName')
+                description = each_action.find('Description').text
+                action = each_action.find('Action').text
+                screenshot_file_name = each_action.find(
+                    'ScreenshotFileName').text
+                screenshot = self.get_screenshot(screenshot_file_name)
+                step_key = f"step_{int(action_number) - 1}"
+
+                step = DemonstrationStep(
+                    application, description, action, screenshot, self.comments.get(step_key))
+                steps[step_key] = step
+                self.applications.append(application)
+            return steps
+        else:
+            raise ValueError("UserActionData not found in the file.")
+
+    def get_comments(self, content: str) -> dict:
+        """
+        Get the user input comments for each step
+        content: The content of the main.htm file.
+        return: A dictionary of comments for each step.
+        """
+        soup = BeautifulSoup(content, 'html.parser')
+        body = soup.body
+        steps_html = body.find('div', id='Steps')
+        steps = steps_html.find_all(lambda tag: tag.name == 'div' and tag.has_attr(
+            'id') and re.match(r'^Step\d+$', tag['id']))
+
+        comments = {}
+        for index, step in enumerate(steps):
+            comment_tag = step.find('b', string='Comment: ')
+            comments[f'step_{index}'] = comment_tag.next_sibling if comment_tag else None
+        return comments
+
+    def get_screenshot(self, screenshot_file_name: str) -> str:
+        """
+        Get the screenshot by screenshot file name.
+        The screenshot related information is stored in the parts_dict.
+        screenshot_file_name: The file name of the screenshot.
+        return: The screenshot in base64 string.
+        """
+        screenshot_part = self.parts_dict[screenshot_file_name]
+        content = screenshot_part['Content']
+        content_type = screenshot_part['Content-Type']
+        content_transfer_encoding = screenshot_part['Content-Transfer-Encoding']
+
+        screenshot = 'data:{type};{encoding}, {content}'.format(
+            type=content_type, encoding=content_transfer_encoding, content=content)
+
+        return screenshot
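Review note: the parsing flow above (find the boundary, split the file into parts, look up a screenshot part) can be sketched on a synthetic input. This is a simplified illustration, not the PR's implementation: `SAMPLE` is made-up data, `split_parts` is a hypothetical helper that splits headers with `str.partition` so values containing a colon stay intact, and it assumes LF line endings and a blank line between headers and body.

```python
def find_boundary(content: str) -> str:
    """Return the multipart boundary declared in the .mht header."""
    start = content.find("boundary=")
    if start == -1:
        raise ValueError("Boundary not found in the .mht file.")
    start += len("boundary=")
    end = content.find("\n", start)
    # Drop surrounding whitespace first, then the optional quotes.
    return content[start:end].strip().strip('"')


def split_parts(content: str, boundary: str) -> dict:
    """Map each part's Content-Location to its headers and body."""
    parts = {}
    for raw in content.split("--" + boundary):
        head, _, body = raw.strip().partition("\n\n")
        headers = {}
        for line in head.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that are not "Key: value" headers
                headers[key.strip()] = value.strip()
        location = headers.get("Content-Location")
        if location is not None:
            parts[location] = {**headers, "Content": body.strip()}
    return parts


# Made-up two-part document: one HTML part and one (truncated) image part.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "Content-Location: main.htm\n"
    "\n"
    "<html></html>\n"
    "--BOUNDARY\n"
    "Content-Type: image/jpeg\n"
    "Content-Transfer-Encoding: base64\n"
    "Content-Location: screenshot0001.JPEG\n"
    "\n"
    "/9j/4AAQ\n"
    "--BOUNDARY--\n"
)

parts = split_parts(SAMPLE, find_boundary(SAMPLE))
shot = parts["screenshot0001.JPEG"]
# Data URL form: data:<media type>;<encoding>,<payload>
uri = "data:{};{},{}".format(
    shot["Content-Type"], shot["Content-Transfer-Encoding"], shot["Content"]
)
print(uri)
```

The sketch omits the space after the comma that `get_screenshot` emits, following the usual `data:` URL form; whether downstream consumers tolerate that space is not established by this diff.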
		@@ -0,0 +1,2 @@
		# Copyright (c) Microsoft Corporation.
		# Licensed under the MIT License.