-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize Output Dictionaries, Mock API Calls, Math (GSM8K, SVAMP, TabMWP) for Reflexion #186
Conversation
WalkthroughThis update introduces extensive enhancements to the reflexion and critic strategies across various files, primarily adding support for external tool outputs and more complex output dictionaries. Additionally, new reflexion strategies for math-specific tasks are integrated. Several changes improve how observations and critiques are generated and structured. Finally, a few-shot example type "COT" is added for math benchmarks, and relevant tests and documentation are updated accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant Agent
participant Strategy
participant ExternalTool
Agent->>+Strategy: generate_observation(step_idx, action_type, query, key)
Strategy->>+ExternalTool: fetch_data(query)
ExternalTool-->>-Strategy: data(dict)
Strategy-->>-Agent: is_correct, obs, external_tool_info
Agent->>+Strategy: react_create_output_dict(thought, action_type, query, obs, external_tool_info, is_correct)
Strategy-->>-Agent: output_dict(dict)
note right of Agent: Handles observations and reactions with external tool info.
Poem
Tip Early access features
Note:
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Codecov ReportAttention: Patch coverage is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- agential/cog/strategies/reflexion/math.py (1 hunks)
Additional context used
GitHub Check: codecov/patch
agential/cog/strategies/reflexion/math.py
[warning] 3-5: agential/cog/strategies/reflexion/math.py#L3-L5
Added lines #L3 - L5 were not covered by tests
[warning] 11-13: agential/cog/strategies/reflexion/math.py#L11-L13
Added lines #L11 - L13 were not covered by tests
[warning] 15-16: agential/cog/strategies/reflexion/math.py#L15-L16
Added lines #L15 - L16 were not covered by tests
[warning] 18-19: agential/cog/strategies/reflexion/math.py#L18-L19
Added lines #L18 - L19 were not covered by tests
[warning] 21-22: agential/cog/strategies/reflexion/math.py#L21-L22
Added lines #L21 - L22 were not covered by tests
[warning] 24-25: agential/cog/strategies/reflexion/math.py#L24-L25
Added lines #L24 - L25 were not covered by tests
[warning] 27-28: agential/cog/strategies/reflexion/math.py#L27-L28
Added lines #L27 - L28 were not covered by tests
[warning] 30-31: agential/cog/strategies/reflexion/math.py#L30-L31
Added lines #L30 - L31 were not covered by tests
[warning] 33-34: agential/cog/strategies/reflexion/math.py#L33-L34
Added lines #L33 - L34 were not covered by tests
[warning] 36-37: agential/cog/strategies/reflexion/math.py#L36-L37
Added lines #L36 - L37 were not covered by tests
[warning] 40-42: agential/cog/strategies/reflexion/math.py#L40-L42
Added lines #L40 - L42 were not covered by tests
[warning] 44-45: agential/cog/strategies/reflexion/math.py#L44-L45
Added lines #L44 - L45 were not covered by tests
[warning] 47-48: agential/cog/strategies/reflexion/math.py#L47-L48
Added lines #L47 - L48 were not covered by tests
[warning] 50-51: agential/cog/strategies/reflexion/math.py#L50-L51
Added lines #L50 - L51 were not covered by tests
[warning] 53-54: agential/cog/strategies/reflexion/math.py#L53-L54
Added lines #L53 - L54 were not covered by tests
[warning] 56-57: agential/cog/strategies/reflexion/math.py#L56-L57
Added lines #L56 - L57 were not covered by tests
[warning] 59-60: agential/cog/strategies/reflexion/math.py#L59-L60
Added lines #L59 - L60 were not covered by tests
[warning] 62-63: agential/cog/strategies/reflexion/math.py#L62-L63
Added lines #L62 - L63 were not covered by tests
[warning] 65-66: agential/cog/strategies/reflexion/math.py#L65-L66
Added lines #L65 - L66 were not covered by tests
[warning] 68-69: agential/cog/strategies/reflexion/math.py#L68-L69
Added lines #L68 - L69 were not covered by tests
[warning] 71-72: agential/cog/strategies/reflexion/math.py#L71-L72
Added lines #L71 - L72 were not covered by tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (12)
- agential/cog/agent/react.py (1 hunks)
- agential/cog/functional/reflexion.py (2 hunks)
- agential/cog/prompts/agent/reflexion.py (1 hunks)
- agential/cog/prompts/benchmark/gsm8k.py (1 hunks)
- agential/cog/strategies/react/base.py (4 hunks)
- agential/cog/strategies/react/code.py (4 hunks)
- agential/cog/strategies/react/math.py (4 hunks)
- agential/cog/strategies/react/qa.py (3 hunks)
- agential/cog/strategies/reflexion/math.py (1 hunks)
- tests/cog/strategies/react/test_code.py (6 hunks)
- tests/cog/strategies/react/test_math.py (7 hunks)
- tests/cog/strategies/react/test_qa.py (8 hunks)
Files skipped from review due to trivial changes (1)
- agential/cog/prompts/agent/reflexion.py
Additional context used
GitHub Check: codecov/patch
agential/cog/strategies/reflexion/math.py
[warning] 3-3: agential/cog/strategies/reflexion/math.py#L3
Added line #L3 was not covered by tests
[warning] 5-5: agential/cog/strategies/reflexion/math.py#L5
Added line #L5 was not covered by tests
[warning] 7-7: agential/cog/strategies/reflexion/math.py#L7
Added line #L7 was not covered by tests
[warning] 9-10: agential/cog/strategies/reflexion/math.py#L9-L10
Added lines #L9 - L10 were not covered by tests
[warning] 16-16: agential/cog/strategies/reflexion/math.py#L16
Added line #L16 was not covered by tests
[warning] 20-20: agential/cog/strategies/reflexion/math.py#L20
Added line #L20 was not covered by tests
[warning] 24-25: agential/cog/strategies/reflexion/math.py#L24-L25
Added lines #L24 - L25 were not covered by tests
[warning] 28-28: agential/cog/strategies/reflexion/math.py#L28
Added line #L28 was not covered by tests
[warning] 43-44: agential/cog/strategies/reflexion/math.py#L43-L44
Added lines #L43 - L44 were not covered by tests
[warning] 46-51: agential/cog/strategies/reflexion/math.py#L46-L51
Added lines #L46 - L51 were not covered by tests
[warning] 53-53: agential/cog/strategies/reflexion/math.py#L53
Added line #L53 was not covered by tests
[warning] 56-56: agential/cog/strategies/reflexion/math.py#L56
Added line #L56 was not covered by tests
[warning] 66-66: agential/cog/strategies/reflexion/math.py#L66
Added line #L66 was not covered by tests
[warning] 74-77: agential/cog/strategies/reflexion/math.py#L74-L77
Added lines #L74 - L77 were not covered by tests
[warning] 79-81: agential/cog/strategies/reflexion/math.py#L79-L81
Added lines #L79 - L81 were not covered by tests
[warning] 83-85: agential/cog/strategies/reflexion/math.py#L83-L85
Added lines #L83 - L85 were not covered by tests
[warning] 87-87: agential/cog/strategies/reflexion/math.py#L87
Added line #L87 was not covered by tests
[warning] 109-110: agential/cog/strategies/reflexion/math.py#L109-L110
Added lines #L109 - L110 were not covered by tests
[warning] 119-120: agential/cog/strategies/reflexion/math.py#L119-L120
Added lines #L119 - L120 were not covered by tests
[warning] 122-122: agential/cog/strategies/reflexion/math.py#L122
Added line #L122 was not covered by tests
[warning] 124-124: agential/cog/strategies/reflexion/math.py#L124
Added line #L124 was not covered by tests
[warning] 132-133: agential/cog/strategies/reflexion/math.py#L132-L133
Added lines #L132 - L133 were not covered by tests
[warning] 142-142: agential/cog/strategies/reflexion/math.py#L142
Added line #L142 was not covered by tests
[warning] 144-145: agential/cog/strategies/reflexion/math.py#L144-L145
Added lines #L144 - L145 were not covered by tests
[warning] 147-147: agential/cog/strategies/reflexion/math.py#L147
Added line #L147 was not covered by tests
[warning] 149-149: agential/cog/strategies/reflexion/math.py#L149
Added line #L149 was not covered by tests
[warning] 152-158: agential/cog/strategies/reflexion/math.py#L152-L158
Added lines #L152 - L158 were not covered by tests
[warning] 160-160: agential/cog/strategies/reflexion/math.py#L160
Added line #L160 was not covered by tests
[warning] 162-163: agential/cog/strategies/reflexion/math.py#L162-L163
Added lines #L162 - L163 were not covered by tests
[warning] 165-165: agential/cog/strategies/reflexion/math.py#L165
Added line #L165 was not covered by tests
Additional comments not posted (24)
agential/cog/strategies/react/base.py (3)
4-4
: Addition ofAny
import fromtyping
module.The addition of
Any
to the import statement is necessary for the new type hints introduced in the methodsgenerate_observation
andcreate_output_dict
, which now includeDict[str, Any]
as part of their return types.
Line range hint
44-55
: Updated return type ingenerate_observation
method.The method now returns a tuple containing a string and a dictionary (
Dict[str, Any]
), which aligns with the requirement to include external tool outputs in the observation. This is a significant change as it affects how the method's outputs are handled in the calling code.
Line range hint
61-78
: Updatedcreate_output_dict
method to includeexternal_tool_info
.The method now accepts an additional parameter
external_tool_info
and returns a dictionary with more detailed information. This change is crucial for integrating external tool outputs into the ReAct strategy's output, enhancing the richness and utility of the data provided to downstream processes.agential/cog/agent/react.py (1)
99-109
: Enhancedgenerate
method to include external tool information.The
generate
method now captures external tool information during the observation phase and includes it in the output dictionary. This is a key enhancement that allows the agent to provide more context-rich information in its outputs, aligning with the overall goal of enhancing functionality.agential/cog/strategies/react/qa.py (3)
69-69
: Renamed attribute_current_answer
to_answer
.This change simplifies the attribute name while maintaining clarity. It's a straightforward renaming that should be reflected in all method implementations and usage within this class.
Line range hint
146-185
: Enhancedgenerate_observation
method to include external tool outputs.This method now enriches the observation with results from external tools such as search or lookup operations, which are captured in the
external_tool_info
dictionary. This enhancement is crucial for providing more detailed and useful observations in QA scenarios.
188-213
: Updatedcreate_output_dict
method to handle enriched outputs.The method now constructs an output dictionary that includes the thought, action type, query, observation, answer, and external tool outputs. This comprehensive approach ensures that all relevant information is encapsulated in the output, enhancing the utility of the ReAct agent's responses.
agential/cog/strategies/react/code.py (3)
69-69
: Renamed attribute_current_answer
to_answer
.This renaming simplifies the attribute name and enhances clarity, aligning with changes made across other strategy classes.
Line range hint
151-190
: Enhancedgenerate_observation
method to include external tool outputs.The method now correctly handles the execution status of code actions, capturing this information in
external_tool_info
. This enhancement is crucial for providing feedback on code execution, which is essential in coding scenarios.
193-218
: Updatedcreate_output_dict
method to handle enriched outputs.This method effectively encapsulates all relevant information, including the thought, action type, query, observation, answer, and external tool outputs, into a comprehensive output dictionary. This is essential for providing detailed feedback in code-related tasks.
agential/cog/strategies/react/math.py (3)
69-69
: Renamed attribute_current_answer
to_answer
.Consistent with changes in other strategy classes, this renaming simplifies the attribute name and enhances clarity.
Line range hint
151-188
: Enhancedgenerate_observation
method to include external tool outputs.The method now correctly handles the execution status and code answers for math actions, capturing this information in
external_tool_info
. This is crucial for providing detailed feedback on math computations.
191-216
: Updatedcreate_output_dict
method to handle enriched outputs.This method effectively encapsulates all relevant information into a comprehensive output dictionary, which is essential for providing detailed feedback in math-related tasks.
tests/cog/strategies/react/test_code.py (3)
118-120
: Updated method signature handled correctly.The changes correctly handle the updated return type of the
generate_observation
method, which now includesexternal_tool_info
. The test assertions are appropriately updated to handle the tuple response.Also applies to: 125-125
226-232
: Proper reset logic.The
reset
method correctly sets internal state variables back to their initial conditions, ensuring the strategy is properly reinitialized.
199-201
: Updated output dictionary format handled correctly.The changes correctly include
external_tool_info
in the output dictionary, matching the updated method signature.tests/cog/strategies/react/test_math.py (3)
130-132
: Updated method signature handled correctly.The changes correctly handle the updated return type of the
generate_observation
method, which now includesexternal_tool_info
. The test assertions are appropriately updated to handle the tuple response.Also applies to: 137-137
230-236
: Proper reset logic.The
reset
method correctly sets internal state variables back to their initial conditions, ensuring the strategy is properly reinitialized.
195-197
: Updated output dictionary format handled correctly.The changes correctly include
external_tool_info
in the output dictionary, matching the updated method signature.agential/cog/prompts/benchmark/gsm8k.py (1)
83-179
: New constants for few-shot examples added.The addition of
GSM8K_FEWSHOT_EXAMPLES_COT
andGSM8K_FEWSHOT_EXAMPLES_REACT
provides structured examples for the GSM8K dataset, enhancing the usability of these prompts in testing and development.tests/cog/strategies/react/test_qa.py (4)
219-231
: Validate error handling and message accuracy ingenerate_observation
.In this test case, the lookup operation fails, and a specific error message is returned. It's important to validate that the error handling is robust and that the error messages are accurate and informative, providing clear guidance to the user on what went wrong and potential next steps.
Verification successful
Validate error handling and message accuracy in
generate_observation
.The error message "Lookup failed" is correctly handled and present in the test case
tests/cog/strategies/react/test_qa.py
. This confirms that the error handling in thegenerate_observation
method is robust and the error messages are accurate and informative.
tests/cog/strategies/react/test_qa.py
: Line containingValueError("Lookup failed")
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Validate error handling and message accuracy. # Test: Search for error handling and messages in test cases. Expect: Robust handling and accurate messages. rg --type python $'Lookup failed'Length of output: 65
Script:
#!/bin/bash # Description: Validate error handling and message accuracy. # Corrected Test: Search for error handling and messages in test cases. Expect: Robust handling and accurate messages. rg 'Lookup failed' --glob '*.py'Length of output: 108
108-116
: Verify the structure and content ofexternal_tool_info
.The addition of
external_tool_info
to the output ofgenerate_observation
is a significant change that enhances the method's output by including additional contextual information. It's crucial to ensure that the structure and content of this dictionary meet the expected standards and are consistent across different scenarios.Verification successful
The structure and content of
external_tool_info
are consistent across different test cases.
tests/cog/strategies/react/test_math.py
:external_tool_info
contains keys likeexecution_status
andcode_answer
.tests/cog/strategies/react/test_code.py
:external_tool_info
contains the keyexecution_status
.tests/cog/strategies/critic/test_qa.py
:external_tool_info
contains keys likesearch_query
andsearch_result
.tests/cog/strategies/react/test_qa.py
:external_tool_info
contains keys likesearch_result
andlookup_result
.These variations indicate that
external_tool_info
is structured according to the specific needs of different strategies and tests.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the structure and content of external_tool_info in all test cases. # Test: Search for the usage of external_tool_info in test cases. Expect: Consistent structure. rg --type python $'external_tool_info'Length of output: 68
Script:
#!/bin/bash # Description: Verify the structure and content of external_tool_info in all test cases. # Test: Search for the usage of external_tool_info in test cases. Expect: Consistent structure. rg --glob '*.py' 'external_tool_info'Length of output: 18506
150-159
: Check for consistent updates toexternal_tool_info
during search operations.The test case for a successful search updates
external_tool_info
with relevant data. It's essential to ensure that this information is updated consistently and accurately during all search operations to maintain the reliability of the system.Verification successful
Check for consistent updates to
external_tool_info
during search operations.The test cases and implementation in the provided context show that
external_tool_info
is consistently updated during search operations across different files and scenarios. The updates are handled correctly in various action types, includingSearch
,Lookup
, andFinish
.
- Files and lines where
external_tool_info
is updated:
tests/cog/strategies/react/test_qa.py
agential/cog/strategies/react/qa.py
agential/cog/strategies/react/math.py
agential/cog/strategies/react/code.py
agential/cog/strategies/critic/qa.py
agential/cog/strategies/critic/math.py
agential/cog/strategies/critic/code.py
The updates are consistent and accurate, ensuring the reliability of the system during search operations.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify consistent updates to external_tool_info during search operations. # Test: Search for updates to external_tool_info during search operations. Expect: Consistent and accurate updates. rg --type python $'external_tool_info'Length of output: 68
Script:
#!/bin/bash # Description: Verify consistent updates to external_tool_info during search operations. # Test: Search for updates to external_tool_info during search operations. Expect: Consistent and accurate updates. rg --type py 'external_tool_info'Length of output: 18502
Script:
#!/bin/bash # Description: Extract context of updates to external_tool_info in relevant files. # Extract lines around occurrences of external_tool_info in search-related operations. rg --type py 'external_tool_info' -A 5 -B 5 tests/cog/strategies/react/test_qa.py tests/cog/strategies/react/test_math.py tests/cog/strategies/react/test_code.py agential/cog/strategies/react/qa.py agential/cog/strategies/react/math.py agential/cog/strategies/react/code.py agential/cog/strategies/critic/qa.py agential/cog/strategies/critic/math.py agential/cog/strategies/critic/code.pyLength of output: 69722
127-136
: Ensure consistent handling ofexternal_tool_info
across different scenarios.This test case demonstrates handling of
external_tool_info
when the action_type is "Finish". It's important to verify that all fields inexternal_tool,info
are consistently handled (like clearing or setting appropriate values) across different scenarios to ensure data integrity and avoid potential bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- agential/cog/functional/reflexion.py (5 hunks)
- tests/cog/strategies/reflexion/test_math.py (1 hunks)
- tests/cog/strategies/reflexion/test_qa.py (11 hunks)
- tests/cog/strategies/reflexion/tmp.ipynb (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- agential/cog/functional/reflexion.py
- tests/cog/strategies/reflexion/test_qa.py
Additional comments not posted (17)
tests/cog/strategies/reflexion/tmp.ipynb (3)
382-395
: LGTM!
402-416
: LGTM!
419-423
: Empty cell.No action needed.
tests/cog/strategies/reflexion/test_math.py (14)
36-46
: LGTM!
49-64
: LGTM!
67-77
: LGTM!
80-101
: LGTM!
104-127
: LGTM!
130-158
: LGTM!
161-203
: LGTM!
206-219
: LGTM!
221-244
: LGTM!
247-262
: LGTM!
265-273
: LGTM!
276-285
: LGTM!
288-298
: LGTM!
301-322
: LGTM!
"question = \"Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with 4933828. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\"\n", | ||
"key = -9867630\n", | ||
"\n", | ||
"strat = ReflexionReActMathStrategy(llm=llm)\n", | ||
"\n", | ||
"out = strat.generate(\n", | ||
" question=question,\n", | ||
" examples=GSM8K_FEWSHOT_EXAMPLES_REACT,\n", | ||
" reflections=\"\",\n", | ||
" prompt=REFLEXION_REACT_INSTRUCTION_GSM8K,\n", | ||
" additional_keys={},\n", | ||
" max_steps=5,\n", | ||
")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove or update hardcoded key value.
The hardcoded key value key = -9867630
may not be relevant and could lead to confusion. Consider removing or updating it.
- key = -9867630
+ key = some_relevant_key_value
Committable suggestion was skipped due to low confidence.
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"<PROMPT AGENT========================================================================>\n", | ||
"Answer a math question with interleaving Thought, Action, Observation steps. Thought can reason about the current question and plan the retrieval steps, and Action can be two types:\n", | ||
"(1) Calculate[code], which implements code to answer the math question, saving the answer as the `answer` variable.\n", | ||
"(2) Finish[code], which returns the code to answer the math question and finishes the task, saving the answer as the `answer` variable.\n", | ||
"You have a maximum of 5 steps.\n", | ||
"\n", | ||
"Here are some examples:\n", | ||
"Question: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\n", | ||
"Thought 1: First, I need to find out how many lollipops Jason gave to Denny.\n", | ||
"Action 1: Calculate[\n", | ||
"```python\n", | ||
"jason_lollipops_initial = 20\n", | ||
"jason_lollipops_after = 12\n", | ||
"answer = jason_lollipops_after - jason_lollipops_initial\n", | ||
"```\n", | ||
"]\n", | ||
"Observation 1: \n", | ||
"```python\n", | ||
"jason_lollipops_initial = 20\n", | ||
"jason_lollipops_after = 12\n", | ||
"answer = jason_lollipops_after - jason_lollipops_initial\n", | ||
"```\n", | ||
"Execution Status: Done\n", | ||
"Output: answer = -8\n", | ||
"Thought 2: The answer should be positive because Jason gave lollipops to Denny. I need to reverse the subtraction.\n", | ||
"Action 2: Calculate[\n", | ||
"```python\n", | ||
"jason_lollipops_initial = 20\n", | ||
"jason_lollipops_after = 12\n", | ||
"answer = jason_lollipops_initial - jason_lollipops_after\n", | ||
"```\n", | ||
"]\n", | ||
"Observation 2: \n", | ||
"```python\n", | ||
"jason_lollipops_initial = 20\n", | ||
"jason_lollipops_after = 12\n", | ||
"answer = jason_lollipops_initial - jason_lollipops_after\n", | ||
"```\n", | ||
"Execution Status: Done\n", | ||
"Output: answer = 8\n", | ||
"Thought 3: Jason gave 8 lollipops to Denny.\n", | ||
"Action 3: Finish[\n", | ||
"```python\n", | ||
"answer = 8\n", | ||
"```\n", | ||
"]\n", | ||
"Observation 3: \n", | ||
"```python\n", | ||
"answer = 8\n", | ||
"```\n", | ||
"\n", | ||
"---\n", | ||
"\n", | ||
"Question: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?\n", | ||
"Thought 1: I need to determine how many trees were planted.\n", | ||
"Action 1: Calculate[\n", | ||
"```python\n", | ||
"trees_initial = 15\n", | ||
"trees_after = 21\n", | ||
"answer = trees_initial - trees_after\n", | ||
"```\n", | ||
"]\n", | ||
"Observation 1: \n", | ||
"```python\n", | ||
"trees_initial = 15\n", | ||
"trees_after = 21\n", | ||
"answer = trees_initial - trees_after\n", | ||
"```\n", | ||
"Execution Status: Done\n", | ||
"Output: answer = -6\n", | ||
"Thought 2: The answer should be positive because trees are being added, not removed. I need to reverse the subtraction.\n", | ||
"Action 2: Calculate[\n", | ||
"```python\n", | ||
"trees_initial = 15\n", | ||
"trees_after = 21\n", | ||
"answer = trees_after - trees_initial\n", | ||
"```\n", | ||
"]\n", | ||
"Observation 2: \n", | ||
"```python\n", | ||
"trees_initial = 15\n", | ||
"trees_after = 21\n", | ||
"answer = trees_after - trees_initial\n", | ||
"```\n", | ||
"Execution Status: Done\n", | ||
"Output: answer = 6\n", | ||
"Thought 3: The workers planted 6 trees today.\n", | ||
"Action 3: Finish[\n", | ||
"```python\n", | ||
"answer = 6\n", | ||
"```\n", | ||
"]\n", | ||
"Observation 3: \n", | ||
"```python\n", | ||
"answer = 6\n", | ||
"```\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in example calculation.
There is a typo in the example calculation for trees. The correct calculation should be trees_after - trees_initial
.
- answer = trees_initial - trees_after
+ answer = trees_after - trees_initial
Committable suggestion
βΌοΈ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<PROMPT AGENT========================================================================>\n", | |
"Answer a math question with interleaving Thought, Action, Observation steps. Thought can reason about the current question and plan the retrieval steps, and Action can be two types:\n", | |
"(1) Calculate[code], which implements code to answer the math question, saving the answer as the `answer` variable.\n", | |
"(2) Finish[code], which returns the code to answer the math question and finishes the task, saving the answer as the `answer` variable.\n", | |
"You have a maximum of 5 steps.\n", | |
"\n", | |
"Here are some examples:\n", | |
"Question: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\n", | |
"Thought 1: First, I need to find out how many lollipops Jason gave to Denny.\n", | |
"Action 1: Calculate[\n", | |
"```python\n", | |
"jason_lollipops_initial = 20\n", | |
"jason_lollipops_after = 12\n", | |
"answer = jason_lollipops_after - jason_lollipops_initial\n", | |
"```\n", | |
"]\n", | |
"Observation 1: \n", | |
"```python\n", | |
"jason_lollipops_initial = 20\n", | |
"jason_lollipops_after = 12\n", | |
"answer = jason_lollipops_after - jason_lollipops_initial\n", | |
"```\n", | |
"Execution Status: Done\n", | |
"Output: answer = -8\n", | |
"Thought 2: The answer should be positive because Jason gave lollipops to Denny. I need to reverse the subtraction.\n", | |
"Action 2: Calculate[\n", | |
"```python\n", | |
"jason_lollipops_initial = 20\n", | |
"jason_lollipops_after = 12\n", | |
"answer = jason_lollipops_initial - jason_lollipops_after\n", | |
"```\n", | |
"]\n", | |
"Observation 2: \n", | |
"```python\n", | |
"jason_lollipops_initial = 20\n", | |
"jason_lollipops_after = 12\n", | |
"answer = jason_lollipops_initial - jason_lollipops_after\n", | |
"```\n", | |
"Execution Status: Done\n", | |
"Output: answer = 8\n", | |
"Thought 3: Jason gave 8 lollipops to Denny.\n", | |
"Action 3: Finish[\n", | |
"```python\n", | |
"answer = 8\n", | |
"```\n", | |
"]\n", | |
"Observation 3: \n", | |
"```python\n", | |
"answer = 8\n", | |
"```\n", | |
"\n", | |
"---\n", | |
"\n", | |
"Question: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?\n", | |
"Thought 1: I need to determine how many trees were planted.\n", | |
"Action 1: Calculate[\n", | |
"```python\n", | |
"trees_initial = 15\n", | |
"trees_after = 21\n", | |
"answer = trees_initial - trees_after\n", | |
"```\n", | |
"]\n", | |
"Observation 1: \n", | |
"```python\n", | |
"trees_initial = 15\n", | |
"trees_after = 21\n", | |
"answer = trees_initial - trees_after\n", | |
"```\n", | |
"Execution Status: Done\n", | |
"Output: answer = -6\n", | |
"Thought 2: The answer should be positive because trees are being added, not removed. I need to reverse the subtraction.\n", | |
"Action 2: Calculate[\n", | |
"```python\n", | |
"trees_initial = 15\n", | |
"trees_after = 21\n", | |
"answer = trees_after - trees_initial\n", | |
"```\n", | |
"]\n", | |
"Observation 2: \n", | |
"```python\n", | |
"trees_initial = 15\n", | |
"trees_after = 21\n", | |
"answer = trees_after - trees_initial\n", | |
"```\n", | |
"Execution Status: Done\n", | |
"Output: answer = 6\n", | |
"Thought 3: The workers planted 6 trees today.\n", | |
"Action 3: Finish[\n", | |
"```python\n", | |
"answer = 6\n", | |
"```\n", | |
"]\n", | |
"Observation 3: \n", | |
"```python\n", | |
"answer = 6\n", | |
"```\n", | |
"answer = trees_after - trees_initial\n", |
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from agential.cog.strategies.reflexion.math import (\n", | ||
" ReflexionReActMathStrategy,\n", | ||
" ReflexionCoTMathStrategy\n", | ||
")\n", | ||
"from agential.cog.prompts.agent.reflexion import (\n", | ||
" GSM8K_FEWSHOT_EXAMPLES_REFLEXION_COT_REFLECT,\n", | ||
" GSM8K_FEWSHOT_EXAMPLES_REFLEXION_REACT_REFLECT,\n", | ||
" REFLEXION_COT_INSTRUCTION_GSM8K,\n", | ||
" REFLEXION_COT_REFLECT_INSTRUCTION_GSM8K,\n", | ||
" REFLEXION_REACT_INSTRUCTION_GSM8K,\n", | ||
" REFLEXION_REACT_REFLECT_INSTRUCTION_GSM8K,\n", | ||
")\n", | ||
"from agential.cog.prompts.benchmark.gsm8k import (\n", | ||
" GSM8K_FEWSHOT_EXAMPLES_COT,\n", | ||
" GSM8K_FEWSHOT_EXAMPLES_REACT,\n", | ||
")\n", | ||
"from langchain_openai import ChatOpenAI\n", | ||
"\n", | ||
"import dotenv\n", | ||
"dotenv.load_dotenv()\n", | ||
"\n", | ||
"import warnings\n", | ||
"warnings.filterwarnings('ignore')\n", | ||
"\n", | ||
"llm = ChatOpenAI()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for dotenv loading.
Currently, the dotenv file is loaded without checking for errors. Consider adding error handling to ensure the environment variables are loaded correctly.
- dotenv.load_dotenv()
+ if not dotenv.load_dotenv():
+ warnings.warn("Could not load .env file. Make sure it exists and is properly formatted.")
Committable suggestion
βΌοΈ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from agential.cog.strategies.reflexion.math import (\n", | |
" ReflexionReActMathStrategy,\n", | |
" ReflexionCoTMathStrategy\n", | |
")\n", | |
"from agential.cog.prompts.agent.reflexion import (\n", | |
" GSM8K_FEWSHOT_EXAMPLES_REFLEXION_COT_REFLECT,\n", | |
" GSM8K_FEWSHOT_EXAMPLES_REFLEXION_REACT_REFLECT,\n", | |
" REFLEXION_COT_INSTRUCTION_GSM8K,\n", | |
" REFLEXION_COT_REFLECT_INSTRUCTION_GSM8K,\n", | |
" REFLEXION_REACT_INSTRUCTION_GSM8K,\n", | |
" REFLEXION_REACT_REFLECT_INSTRUCTION_GSM8K,\n", | |
")\n", | |
"from agential.cog.prompts.benchmark.gsm8k import (\n", | |
" GSM8K_FEWSHOT_EXAMPLES_COT,\n", | |
" GSM8K_FEWSHOT_EXAMPLES_REACT,\n", | |
")\n", | |
"from langchain_openai import ChatOpenAI\n", | |
"\n", | |
"import dotenv\n", | |
"dotenv.load_dotenv()\n", | |
"\n", | |
"import warnings\n", | |
"warnings.filterwarnings('ignore')\n", | |
"\n", | |
"llm = ChatOpenAI()" | |
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from agential.cog.strategies.reflexion.math import (\n", | |
" ReflexionReActMathStrategy,\n", | |
" ReflexionCoTMathStrategy\n", | |
")\n", | |
"from agential.cog.prompts.agent.reflexion import (\n", | |
" GSM8K_FEWSHOT_EXAMPLES_REFLEXION_COT_REFLECT,\n", | |
" GSM8K_FEWSHOT_EXAMPLES_REFLEXION_REACT_REFLECT,\n", | |
" REFLEXION_COT_INSTRUCTION_GSM8K,\n", | |
" REFLEXION_COT_REFLECT_INSTRUCTION_GSM8K,\n", | |
" REFLEXION_REACT_INSTRUCTION_GSM8K,\n", | |
" REFLEXION_REACT_REFLECT_INSTRUCTION_GSM8K,\n", | |
")\n", | |
"from agential.cog.prompts.benchmark.gsm8k import (\n", | |
" GSM8K_FEWSHOT_EXAMPLES_COT,\n", | |
" GSM8K_FEWSHOT_EXAMPLES_REACT,\n", | |
")\n", | |
"from langchain_openai import ChatOpenAI\n", | |
"\n", | |
"import dotenv\n", | |
"if not dotenv.load_dotenv():\n", | |
" warnings.warn(\"Could not load .env file. Make sure it exists and is properly formatted.\")\n", | |
"\n", | |
"import warnings\n", | |
"warnings.filterwarnings('ignore')\n", | |
"\n", | |
"llm = ChatOpenAI()" |
def test_reflexion_react_generate_action() -> None: | ||
"""Tests ReflexionReActMathStrategy generate_action.""" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement the test function.
The test function test_reflexion_react_generate_action
is defined but not implemented. Consider implementing it to cover the generate_action method of the ReAct strategy.
def test_reflexion_react_generate_action() -> None:
"""Tests ReflexionReActMathStrategy generate_action."""
question = "Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with 4933828. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"
responses = [
'Finish[\n```python\neggs_laid_per_day = 16\neggs_eaten_for_breakfast = 3\neggs_used_for_muffins = 4933828\neggs_sold = eggs_laid_per_day - eggs_eaten_for_breakfast - eggs_used_for_muffins\nprice_per_egg = 2\nmoney_made_per_day = eggs_sold * price_per_egg\nanswer = money_made_per_day\n```\n]'
]
llm = FakeListChatModel(responses=responses)
strategy = ReflexionReActMathStrategy(llm=llm)
action_type, query = strategy.generate_action(
question=question,
examples=GSM8K_FEWSHOT_EXAMPLES_REACT,
reflections="",
prompt=REFLEXION_REACT_INSTRUCTION_GSM8K,
additional_keys={},
)
assert action_type == "Finish"
assert query == 'eggs_laid_per_day = 16\neggs_eaten_for_breakfast = 3\neggs_used_for_muffins = 4933828\neggs_sold = eggs_laid_per_day - eggs_eaten_for_breakfast - eggs_used_for_muffins\nprice_per_egg = 2\nmoney_made_per_day = eggs_sold * price_per_egg\nanswer = money_made_per_day'
assert strategy._finished == False
assert strategy._answer == ""
assert (
strategy._scratchpad
== '\nAction: Finish[\n```python\neggs_laid_per_day = 16\neggs_eaten_for_breakfast = 3\neggs_used_for_muffins = 4933828\neggs_sold = eggs_laid_per_day - eggs_eaten_for_breakfast - eggs_used_for_muffins\nprice_per_egg = 2\nmoney_made_per_day = eggs_sold * price_per_egg\nanswer = money_made_per_day\n```\n]'
)
def test_reflexion_react_generate_observation() -> None: | ||
"""Tests ReflexionReActMathStrategy generate_observation.""" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement the test function.
The test function test_reflexion_react_generate_observation
is defined but not implemented. Consider implementing it to cover the generate_observation method of the ReAct strategy.
def test_reflexion_react_generate_observation() -> None:
"""Tests ReflexionReActMathStrategy generate_observation."""
# Case 1: action_type is "Finish" and answer is correct.
llm = FakeListChatModel(responses=[])
strategy = ReflexionReActMathStrategy(llm=llm)
is_correct, obs = strategy.generate_observation(
action_type="Finish", query="correct_answer", key="correct_answer"
)
assert is_correct == False
assert obs == "Answer is INCORRECT"
assert "Observation: Answer is INCORRECT" in strategy._scratchpad
# Case 2: action_type is "Finish" and answer is incorrect.
strategy = ReflexionReActMathStrategy(llm=llm)
is_correct, obs = strategy.generate_observation(
action_type="Finish", query="incorrect_answer", key="correct_answer"
)
assert is_correct == False
assert obs == "Answer is INCORRECT"
assert "Observation: Answer is INCORRECT" in strategy._scratchpad
# Case 3: action_type is not "Finish".
strategy = ReflexionReActMathStrategy(llm=llm)
is_correct, obs = strategy.generate_observation(
action_type="Calculate", query="some_query", key="correct_answer"
)
assert is_correct == False
assert obs == "Invalid action type, please try again."
assert "Observation: Invalid action type, please try again." in strategy._scratchpad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- agential/cog/functional/reflexion.py (4 hunks)
- tests/cog/strategies/reflexion/test_math.py (1 hunks)
- tests/cog/strategies/reflexion/test_qa.py (11 hunks)
- tests/cog/strategies/reflexion/tmp.ipynb (1 hunks)
Files skipped from review as they are similar to previous changes (4)
- agential/cog/functional/reflexion.py
- tests/cog/strategies/reflexion/test_math.py
- tests/cog/strategies/reflexion/test_qa.py
- tests/cog/strategies/reflexion/tmp.ipynb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- agential/cog/strategies/critic/base.py (1 hunks)
- agential/cog/strategies/critic/code.py (3 hunks)
- agential/cog/strategies/critic/math.py (3 hunks)
- docs/docs/index.md (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- agential/cog/strategies/critic/base.py
- agential/cog/strategies/critic/code.py
- agential/cog/strategies/critic/math.py
Additional context used
LanguageTool
docs/docs/index.md
[uncategorized] ~90-~90: Possible missing comma found.
Context: ... | | | For full documentation visit [mkdocs.org](https://www.mkdocs.o...(AI_HYDRA_LEO_MISSING_COMMA)
Markdownlint
docs/docs/index.md
94-94: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
95-95: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
96-96: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
97-97: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
22-22: Expected: 1; Actual: 2
Multiple consecutive blank lines(MD012, no-multiple-blanks)
25-25: Expected: 1; Actual: 2
Multiple consecutive blank lines(MD012, no-multiple-blanks)
26-26: Expected: 1; Actual: 3
Multiple consecutive blank lines(MD012, no-multiple-blanks)
61-61: Expected: 1; Actual: 2
Multiple consecutive blank lines(MD012, no-multiple-blanks)
23-23: Punctuation: '...'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
59-59: Punctuation: '...'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
Additional comments not posted (4)
docs/docs/index.md (4)
41-41
: LGTM!The Reflexion method has been appropriately marked as tested for the GSM8k, SVAMP, and TabMWP benchmarks.
Line range hint
22-26
: Remove multiple consecutive blank lines.There are multiple consecutive blank lines that should be removed.
- - - - - +Also applies to: 61-61
Line range hint
23-23
: Remove trailing punctuation in heading.The trailing punctuation in the headings should be removed.
- ## Implementing... - ## Experimenting... + ## Implementing + ## ExperimentingAlso applies to: 59-59
Line range hint
94-97
: Use dash for unordered list style.The unordered list style should use dashes instead of asterisks.
- * `mkdocs new [dir-name]` - Create a new project. - * `mkdocs serve` - Start the live-reloading docs server. - * `mkdocs build` - Build the documentation site. - * `mkdocs -h` - Print help message and exit. + - `mkdocs new [dir-name]` - Create a new project. + - `mkdocs serve` - Start the live-reloading docs server. + - `mkdocs build` - Build the documentation site. + - `mkdocs -h` - Print help message and exit.
π€ Reasoning
Explain the purpose of this PR...
π§ Changes
Describe the changes made...
β PR Checklist
Summary by CodeRabbit
New Features
ReflexionCoTMathStrategy
andReflexionReActMathStrategy
.COT
for GSM8K and SVAMP benchmarks.Bug Fixes
Enhancements
Documentation
Tests
Notebooks
notebooks/reflexion.ipynb
to include parameters forReflexionCoTReflector
andReflexionReActReflector
.Chores