Standardize Output Dictionaries, Mock API Calls, Math (GSM8K, SVAMP, TabMWP) for Reflexion #186

Merged
merged 125 commits into from
Jun 29, 2024
Changes from 15 commits
c36d52d
reflexion cot/react math strat skeleton
alckasoc Jun 23, 2024
5ac46bc
.
alckasoc Jun 23, 2024
e83b76f
added gsm8k fewshot cot
alckasoc Jun 23, 2024
a2403d2
.
alckasoc Jun 24, 2024
d9c7b4e
react math fix
alckasoc Jun 24, 2024
e9f72b7
react code strat fix
alckasoc Jun 24, 2024
65cfaa9
adding eext tool innfo
alckasoc Jun 24, 2024
61eb219
edit qa react test
alckasoc Jun 24, 2024
240b752
pass test gen obs
alckasoc Jun 24, 2024
b14656d
fix math
alckasoc Jun 24, 2024
aba4a0c
fixed math
alckasoc Jun 24, 2024
e67a904
auto linted
alckasoc Jun 24, 2024
b0e8435
code fix
alckasoc Jun 24, 2024
ee6791e
code test gen obs react
alckasoc Jun 24, 2024
bf56ed5
.
alckasoc Jun 24, 2024
21928d0
fix critic qa
alckasoc Jun 24, 2024
9b51896
.
alckasoc Jun 24, 2024
f6e2266
fix
alckasoc Jun 24, 2024
8118bd4
auto linted
alckasoc Jun 24, 2024
f11c340
fix
alckasoc Jun 24, 2024
4f4e190
auto linted
alckasoc Jun 24, 2024
a63d457
reflexioncot fix
alckasoc Jun 24, 2024
cf18a73
.
alckasoc Jun 24, 2024
b31cef1
.
alckasoc Jun 24, 2024
11b4ead
fix
alckasoc Jun 24, 2024
8617976
auto linted
alckasoc Jun 24, 2024
3a5f5a2
fix
alckasoc Jun 24, 2024
6c89fe8
fix
alckasoc Jun 24, 2024
3d071fb
fix
alckasoc Jun 24, 2024
2a446e6
react mocked
alckasoc Jun 24, 2024
b065f71
.
alckasoc Jun 24, 2024
01642ab
mock react test gen
alckasoc Jun 24, 2024
1896495
mock critic
alckasoc Jun 24, 2024
e452e4f
.
alckasoc Jun 24, 2024
169c48d
ok
alckasoc Jun 24, 2024
1b4add8
1
alckasoc Jun 24, 2024
88ada46
auto linted
alckasoc Jun 24, 2024
962853e
.
alckasoc Jun 24, 2024
5f2007b
auto linted
alckasoc Jun 24, 2024
b8022f7
fix
alckasoc Jun 25, 2024
c12a044
critic math fix
alckasoc Jun 25, 2024
b33b011
critic code fix
alckasoc Jun 25, 2024
2722af0
.
alckasoc Jun 25, 2024
dd51d62
.
alckasoc Jun 25, 2024
95ad673
halt and reset
alckasoc Jun 25, 2024
7569ed7
.
alckasoc Jun 25, 2024
e6f1164
.
alckasoc Jun 25, 2024
a43603d
reflexion cot exmaples
alckasoc Jun 25, 2024
afd9bd8
time to test
alckasoc Jun 25, 2024
a9ffd35
.
alckasoc Jun 25, 2024
8ddb0fe
.
alckasoc Jun 25, 2024
caed1f2
FX
alckasoc Jun 25, 2024
0b52ced
em
alckasoc Jun 25, 2024
eeb3c06
.
alckasoc Jun 25, 2024
23e042f
FIRST TEST WORKS; gsm8k reactreflexion; tabmwp; svvamp left
alckasoc Jun 25, 2024
23b493d
.
alckasoc Jun 26, 2024
4d2e45f
WIP math reflexionreact
alckasoc Jun 26, 2024
eea6557
fix
alckasoc Jun 26, 2024
16ec73a
.
alckasoc Jun 27, 2024
7715bf4
.
alckasoc Jun 27, 2024
5c98148
fix
alckasoc Jun 27, 2024
8ff34fa
fix
alckasoc Jun 27, 2024
7bb55f4
.
alckasoc Jun 27, 2024
4c6eb58
.
alckasoc Jun 27, 2024
42b08fe
.
alckasoc Jun 27, 2024
ab66b8d
les go
alckasoc Jun 27, 2024
c12be90
.
alckasoc Jun 27, 2024
4f37768
docs
alckasoc Jun 27, 2024
3050dc3
auto lint
alckasoc Jun 27, 2024
d15a6e7
ready for testing
alckasoc Jun 27, 2024
987ea48
fix em
alckasoc Jun 28, 2024
67c5581
.
alckasoc Jun 28, 2024
a87a61e
add fewshots
alckasoc Jun 28, 2024
220dd2c
max tokens to 5k
alckasoc Jun 28, 2024
68a5cd9
2 fixes
alckasoc Jun 28, 2024
b0cdf1c
,
alckasoc Jun 28, 2024
ea830eb
.
alckasoc Jun 28, 2024
62983b7
done with gsm8kk; working on svamp
alckasoc Jun 28, 2024
c1c61c7
SVAMP_FEWSHOT_EXAMPLES_COT
alckasoc Jun 28, 2024
d065863
fewhots cot reflect
alckasoc Jun 28, 2024
536960d
.
alckasoc Jun 29, 2024
fa7d03c
tabmwp init
alckasoc Jun 29, 2024
7a052c5
.
alckasoc Jun 29, 2024
5e4b868
tabmwp cot
alckasoc Jun 29, 2024
4a88be8
okk
alckasoc Jun 29, 2024
7111877
.
alckasoc Jun 29, 2024
12ddd37
.
alckasoc Jun 29, 2024
5618a0e
tabmwp instructions
alckasoc Jun 29, 2024
590b5c4
reflexion cot reflect
alckasoc Jun 29, 2024
c86db7d
ok
alckasoc Jun 29, 2024
cf6527b
IT RUNS!
alckasoc Jun 29, 2024
44ee60e
clear outputs
alckasoc Jun 29, 2024
8bbb1f0
auto linted
alckasoc Jun 29, 2024
429a448
remove prints
alckasoc Jun 29, 2024
2804490
docs
alckasoc Jun 29, 2024
1e610b6
init
alckasoc Jun 29, 2024
6307c12
1 done
alckasoc Jun 29, 2024
6c898a3
2 done
alckasoc Jun 29, 2024
1217fd8
lint
alckasoc Jun 29, 2024
d3e8328
4 done
alckasoc Jun 29, 2024
7c23e4d
.
alckasoc Jun 29, 2024
088daff
.
alckasoc Jun 29, 2024
d47922c
2 more done
alckasoc Jun 29, 2024
c9009b2
.
alckasoc Jun 29, 2024
4e6fb4a
1 more down
alckasoc Jun 29, 2024
dba9e0b
anotha one!
alckasoc Jun 29, 2024
9a42b66
.
alckasoc Jun 29, 2024
6b53af1
anotha oneee
alckasoc Jun 29, 2024
a71e135
anotha one
alckasoc Jun 29, 2024
d07675a
ok
alckasoc Jun 29, 2024
d2cda6b
ok anotha one
alckasoc Jun 29, 2024
9f0cabc
2 more done
alckasoc Jun 29, 2024
9b3c8de
1 more down
alckasoc Jun 29, 2024
9a176a4
ok
alckasoc Jun 29, 2024
62a78ba
yay 1 more
alckasoc Jun 29, 2024
71d38bd
2/3
alckasoc Jun 29, 2024
019a6f7
1 down
alckasoc Jun 29, 2024
c761033
2 more down
alckasoc Jun 29, 2024
e6de881
1
alckasoc Jun 29, 2024
afe7452
al
alckasoc Jun 29, 2024
1b4b599
2 more
alckasoc Jun 29, 2024
d23628f
1 mmore down
alckasoc Jun 29, 2024
f8bb8b0
all done LES GO
alckasoc Jun 29, 2024
5d64b74
.
alckasoc Jun 29, 2024
c38a6fc
del
alckasoc Jun 29, 2024
8 changes: 6 additions & 2 deletions agential/cog/agent/react.py
@@ -96,13 +96,17 @@ def generate(
             )
 
             # Observe.
-            obs = self.strategy.generate_observation(
+            obs, external_tool_info = self.strategy.generate_observation(
                 idx=idx, action_type=action_type, query=query
             )
 
             out.append(
                 self.strategy.create_output_dict(
-                    thought=thought, action_type=action_type, query=query, obs=obs
+                    thought=thought,
+                    action_type=action_type,
+                    query=query,
+                    obs=obs,
+                    external_tool_info=external_tool_info,
                 )
             )

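The call-site change in this hunk can be tried in isolation. The following is a minimal sketch, not the agential implementation: `StubStrategy` and the values it returns are hypothetical stand-ins, and only the two method names and their keyword arguments are taken from the diff.

```python
from typing import Any, Dict, List, Tuple


class StubStrategy:
    """Hypothetical stand-in for a ReAct strategy; not part of agential."""

    def generate_observation(
        self, idx: int, action_type: str, query: str
    ) -> Tuple[str, Dict[str, Any]]:
        # Return the observation text plus a dict of raw tool outputs.
        return f"Observed {action_type}[{query}]", {"step": idx}

    def create_output_dict(
        self,
        thought: str,
        action_type: str,
        query: str,
        obs: str,
        external_tool_info: Dict[str, Any],
    ) -> Dict[str, Any]:
        return {
            "thought": thought,
            "action_type": action_type,
            "query": query,
            "observation": obs,
            "external_tool_info": external_tool_info,
        }


strategy = StubStrategy()
out: List[Dict[str, Any]] = []

# Unpack both return values, then forward the tool info to the output dict.
obs, external_tool_info = strategy.generate_observation(
    idx=1, action_type="Search", query="GSM8K"
)
out.append(
    strategy.create_output_dict(
        thought="I should search.",
        action_type="Search",
        query="GSM8K",
        obs=obs,
        external_tool_info=external_tool_info,
    )
)
```

Any caller that still assigns the result of `generate_observation` to a single name would now receive the whole tuple, so every call site has to be updated together with the strategy classes.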
28 changes: 28 additions & 0 deletions agential/cog/functional/reflexion.py
@@ -178,13 +178,27 @@ def _prompt_cot_agent(
         prompt=prompt,
         additional_keys=additional_keys,
     )
+    print(
+        "<PROMPT AGENT========================================================================>"
+    )
+    print(prompt)
+    print(
+        "<PROMPT AGENT========================================================================>"
+    )
     out = llm(
         [
             HumanMessage(
                 content=prompt,
             )
         ]
     ).content
+    print(
+        "<OUT AGENT========================================================================>"
+    )
+    print(repr(out))
+    print(
+        "<OUT AGENT========================================================================>"
+    )
     assert isinstance(out, str)
     return out

@@ -252,13 +266,27 @@ def _prompt_cot_reflection(
         prompt=prompt,
         additional_keys=additional_keys,
     )
+    print(
+        "<PROMPT REFLECT========================================================================>"
+    )
+    print(prompt)
+    print(
+        "<PROMPT REFLECT========================================================================>"
+    )
     out = llm(
         [
             HumanMessage(
                 content=prompt,
             )
         ]
     ).content
+    print(
+        "<OUT REFLECT========================================================================>"
+    )
+    print(repr(out))
+    print(
+        "<OUT REFLECT========================================================================>"
+    )
     assert isinstance(out, str)
     return out

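The debug prints in these hunks bracket the prompt and the raw model output. A mocked model can capture the same information inside a test without calling a real API, which is in the spirit of the "Mock API Calls" part of the PR title. The sketch below is only an illustration under stated assumptions: `FakeMessage`, `FakeChatModel`, and `prompt_agent` are hypothetical names, not langchain or agential classes; only the `llm([message]).content` call shape and the `isinstance` assertion come from the diff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat message with a `content` field."""

    content: str


class FakeChatModel:
    """Hypothetical mock LLM that replays canned responses in order."""

    def __init__(self, responses: List[str]) -> None:
        self._responses = list(responses)
        self.prompts: List[str] = []

    def __call__(self, messages: List[FakeMessage]) -> FakeMessage:
        # Record the prompt so a test can inspect it instead of printing it.
        self.prompts.append(messages[0].content)
        return FakeMessage(content=self._responses.pop(0))


def prompt_agent(llm: FakeChatModel, prompt: str) -> str:
    """Mirror the call shape in the hunk: llm([message]).content."""
    out = llm([FakeMessage(content=prompt)]).content
    assert isinstance(out, str)
    return out


llm = FakeChatModel(responses=["Finish[8]"])
result = prompt_agent(llm, "Question: 20 - 12 = ?")
```

Because the mock records every prompt it receives, a test can assert on `llm.prompts` directly rather than reading delimiter-wrapped output from stdout.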
9 changes: 9 additions & 0 deletions agential/cog/prompts/agent/reflexion.py
@@ -602,3 +602,12 @@
 Action 2: Finish[Lindsey Vonn]
 
 Reflection: My reasoning failed because I doubted the clear evidence provided by the source and made an incorrect assumption based on the fame of another skier. In the future, I should rely on the provided evidence rather than making unsupported assumptions."""
+
+
+# ======================================================================== GSM8K ======================================================================== #
+
+
+# ======================================================================== SVAMP ======================================================================== #
+
+
+# ======================================================================== TABMWP ======================================================================== #
98 changes: 98 additions & 0 deletions agential/cog/prompts/benchmark/gsm8k.py
@@ -80,6 +80,104 @@
 answer = chocolates_left"""
 
 
+GSM8K_FEWSHOT_EXAMPLES_COT = """Question: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
+Thought: Let's think step by step. Jason had 20 lollipops initially and now he has 12 lollipops. So, he must have given 20 - 12 = 8 lollipops to Denny.
+Action: Finish[
+```python
+jason_lollipops_initial = 20
+jason_lollipops_after = 12
+denny_lollipops = jason_lollipops_initial - jason_lollipops_after
+answer = denny_lollipops
+```
+]
+
+Question: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?
+Thought: Let's think step by step. There were initially 15 trees and after planting, there are 21 trees. So, the number of trees planted is 21 - 15 = 6.
+Action: Finish[
+```python
+trees_initial = 15
+trees_after = 21
+trees_added = trees_after - trees_initial
+answer = trees_added
+```
+]
+
+Question: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?
+Thought: Let's think step by step. Shawn had 5 toys initially, and he got 2 toys from his mom and 2 toys from his dad. So, the total number of toys he has now is 5 + 2 + 2 = 9.
+Action: Finish[
+```python
+toys_initial = 5
+mom_toys = 2
+dad_toys = 2
+total_received = mom_toys + dad_toys
+total_toys = toys_initial + total_received
+answer = total_toys
+```
+]
+
+Question: There were nine computers in the server room. Five more computers were installed each day, from Monday to Thursday. How many computers are now in the server room?
+Thought: Let's think step by step. Initially, there were 9 computers. Five more computers were added each day from Monday to Thursday, which is 4 days. So, the total number of computers added is 5 * 4 = 20. Therefore, the total number of computers now is 9 + 20 = 29.
+Action: Finish[
+```python
+computers_initial = 9
+computers_per_day = 5
+num_days = 4 # 4 days between Monday and Thursday
+computers_added = computers_per_day * num_days
+computers_total = computers_initial + computers_added
+answer = computers_total
+```
+]
+
+Question: Michael had 58 golf balls. On Tuesday, he lost 23 golf balls. On Wednesday, he lost 2 more. How many golf balls did he have at the end of Wednesday?
+Thought: Let's think step by step. Michael initially had 58 golf balls. He lost 23 golf balls on Tuesday and 2 more on Wednesday. So, the total number of golf balls lost is 23 + 2 = 25. Therefore, the number of golf balls he has left is 58 - 25 = 33.
+Action: Finish[
+```python
+golf_balls_initial = 58
+golf_balls_lost_tuesday = 23
+golf_balls_lost_wednesday = 2
+golf_balls_left = golf_balls_initial - golf_balls_lost_tuesday - golf_balls_lost_wednesday
+answer = golf_balls_left
+```
+]
+
+Question: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
+Thought: Let's think step by step. Initially, there are 3 cars in the parking lot. When 2 more cars arrive, the total number of cars is 3 + 2 = 5.
+Action: Finish[
+```python
+cars_initial = 3
+cars_arrived = 2
+total_cars = cars_initial + cars_arrived
+answer = total_cars
+```
+]
+
+Question: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
+Thought: Let's think step by step. Olivia had $23 initially. She bought 5 bagels for $3 each. The total cost of the bagels is 5 * 3 = $15. So, the money she has left is 23 - 15 = $8.
+Action: Finish[
+```python
+money_initial = 23
+bagels = 5
+bagel_cost = 3
+money_spent = bagels * bagel_cost
+money_left = money_initial - money_spent
+answer = money_left
+```
+]
+
+Question: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
+Thought: Let's think step by step. Leah had 32 chocolates and her sister had 42 chocolates. Together they had 32 + 42 = 74 chocolates. They ate 35 chocolates, so the number of chocolates left is 74 - 35 = 39.
+Action: Finish[
+```python
+leah_chocolates = 32
+sister_chocolates = 42
+total_chocolates = leah_chocolates + sister_chocolates
+chocolates_eaten = 35
+chocolates_left = total_chocolates - chocolates_eaten
+answer = chocolates_left
+```
+]"""
+
+
 GSM8K_FEWSHOT_EXAMPLES_REACT = """Question: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
 Thought 1: First, I need to find out how many lollipops Jason gave to Denny.
 Action 1: Calculate[
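Each CoT few-shot example above ends in a `Finish[...]` action that wraps a fenced Python program whose result is stored in a variable named `answer`. The sketch below shows one way such an action could be parsed and evaluated. It is an assumption-laden illustration, not agential's parser or its `safe_execute`: `FINISH_PATTERN`, `extract_finish_code`, and `run_answer` are hypothetical names, and plain `exec` is used without any sandboxing.

```python
import re
from typing import Any, Dict, Optional

FENCE = "`" * 3  # three backticks, built here to avoid nesting code fences

# Shape taken from the few-shot examples: Finish[ <fence>python ... <fence> ]
FINISH_PATTERN = re.compile(
    r"Finish\[\s*`{3}python\n(.*?)\n`{3}\s*\]", re.DOTALL
)


def extract_finish_code(action: str) -> Optional[str]:
    """Return the Python source inside a Finish[...] action, or None."""
    match = FINISH_PATTERN.search(action)
    return match.group(1) if match else None


def run_answer(code: str) -> Any:
    """Execute the code in a fresh namespace and return its `answer` variable.

    Plain exec is for illustration only; it is not sandboxed.
    """
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    return namespace.get("answer")


# First few-shot example from the hunk above.
action = (
    "Action: Finish[\n"
    f"{FENCE}python\n"
    "jason_lollipops_initial = 20\n"
    "jason_lollipops_after = 12\n"
    "denny_lollipops = jason_lollipops_initial - jason_lollipops_after\n"
    "answer = denny_lollipops\n"
    f"{FENCE}\n"
    "]"
)

code = extract_finish_code(action)
assert code is not None
result = run_answer(code)
print(result)  # 8
```

The printed value matches the arithmetic in the example's Thought line (20 - 12 = 8), which is a quick way to check that a few-shot program and its reasoning agree.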
18 changes: 13 additions & 5 deletions agential/cog/strategies/react/base.py
@@ -1,7 +1,7 @@
 """Base ReAct Agent strategy class."""
 
 from abc import abstractmethod
-from typing import Dict, Tuple
+from typing import Any, Dict, Tuple
 
 from langchain_core.language_models.chat_models import BaseChatModel
 
@@ -41,7 +41,9 @@ def generate_action(
         pass
 
     @abstractmethod
-    def generate_observation(self, idx: int, action_type: str, query: str) -> str:
+    def generate_observation(
+        self, idx: int, action_type: str, query: str
+    ) -> Tuple[str, Dict[str, Any]]:
         """Generates an observation based on the action type and query.
 
         Args:
@@ -50,13 +52,18 @@ def generate_observation(self, idx: int, action_type: str, query: str) -> str:
             query (str): The query for the action.
 
         Returns:
-            str: The generated observation.
+            Tuple[str, Dict[str, Any]]: The generated observation and external tool outputs.
         """
         pass
 
     @abstractmethod
     def create_output_dict(
-        self, thought: str, action_type: str, query: str, obs: str
+        self,
+        thought: str,
+        action_type: str,
+        query: str,
+        obs: str,
+        external_tool_info: Dict[str, Any],
     ) -> Dict[str, str]:
         """Creates a dictionary of the output components.
 
@@ -65,9 +72,10 @@ def create_output_dict(
             action_type (str): The type of action performed.
             query (str): The query for the action.
             obs (str): The generated observation.
+            external_tool_info (Dict[str, Any]): The external tool outputs.
 
         Returns:
-            Dict[str, str]: A dictionary containing the thought, action type, query, and observation.
+            Dict[str, Any]: A dictionary containing the thought, action type, query, observation, answer, and external tool output.
         """
         pass

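Note that in this hunk the abstract `create_output_dict` still carries the return annotation `Dict[str, str]`, while its docstring and the `code.py` implementation use `Dict[str, Any]`; since `external_tool_info` is itself a dict, the latter is the accurate one. One way to make the standardized shape explicit is a `TypedDict`. The sketch below is a hypothetical illustration only: `ReactStepOutput` does not appear in the PR, and the key names are copied from the dictionary returned in `code.py`.

```python
from typing import Any, Dict, TypedDict


class ReactStepOutput(TypedDict):
    """Hypothetical typed view of one ReAct step's output dictionary."""

    thought: str
    action_type: str
    query: str
    observation: str
    answer: str
    external_tool_info: Dict[str, Any]


def create_output_dict(
    thought: str,
    action_type: str,
    query: str,
    obs: str,
    answer: str,
    external_tool_info: Dict[str, Any],
) -> ReactStepOutput:
    # Free-function version for illustration; in the PR, `answer` comes from
    # the strategy's internal state rather than an argument.
    return {
        "thought": thought,
        "action_type": action_type,
        "query": query,
        "observation": obs,
        "answer": answer,
        "external_tool_info": external_tool_info,
    }


step = create_output_dict(
    thought="Implement the function.",
    action_type="Implement",
    query="x = 1",
    obs="Execution Status: Done",
    answer="x = 1",
    external_tool_info={"execution_status": "Done"},
)
```

A static type checker can then flag a strategy that omits a key or returns a non-dict value for `external_tool_info`, which plain `Dict[str, Any]` cannot do.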
46 changes: 32 additions & 14 deletions agential/cog/strategies/react/code.py
@@ -66,7 +66,7 @@ def __init__(
         self.enc = enc
 
         self._scratchpad = ""
-        self._current_answer = ""
+        self._answer = ""
         self._finished = False
 
     def generate(
@@ -148,7 +148,9 @@ def generate_action(
 
         return action_type, query
 
-    def generate_observation(self, idx: int, action_type: str, query: str) -> str:
+    def generate_observation(
+        self, idx: int, action_type: str, query: str
+    ) -> Tuple[str, Dict[str, Any]]:
         """Generates an observation based on the action type and query.
 
         Args:
@@ -157,47 +159,63 @@ def generate_observation(self, idx: int, action_type: str, query: str) -> str:
             query (str): The query for the action.
 
         Returns:
-            str: The generated observation.
+            Tuple[str, Dict[str, Any]]: The generated observation and external tool outputs.
         """
+        external_tool_info = {"execution_status": ""}
+
         self._scratchpad += f"\nObservation {idx}: "
         if action_type.lower() == "finish":
-            self._current_answer = query
+            _, execution_status = safe_execute(query)
+            external_tool_info["execution_status"] = execution_status
+
+            self._answer = query
             self._finished = True
-            obs = f"\n```python\n{self._current_answer}\n```"
+            obs = f"\n```python\n{self._answer}\n```"
         elif action_type.lower() == "implement":
             _, execution_status = safe_execute(query)
-            self._current_answer = query
-            obs = f"\n```python\n{self._current_answer}\n```\nExecution Status: {execution_status}"
+            external_tool_info["execution_status"] = execution_status
+
+            self._answer = query
+            obs = f"\n```python\n{self._answer}\n```\nExecution Status: {execution_status}"
         elif action_type.lower() == "test":
-            obs = f"{self._current_answer}\n\n{query}"
+            obs = f"{self._answer}\n\n{query}"
             _, execution_status = safe_execute(obs)
+            external_tool_info["execution_status"] = execution_status
+
             obs = f"\n```python\n{obs}\n```\nExecution Status: {execution_status}"
         else:
             obs = "Invalid Action. Valid Actions are Implement[code] Test[code] and Finish[answer]."
         self._scratchpad += obs
 
-        return obs
+        return obs, external_tool_info
 
     def create_output_dict(
-        self, thought: str, action_type: str, query: str, obs: str
-    ) -> Dict[str, str]:
+        self,
+        thought: str,
+        action_type: str,
+        query: str,
+        obs: str,
+        external_tool_info: Dict[str, Any],
+    ) -> Dict[str, Any]:
         """Creates a dictionary of the output components.
 
         Args:
             thought (str): The generated thought.
             action_type (str): The type of action performed.
             query (str): The query for the action.
             obs (str): The generated observation.
+            external_tool_info (Dict[str, Any]): The external tool outputs.
 
         Returns:
-            Dict[str, str]: A dictionary containing the thought, action type, query, observation, and answer.
+            Dict[str, Any]: A dictionary containing the thought, action type, query, observation, answer, and external tool output.
         """
         return {
             "thought": thought,
             "action_type": action_type,
             "query": query,
             "observation": obs,
-            "answer": self._current_answer,
+            "answer": self._answer,
+            "external_tool_info": external_tool_info,
         }
 
     def halting_condition(
@@ -248,7 +266,7 @@ def reset(self, **kwargs: Any) -> None:
         Returns:
             None
         """
-        self._current_answer = ""
+        self._answer = ""
         self._scratchpad = ""
         self._finished = False

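The `test` branch above concatenates the stored answer with the test code, executes the result, and reports the status both in the observation string and in `external_tool_info`. The following sketch reproduces that flow with a toy executor. It rests on assumptions: `toy_execute` is a hypothetical stand-in for `safe_execute`, whose real return values and status strings are not shown in this diff, and it uses plain `exec` with no sandboxing or timeout.

```python
from typing import Any, Dict, Tuple

FENCE = "`" * 3  # three backticks, built here to avoid nesting code fences


def toy_execute(code: str) -> Tuple[Dict[str, Any], str]:
    """Hypothetical stand-in for safe_execute: run code, return a status string."""
    namespace: Dict[str, Any] = {}
    try:
        exec(code, namespace)
        return namespace, "Done"
    except Exception as exc:  # report the failure instead of raising
        return namespace, f"{type(exc).__name__}({exc})"


def observe_test(answer: str, tests: str) -> Tuple[str, Dict[str, Any]]:
    """Mirror the `test` branch: run answer plus tests, return obs and tool info."""
    program = f"{answer}\n\n{tests}"
    _, execution_status = toy_execute(program)
    external_tool_info = {"execution_status": execution_status}
    obs = (
        f"\n{FENCE}python\n{program}\n{FENCE}"
        f"\nExecution Status: {execution_status}"
    )
    return obs, external_tool_info


answer = "def add(a, b):\n    return a + b"
obs_ok, passing = observe_test(answer, "assert add(1, 2) == 3")
obs_bad, failing = observe_test(answer, "assert add(1, 2) == 4")
```

Returning the status in a separate dictionary means downstream code can branch on `external_tool_info["execution_status"]` directly instead of parsing it back out of the formatted observation text.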