
Add OFV market resolver #225

Merged Aug 7, 2024 · 111 commits

Changes from 10 commits
4791548
feat: add keychain
0xArdi Apr 24, 2024
0cf7a8a
Make tomte a dev dependency
evangriffiths Apr 29, 2024
2cff00a
Add OFV market resolver
kongzii May 7, 2024
d6e8353
fix run() interface
richardblythman May 8, 2024
0b4694f
chore: remove redundant package
angrybayblade May 10, 2024
a3b87ab
gemini request tool
victorpolisetty May 12, 2024
9011c96
cleaned up testing prompts
victorpolisetty May 12, 2024
36f2386
Fix for `run` signature changes
kongzii May 13, 2024
3237dcf
fix types
kongzii May 13, 2024
4789d18
fix return value
kongzii May 13, 2024
f7d6255
fix wrong version of secretstr
kongzii May 13, 2024
5d01be1
another fixes
kongzii May 13, 2024
5db1b4d
eval each question 3x
kongzii May 15, 2024
3473cb2
remove old comment
kongzii May 15, 2024
81e8e57
ipfs updates
victorpolisetty May 15, 2024
010f2e9
add kwargs
kongzii May 16, 2024
d1a3d59
update error handling for response.text
victorpolisetty May 17, 2024
bc15607
Merge branch 'refs/heads/main' into feat/keychain
0xArdi May 21, 2024
eecbf34
feat: update tools to use KeyChain
0xArdi May 21, 2024
3a08009
feat: add max retries
0xArdi May 21, 2024
9715316
fix: `prediction_request_rag` tool resposne
0xArdi May 22, 2024
c0dee63
chore: lint
0xArdi May 22, 2024
726de11
fix: deps
0xArdi May 22, 2024
9e0ee08
fix: add default model support
0xArdi May 22, 2024
22f098f
Merge pull request #227 from valory-xyz/fix/remove-redundant-package
0xArdi May 22, 2024
f0366ca
Merge pull request #222 from evangriffiths/evan/tomte-dev-dep
0xArdi May 22, 2024
10c7d3e
Add tool deps README.md
0xArdi May 23, 2024
5576d16
Merge pull request #230 from valory-xyz/docs/deps
0xArdi May 23, 2024
524bc59
chore: bump deps
dvilelaf May 24, 2024
c27dfb6
Merge branch 'refs/heads/main' into feat/keychain
0xArdi May 24, 2024
076f6d6
chore: lock poetry
0xArdi May 24, 2024
1a27e3f
Merge pull request #229 from valory-xyz/feat/keychain
0xArdi May 24, 2024
bb77f35
chore: bump deps again
dvilelaf May 24, 2024
6be4bbf
chore: lock packages
dvilelaf May 24, 2024
ee4762f
feat: add tool
dvilelaf May 24, 2024
9523de1
fix: improve response, rename
dvilelaf May 24, 2024
e181b8b
fix: hashes
dvilelaf May 24, 2024
5ae1f76
Merge branch 'main' into feat/langchain-tool
dvilelaf May 24, 2024
a928106
fix: ignore false positive leaks
dvilelaf May 24, 2024
c76e9b5
fix: safety
dvilelaf May 24, 2024
1a2ca30
fix: add key rotation
dvilelaf May 24, 2024
5edf527
fix: keychain
dvilelaf May 24, 2024
2677c6a
Merge branch 'main' into gemini-tool
victorpolisetty May 25, 2024
90bfc5d
fix merge conflicts
victorpolisetty May 25, 2024
2b56e8d
fix: add default model to `resolve_market_reasoning`
0xArdi May 27, 2024
861474b
Merge pull request #233 from valory-xyz/fix/reslove-market-defualt-model
0xArdi May 27, 2024
5648a6f
feat: add lite tools
0xArdi May 28, 2024
f136c3b
feat: return probabilities, confidence and info utility
Adamantios May 28, 2024
473cfc4
chore: bump mech deps
0xArdi May 28, 2024
79779db
Merge branch 'main' into peter/ofv-resolver
kongzii May 29, 2024
977df31
update lock
kongzii May 29, 2024
22730cf
add is_valid
kongzii May 29, 2024
88d1fdb
refactor: hardcode topic and timeframe
Adamantios May 29, 2024
acead1d
refactor: change the question's expected kwarg to `prompt`
Adamantios May 29, 2024
6a9ed4d
Merge remote-tracking branch 'origin/feat/langchain-tool' into feat/l…
Adamantios May 29, 2024
bd8cb49
Merge branch 'refs/heads/main' into feat/langchain-tool
Adamantios May 29, 2024
ea51470
fix: env vars
dvilelaf May 29, 2024
72adedc
fix: format hack
dvilelaf May 29, 2024
5673461
fix: more hacks
dvilelaf May 29, 2024
56ecf18
Merge pull request #231 from valory-xyz/feat/langchain-tool
dvilelaf May 29, 2024
1a825db
First skeleton for omen_buy_yes_tokens
gabrielfior May 30, 2024
d42541b
Merge branch 'main' into gemini-tool
victorpolisetty May 30, 2024
d3d52f3
Improved LLM call with Pydantic output object
gabrielfior May 31, 2024
7edecf8
Added sell token functionality
gabrielfior Jun 4, 2024
c53c5b2
Updated poetry dependencies
gabrielfior Jun 4, 2024
bec95a1
Added tests
gabrielfior Jun 4, 2024
c3d7e78
Added dependencies to more files per PR comments
gabrielfior Jun 5, 2024
a629817
Merge branch 'main' into feat/lite-tools
0xArdi Jun 6, 2024
0c972da
Merge pull request #235 from valory-xyz/feat/lite-tools
0xArdi Jun 7, 2024
77a92aa
Added gnosis_rpc_url to key chain, removed from environ
gabrielfior Jun 7, 2024
f09951f
Added GNOSIS_RPC_URL as secrets to unit tests
gabrielfior Jun 7, 2024
84b7148
Merge pull request #237 from gabrielfior/gabriel/buy-yes-tokens-omen
0xArdi Jun 7, 2024
625fab0
Merge branch 'main' into gemini-tool
victorpolisetty Jun 8, 2024
8042ba6
update deps and tests
victorpolisetty Jun 8, 2024
8ead082
DALLE request mech tool
victorpolisetty Jun 11, 2024
df8fe44
Merge branch 'main' into peter/ofv-resolver
kongzii Jun 17, 2024
10fbda7
Merge pull request #228 from victorpolisetty/gemini-tool
0xArdi Jun 17, 2024
16093a1
merge master into branch
victorpolisetty Jun 22, 2024
35ed92b
add unit test and key rotation
victorpolisetty Jun 23, 2024
1f5d271
ran packages lock
victorpolisetty Jun 23, 2024
1f9ef4d
Bumped PMAT version
gabrielfior Jun 25, 2024
d947725
Updated poetry.lock
gabrielfior Jun 25, 2024
3c90b94
Relaxed dependencies
gabrielfior Jun 25, 2024
411f43a
update ofv rev
kongzii Jun 27, 2024
fa4dfc6
fix deps
kongzii Jun 27, 2024
1a2e216
Merge branch 'main' into peter/ofv-resolver
kongzii Jun 27, 2024
c522f9f
fix lock file
kongzii Jun 27, 2024
c8acbe9
fix benchmark for single prediction
kongzii Jun 27, 2024
f647200
Merge pull request #239 from victorpolisetty/dalle-request
0xArdi Jul 2, 2024
10b558a
Merge branch 'main' into peter/ofv-resolver
kongzii Jul 3, 2024
18d3b3c
Merge pull request #241 from gabrielfior/fix-gnosis-pmat-version
0xArdi Jul 3, 2024
bafb091
Add -i 70612
kongzii Jul 3, 2024
7d5f212
Merge branch 'main' into peter/ofv-resolver
kongzii Jul 3, 2024
f6cee21
update lock
kongzii Jul 3, 2024
bf747b3
force cpu torch
kongzii Jul 5, 2024
42a3ab7
Update poetry version
kongzii Jul 8, 2024
3498fe7
fix mack
kongzii Jul 9, 2024
efb8709
fix gitleaks ignore
kongzii Jul 9, 2024
9c64692
trying to fix tools
kongzii Jul 9, 2024
1be3d41
revert cpu only torch
kongzii Jul 9, 2024
760d126
relock
kongzii Jul 9, 2024
f2481ad
try to fix tox
kongzii Jul 9, 2024
3e5f5b7
maybe maybe!
kongzii Jul 9, 2024
2facdcc
fix aeaconfig
kongzii Jul 9, 2024
4f7fd61
update locks
kongzii Jul 9, 2024
3a5678a
Move OFV to gnosis folder instead of kongzii
kongzii Jul 9, 2024
df9df8e
add graphapi keyy
kongzii Jul 9, 2024
4ba88c0
black
kongzii Jul 9, 2024
cab249d
remove sentene transformers
kongzii Jul 9, 2024
4f78c3d
never enough of fixes!
kongzii Jul 9, 2024
5db5ec0
lock
kongzii Jul 9, 2024
4 changes: 3 additions & 1 deletion .gitignore
@@ -54,4 +54,6 @@ backup_mech/
/packages/valory/skills/termination_abci/
/pip
/tool_test.py
.venv
.venv
log
.benchmark-cache
149 changes: 149 additions & 0 deletions packages/kongzii/customs/ofv_market_resolver/benchmark.py
@@ -0,0 +1,149 @@
import typer
import json
import pandas as pd
from packages.kongzii.customs.ofv_market_resolver.ofv_market_resolver import (
run as ofv_run,
)
from packages.napthaai.customs.resolve_market_reasoning.resolve_market_reasoning import (
Results,
run as original_run,
)
from pydantic import SecretStr, ValidationError
from joblib import Memory

# File cache to not re-run the same questions.
MEMORY = Memory(".benchmark-cache", verbose=0)
APP = typer.Typer()


@MEMORY.cache
def ofv_run_cached(
question: str,
openai_api_key: SecretStr,
serper_api_key: SecretStr,
) -> bool | None:
return json.loads(
ofv_run(
prompt=question,
api_keys={
"openai": openai_api_key.get_secret_value(),
"serperapi": serper_api_key.get_secret_value(),
},
)[0]
)["has_occurred"]


@MEMORY.cache
def run_original_resolver_cached(
question: str,
openai_api_key: SecretStr,
google_api_key: SecretStr,
google_engine_id: SecretStr,
) -> bool | None:
try:
dump = original_run(
api_keys={
"openai": openai_api_key.get_secret_value(),
"google_api_key": google_api_key.get_secret_value(),
"google_engine_id": google_engine_id.get_secret_value(),
},
tool="resolve-market-reasoning-gpt-4",
prompt=question,
)[0]
return Results.model_validate_json(dump).has_occurred
except ValueError:
return None


@APP.command()
def full(
data_path: str,
openai_api_key: str,
serper_api_key: str,
google_api_key: str,
google_engine_id: str,
) -> None:
"""
Will run the prediction market resolver on all provided data and compare the results.

Expects a tsv file with columns:
- question
- resolution (YES/NO, as currently resolved on Omen)
- my_resolution (YES/NO, as resolved manually by you, used as ground truth)

Example command:

```
python packages/kongzii/customs/ofv_market_resolver/benchmark.py full markets.tsv {openai api key} {serper api key} {google api key} {google engine id}
```
"""
df = pd.read_csv(data_path, sep="\t")

# Run the resolution on all the data.
df["ofv_resolution"] = df["question"].apply(
lambda q: ofv_run_cached(
q,
openai_api_key=SecretStr(openai_api_key),
serper_api_key=SecretStr(serper_api_key),
)
)
df["new_original_resolution"] = df["question"].apply(
lambda q: run_original_resolver_cached(
q,
openai_api_key=SecretStr(openai_api_key),
google_api_key=SecretStr(google_api_key),
google_engine_id=SecretStr(google_engine_id),
)
)
# Normalise boolean to YES/NO/None.
df["ofv_resolution"] = df["ofv_resolution"].apply(
lambda r: "None" if r is None else "YES" if r else "NO"
)
df["new_original_resolution"] = df["new_original_resolution"].apply(
lambda r: "None" if r is None else "YES" if r else "NO"
)
# Save all the predictions, and separately those that were resolved incorrectly.
df.to_csv("markets_resolved.tsv", sep="\t", index=False)
df[df["ofv_resolution"] != df["my_resolution"]].to_csv(
"markets_resolved_incorrectly_by_ofv.tsv", sep="\t", index=False
)

# Calculate the accuracy.
accuracy_current = sum(df["resolution"] == df["my_resolution"]) / len(df)
accuracy_new_original = sum(
df["new_original_resolution"] == df["my_resolution"]
) / len(df)
accuracy_ofv = sum(df["ofv_resolution"] == df["my_resolution"]) / len(df)
print(
f"""
Current accuracy: {accuracy_current * 100:.2f}%
Original's new run accuracy: {accuracy_new_original * 100:.2f}%
OFV's accuracy: {accuracy_ofv * 100:.2f}%
"""
)


@APP.command()
def single(
question: str,
openai_api_key: str,
serper_api_key: str,
) -> None:
"""
Will run the prediction market resolver and print the result on a single question.

Example command:

```
python packages/kongzii/customs/ofv_market_resolver/benchmark.py single "Will McDonald's successfully buy back all its Israeli restaurants by 12 April 2024?" {openai api key} {serper api key}
```
"""
# `run` expects the question as `prompt` and the keys as plain strings in `api_keys`.
ofv_run(
prompt=question,
api_keys={"openai": openai_api_key, "serperapi": serper_api_key},
)


if __name__ == "__main__":
APP()
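As a minimal sketch of the input the `full` command expects: a tab-separated file with `question`, `resolution` (as resolved on Omen), and `my_resolution` (manual ground truth) columns, from which accuracy is computed by exact match. The questions and resolutions below are invented for illustration.

```python
import pandas as pd

# Hypothetical two-row version of markets.tsv.
df = pd.DataFrame(
    {
        "question": [
            "Will X happen by 1 May 2024?",
            "Will Y happen by 1 June 2024?",
        ],
        "resolution": ["YES", "NO"],      # as currently resolved on Omen
        "my_resolution": ["YES", "YES"],  # manual ground truth
    }
)

# Same formula the benchmark uses for the currently-deployed resolver.
accuracy_current = sum(df["resolution"] == df["my_resolution"]) / len(df)
print(f"Current accuracy: {accuracy_current * 100:.2f}%")  # → Current accuracy: 50.00%
```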
223 changes: 223 additions & 0 deletions packages/kongzii/customs/ofv_market_resolver/ofv_market_resolver.py
@@ -0,0 +1,223 @@
from factcheck import FactCheck
from factcheck.utils.multimodal import modal_normalization
import json
import typing as t
from langchain_openai import ChatOpenAI
from typing import Annotated, Any, Dict, Optional, Tuple
from pydantic import BaseModel, BeforeValidator

DEFAULT_OPENAI_MODEL = "gpt-4-0125-preview"

# The fact checker reports "Nothing to check." or "non-factual" for statements
# it cannot verify; normalise those to `None` before validation.
Factuality = Annotated[
bool | None,
BeforeValidator(lambda v: None if v in ("Nothing to check.", "non-factual") else v),
]


class FactCheckClaimDetails(BaseModel):
claim: str
factuality: Factuality
correction: str | None
reference_url: str


class FactCheckResult(BaseModel):
factuality: Factuality
claims_details: list[FactCheckClaimDetails] | None


def factcheck(
statement: str,
model: str = DEFAULT_OPENAI_MODEL,
openai_api_key: str | None = None,
serper_api_key: str | None = None,
) -> FactCheckResult:
api_config = {
"OPENAI_API_KEY": openai_api_key,
"SERPER_API_KEY": serper_api_key,
}
factcheck = FactCheck(
default_model=model,
api_config=api_config,
retriever="serper",
num_seed_retries=5,
)
content = modal_normalization("string", statement)
res = factcheck.check_response(content)

return FactCheckResult.model_validate(res)


def rewrite_as_sentence(
question: str,
model: str = DEFAULT_OPENAI_MODEL,
openai_api_key: str | None = None,
) -> str:
"""
Rewrites the question into a sentence, example:

`Will former Trump Organization CFO Allen Weisselberg be sentenced to jail by 15 April 2024?`
->
`Former Trump Organization CFO Allen Weisselberg was sentenced to jail by 15 April 2024.`
"""
llm = ChatOpenAI(
model=model,
temperature=0.0,
api_key=openai_api_key,
)

prompt = f"""
Rewrite the question into a simple announcement sentence stating a fact or prediction as if it were already known.
Make future tense into past tense.
For future questions that ask if something will happen "by" some date, rewrite it to "before" that date or any time sooner.
For future questions that ask if something will happen "on" some date, rewrite it to "on" that date.
If the question is both "on" and "by" some date, rewrite it as "before or any time sooner than" that date.
If the question is about an exact date, keep it exact.
If the question is about a date range, keep it a range.
Always keep the same meaning.
Never negate the sentence into the opposite meaning of the question.

Question: {question}
Sentence:
"""
completion = str(llm.invoke(prompt, max_tokens=512).content)

return completion


# TODO: This could be imported from prediction-market-agent-tooling, but given the conflict in the langchain versions,
# it would require changes in other mechs of this repository.
def is_predictable_binary(
question: str,
model: str = DEFAULT_OPENAI_MODEL,
openai_api_key: str | None = None,
) -> bool:
"""
Evaluate if the question is actually answerable.
"""
llm = ChatOpenAI(
model=model,
temperature=0.0,
api_key=openai_api_key,
)

prompt = f"""Main signs of a fully qualified question (sometimes referred to as a "market"):
- The market's question needs to be specific, without use of pronouns.
- The market's question needs to have a clear future event.
- The market's question needs to have a clear time frame.
- The event in the market's question doesn't have to be ultra-specific, it will be decided by a crowd later on.
- If the market's question contains a date, but without a year, it's okay.
- If the market's question contains a year, but without an exact date, it's okay.
- The market's question cannot be about itself or refer to itself.
- The answer is probably Google-able, after the event happened.
- The potential answer can only be "Yes" or "No".

Follow a chain of thought to evaluate if the question is fully qualified:

First, write the parts of the following question:

"{question}"

Then, write down what the future event of the question is, what it refers to, and when that event will happen if the question contains it.

Then, explain why you think it is or isn't fully qualified.

Finally, write your final decision: write `decision: ` followed by either "yes it is fully qualified" or "no it isn't fully qualified". Don't write anything else after that. You must include "yes" or "no".
"""
completion = str(llm.invoke(prompt, max_tokens=512).content)

try:
decision = completion.lower().rsplit("decision", 1)[1]
except IndexError as e:
raise ValueError(
f"Invalid completion in is_predictable for `{question}`: {completion}"
) from e

if "yes" in decision:
is_predictable = True
elif "no" in decision:
is_predictable = False
else:
raise ValueError(
f"Invalid completion in is_predictable for `{question}`: {completion}"
)

return is_predictable


def build_run_result(
has_occurred: bool | None, is_determinable: bool | None
) -> Tuple[str, Optional[str], Optional[Dict[str, Any]], Any]:
return (
json.dumps(
{
"has_occurred": has_occurred,
"is_determinable": is_determinable,
}
),
"",
None,
None,
)


def most_common_fact_result(results: list[FactCheckResult]) -> FactCheckResult:
"""
Given a list of fact check results, return the first `FactCheckResult` in the list with `factuality` being the most common.
"""
factualities = [fact.factuality for fact in results]
most_common_fact = max(set(factualities), key=factualities.count)
first_most_common_fact = [
fact for fact in results if fact.factuality == most_common_fact
][0]
return first_most_common_fact


def run(
prompt: str,
api_keys: dict[str, str],
n_fact_runs: int = 3,
**kwargs: t.Any, # Just to ignore any other arguments passed to the resolver by the universal benchmark script.
) -> Tuple[str, Optional[str], Optional[Dict[str, Any]], Any]:
"""
Run the prediction market resolver based on Open Fact Verifier.
"""
assert (
n_fact_runs > 0 and n_fact_runs % 2 != 0
), "n_fact_runs must be greater than 0 and an odd number"
market_question = prompt # `prompt` argument name is for compatibility with the original resolver.
openai_api_key = api_keys["openai"]
serper_api_key = api_keys["serperapi"]

# Check if the question is reasonable to look for an answer.
is_answerable = is_predictable_binary(
market_question, openai_api_key=openai_api_key
)
if not is_answerable:
print(
f"Question `{market_question}` is not answerable, skipping fact checking."
)
return build_run_result(has_occurred=None, is_determinable=is_answerable)

# Rewrite the question (which asks about a future event) into a sentence (which states it as a past fact).
market_sentence = rewrite_as_sentence(
market_question, openai_api_key=openai_api_key
)
print(f"Question `{market_question}` rewritten into `{market_sentence}`.")
# Fact-check the sentence.
factresults = [
factcheck(
market_sentence,
openai_api_key=openai_api_key,
serper_api_key=serper_api_key,
)
for _ in range(n_fact_runs)
]
factresult = most_common_fact_result(factresults)
print(
f"Fact check result for `{market_sentence}` is `{factresult.factuality}`, because {factresult.claims_details}."
)

return build_run_result(
has_occurred=factresult.factuality, is_determinable=is_answerable
)
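The majority vote in `most_common_fact_result` can be sketched standalone: with an odd number of fact-check runs, the most frequent `factuality` value wins, and the first result carrying that value supplies the claim details. `majority_factuality` is a hypothetical helper name used only for this illustration.

```python
def majority_factuality(factualities: list) -> object:
    # Mirrors `max(set(factualities), key=factualities.count)` in the tool:
    # pick the distinct value with the highest occurrence count. An odd
    # `n_fact_runs` avoids even True/False splits, though a True/False/None
    # three-way tie would still be broken arbitrarily.
    return max(set(factualities), key=factualities.count)

# Three runs where two agree the event occurred:
print(majority_factuality([True, True, None]))  # → True
```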