Heuristic proxy for confidence in agent's predictions #477
How do you mean?
I understood this analysis to be one way of understanding 'how accurate are the p_yes predictions of an agent?', not that it should be used to generate a confidence score for a given prediction. Maybe this info could be given to the agent when asking it to generate a confidence score, but I think it still needs to be decided on a per-prediction basis.
That's also my understanding. The question (for this ticket) remains open: how should we define confidence for the agent? Still ask the agent for it, or define it using hardcoded rules?
Some additional observations:
To sum it up, there are multiple parts to this issue:
(1) and (2) feel easily doable thanks to https://github.com/gnosis/prediction-market-agent-tooling/blob/main/examples/monitor/match_bets_with_langfuse_traces.py, and I'd say that's more than a low priority now given the mixed results of Kelly. Wdyt @evangriffiths @gabrielfior?
@kongzii are you thinking this is another approach for how we can still use KellyBettingStrategy(max_bet_amount=big_number), but mitigate the issue where the agent is incorrectly very confident and loses all its money? And I guess there's no reason why this couldn't be used in combination with @gabrielfior's max_slippage approach. My one reservation is that it might be a bit messy in the code to throw away the confidence returned by the agent and use this new one instead. But it's definitely worth a try.
No, no, I just meant it as yet another evaluation method. Similarly to how we have accuracy and profitability, we can also have something like: the agent with the lowest MAE should be the best probability predictor.
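For concreteness, a minimal sketch of what such an MAE ranking could look like. The record format (agent name, predicted p_yes, resolved-YES flag) is a hypothetical assumption for illustration, not an API of prediction-market-agent-tooling:

```python
from collections import defaultdict


def rank_agents_by_mae(records: list[tuple[str, float, bool]]) -> list[tuple[str, float]]:
    """records: (agent_name, predicted p_yes, market resolved YES?) tuples - assumed input shape."""
    errors: dict[str, list[float]] = defaultdict(list)
    for agent, p_yes, resolved_yes in records:
        # Absolute error between the predicted probability and the
        # realized outcome (1.0 for YES, 0.0 for NO).
        errors[agent].append(abs(p_yes - (1.0 if resolved_yes else 0.0)))
    maes = [(agent, sum(errs) / len(errs)) for agent, errs in errors.items()]
    # Lowest MAE first: the best probability predictor.
    return sorted(maes, key=lambda pair: pair[1])
```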
Agree with this as the scope of the ticket.
Based on @kongzii's suggestion:
-> Divide all of an agent's predictions into probability buckets (deciles), e.g. if an agent gives 65% probability to a market, it goes in the 7th decile (60-70%).
-> For each decile, we roughly expect its accuracy to match the decile's range, i.e., the 7th decile above (60-70%) should have an accuracy of roughly 60-70%.
-> Using the correlation between the expected decile accuracy and the actual accuracy, we can derive a value for the confidence (see the sketch below).
-> It would also be interesting to use the metrics above to quantify an associated error.
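A minimal sketch of this decile calibration in plain Python. The input format ((p_yes, resolved-YES) pairs) and the use of Pearson correlation as the final confidence value are assumptions for illustration, not anything defined in the tooling repo; `statistics.correlation` requires Python 3.10+:

```python
import statistics


def decile_confidence(preds: list[tuple[float, bool]]) -> float:
    """preds: (predicted p_yes, market resolved YES?) pairs - assumed input shape.

    Buckets predictions into deciles, compares each decile's midpoint
    (the expected accuracy) against the observed fraction of YES
    outcomes, and returns their correlation as a confidence proxy.
    """
    buckets: list[list[bool]] = [[] for _ in range(10)]
    for p_yes, resolved_yes in preds:
        # e.g. p_yes = 0.65 lands at index 6, the 7th decile (60-70%).
        idx = min(int(p_yes * 10), 9)
        buckets[idx].append(resolved_yes)

    expected, observed = [], []
    for idx, outcomes in enumerate(buckets):
        if not outcomes:
            continue  # skip deciles with no predictions
        expected.append(idx / 10 + 0.05)  # decile midpoint, e.g. 0.65 for 60-70%
        observed.append(sum(outcomes) / len(outcomes))  # actual YES rate

    # A well-calibrated agent should score close to 1.0.
    return statistics.correlation(expected, observed)
```

The returned correlation could then be combined with a per-decile error estimate (the last point above) to express how much the calibration value itself can be trusted.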