Recommendations researcher #839

kfsone · 2023-12-01T22:51:50Z

kfsone
Dec 1, 2023

Your dad calls: mom's old sweater-maker-98 broke down and he's thinking about spending $23,000,000 on a second hand "print shirt mk 3" so she can still make you a sweater this year.

I'm trying to put together a research agent group that is capable of building a customized recommendation list along with some purchase guidance running against local LMs. I've tried a bunch of simple and complex approaches, but none of the approaches seem to work. The moment I stray from the 'arxiv research' task in the original group-web-research group, nothing works.

I either end up with it just going ahead and dumping a list of recommendations on the first pass, and they're out of date with no obvious web access, or it getting stuck on 'GroupChat select_speaker failed to resolve the next speaker's name. This is because the speaker selection OAI call returned:'

https://gist.github.com/kfsone/4db4156fc77eb7a9ce6c09ae59278764

This one has gotten a little fancy ... I tried making it less likely I would make typos telling one agent to talk to another...

Any advice/obvious gotchas? And recommendations for reasonable models that might help? I've tried dolphin 2.2.1, Mistral 7b Instruct 0.1 k6m, sciphi-self-rag-mistral-32k ... I have token window set to 16k to try and ensure context windows aren't an issue...

sonichi · 2023-12-03T14:36:46Z

sonichi
Dec 3, 2023

This project might help: https://www.crafters.ai/aitools/research-agents-3-0

0 replies

kfsone · 2023-12-03T21:02:42Z

kfsone
Dec 3, 2023
Author

@sonichi Things I've learned since posting:

My 'assistant' method wasn't setting llm_config=...,
There is no auto-routing in the chat manager, my mis-expectation likely a consequence of having experimented with ChatDev prior to autogen?

Posit:
The notebook format as a means of bulk documentation/demonstration makes for actually very poor documentation. Mundane variances between notebooks (llm_config=gtp4_config, ...) make it easy to derive highly inaccurate theory-of-work about the calls being made; the large quantity of setup code shown in the notebooks creates a cognitive burden greater than the total of actual autogen code in any 2-3 notebooks combined just for setup, in particular a large delta between any two notebooks?

I'm also finding the specificity of some of the examples very significant. The text in many of the prompts feels like it is actually thought by the human authors to be "telling" the model what to do. The purpose of prompts is to seed the token stream. If the LM was trained on documents in which the ~pattern "You are a helpful AI. The AI provides helpful answers. The AI thinks outside the box" occurred, then you are going to strongly over-emphasize a set of - presumably - speculative discussions about how an AI might behave, and if the training data came from reddit or stackoverflow that's probably going to weight-up patterns that were discussing The 3 Laws or morality or such which will produce banter (lots of civility, thank you, I'm pleased to help, etc).

The upshot of this is that it's really super hard to get some of the notebooks to do something very different than the specific scenario in the example - the arxiv web analysis example works less efficiently for other sites or other forms of processing of similar types of document, and begins to rapidly deteriorate in success rates as you move to more unrelated topics. Not unexpectedly, obviously, but it can drastically complicated learning the toolset. C.f ask it to search a short (2-4) list of rss or article feeds on news about ssds, gpus, or 3d printers in the last year to identify significant developments that have come to market to select factors/features worth looking for in choosing devices released in the last 6 months.

I would expect to see it struggle, perhaps, with extracting the data, or identifying "developments" - and depending on the LM used this does begin to surface, but mostly what happens - even with GPT 4 16k - is that it just starts to go a bit loopy or start just doing other things. A too-short prompt today regarding new TPUs resulted it in wanting to provide me with some surely deep insights into airline tickets.

If you eliminate the web search component, then results are obviously a factor of the model/settings/quality vs the alignment of key phrase or word patterns in the prompts with attention. I don't think most users realize that the difference between various prompt settings is sometimes just down to word positioning. "Do not create stub functions. Do not create empty methods." may result in attention on "do not create stub functions" and "create empty methods" whereas "Do not please create stub functions. Chicken* Do not create empty methods" may lead it to correctly attend to "[Do] not create empty methods" ('<*>' being two arbitrary tokens) But because users instead make a change to, say, "Do not create stub functions and please do not create empty methods" it reinforces their perception that you are explaining or instructing the model rather than attention/pattern curating.

This is an incredibly easy mental trap to fall into while also trying to understand the effect of tweaking code.

Suggestion:
Standardize the setup portions of .py files in the notebooks directory so that each notebook can be more focused.

0 replies

rickyloynd-microsoft · 2023-12-06T00:16:31Z

rickyloynd-microsoft
Dec 6, 2023
Collaborator

#852 has been opened to address this.

0 replies

rickyloynd-microsoft · 2023-12-06T00:16:47Z

rickyloynd-microsoft
Dec 6, 2023
Collaborator

I'll resolve this for now. Please reopen to continue the discussion.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Recommendations researcher #839

Uh oh!

{{title}}

Uh oh!

Replies: 4 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Recommendations researcher #839

Uh oh!

kfsone Dec 1, 2023

Replies: 4 comments

Uh oh!

sonichi Dec 3, 2023

Uh oh!

Uh oh!

kfsone Dec 3, 2023 Author

Uh oh!

rickyloynd-microsoft Dec 6, 2023 Collaborator

Uh oh!

rickyloynd-microsoft Dec 6, 2023 Collaborator

kfsone
Dec 1, 2023

sonichi
Dec 3, 2023

kfsone
Dec 3, 2023
Author

rickyloynd-microsoft
Dec 6, 2023
Collaborator

rickyloynd-microsoft
Dec 6, 2023
Collaborator