Skip to content

Conversation

@robzsaunders
Copy link

Why are these changes needed?

I've noticed that there have been reports of Local LLM's models not properly executing generated code despite appearing correct. It turns out that some models are outputting "\r\n" instead of just "\n"

Some people use \r\n instead of just \n because they may be working in a context where CRLF line endings are expected or required, so it makes sense that subsequent models are trained the same way.

This is a simple fix that cleans up message text to remove /r so the autogen detector/compiler can detect and run the received code.

Example "CRLF" formatted output:
\r\n```python\r\nprint(\"Hello, World!\")\r\n```

Example cleaned output:
\n```python\nprint(\"Hello, World!\")\n```

Related issue number

I think this is related to #279 where no code would execute and subsequently no response would be fed back into the Local LLM causing a empty string response. Just part of the picture.

Checks

I've noticed that there have been reports of Local LLM's models not properly executing generated code despite appearing correct. It turns out that some models are outputting "\r\n" instead of just "\n"

Some people use \r\n instead of just \n because they may be working in a context where CRLF line endings are expected or required, so it makes sense that subsequent models are trained the same way.

This is a simple fix that cleans up message text to remove /r so the autogen detector/compiler can detect and run the received code.
Copy link
Collaborator

@victordibia victordibia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Makes sense.
Given that the \r\n itself does not modify the behavior of the code (atleast I dont see immediate examples of this apart from perhaps string matching logic of \r\n which should be rare...), I can see how this improves reliability for some local models.

Minor comment
Can you share some pointers to any models that exhitit the \r\n behavior to enable reproducibility?

@codecov-commenter
Copy link

codecov-commenter commented Oct 24, 2023

Codecov Report

Merging #399 (5a634b0) into main (50ac547) will decrease coverage by 26.12%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##             main     #399       +/-   ##
===========================================
- Coverage   41.29%   15.18%   -26.12%     
===========================================
  Files          20       19        -1     
  Lines        2448     2450        +2     
  Branches      548      552        +4     
===========================================
- Hits         1011      372      -639     
- Misses       1359     2077      +718     
+ Partials       78        1       -77     
Flag Coverage Δ
unittests 15.18% <0.00%> (-26.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
autogen/agentchat/contrib/math_user_proxy_agent.py 15.87% <0.00%> (-28.58%) ⬇️
autogen/agentchat/conversable_agent.py 16.48% <0.00%> (-57.03%) ⬇️
autogen/code_utils.py 19.16% <0.00%> (-29.32%) ⬇️

... and 8 files with indirect coverage changes

@robzsaunders
Copy link
Author

Thanks for this. Makes sense. Given that the \r\n itself does not modify the behavior of the code (atleast I dont see immediate examples of this apart from perhaps string matching logic of \r\n which should be rare...), I can see how this improves reliability for some local models.

Minor comment Can you share some pointers to any models that exhitit the \r\n behavior to enable reproducibility?

Thanks Victor, one of the models I tested with was This one. The code would output perfectly in the console but wouldn't compile and came back as "unknown".

I figured it's a common enough possibility that it may encompass other rogue local LLM's out there.

No major change here. Only formatting updates based on the autogen precommit hook.

pre-commit run --all-files
@gagb
Copy link
Collaborator

gagb commented Oct 24, 2023

Thanks for this. Makes sense. Given that the \r\n itself does not modify the behavior of the code (atleast I dont see immediate examples of this apart from perhaps string matching logic of \r\n which should be rare...), I can see how this improves reliability for some local models.

Minor comment Can you share some pointers to any models that exhitit the \r\n behavior to enable reproducibility?

@victordibia do you think we should be modifying the core logic of code_utils for this or this should addressed by the end user via a custom reply function? Are there case when the code block suggests \r\n and this suggestion would overwrite them. For example, a degenerate case is I purposefully asked the agents to suggest to write via \r\n.

with open("my_file.txt", "w", newline="\r\n") as f:
    f.write("Line 1\r\n")
    f.write("Line 2\r\n")
    f.write("Line 3\r\n")

LMK, if I misunderstood the issue?

@victordibia
Copy link
Collaborator

@gagb You raise a good point here.
There are tradeoffs to implementing this for all in code_utils.py () or having each user implement their own version.

Current pr/scenario

  • local model is used by agent to generate code
  • code contains \r\n, so extract_code fails to return anything
  • agent response is empty or leads to an error
  • current pr does some sanitization in extract_code where occurrences of \r\n are replaced with \n enabling correct code extraction.

Can you provide a similar walkthrough of how custom_reply would be used to address the situation? Also, what errors will your degenerate use case lead to. Do you mean that in the scenario where the code is meant to write to a file with \r\n formating, this formatting will be changed?

Overall, we might need some benchmark that indicates how often the \r\n issue occurs and any associated side effects.

@robzsaunders
Copy link
Author

You know @victordibia and @gagb I may have stumbled across a solution by accident when I was exploring regex options.

Originally I thought that the main problem was how the pattern selection detected the beginning of the code, where it decided what compiler to use. This ties into another similar problem I've been facing with Local LLM's which I'll get to in a second.

Line 21: CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```" (pattern is set to this later)
Line 68:    if not detect_single_line_code:
Line 69:      match = re.findall(pattern, text, flags=re.DOTALL)
Line 70:      return match if match else [(UNKNOWN, text)]

Related to this PR directly and the above code snippet:
I recall coming across a scenario where only the first part of the code snippet was sterilized. For example, generated code that was say 50 lines, but because I was using /b/r/n/b for my regex, a segment of code that was /r/n/r/n/r/n wouldn't sterilize and then everything beyond it stayed un-sterilized. However, despite this, the code still executed.

At the time I was thinking of a full sterilization for consistency but now that I think about it, the real problem may actually be this section of regex that detects what compiler to use. We could just sterilize everything outside of the code blocks and leave the rest untouched?

Perhaps we scan and find where the compiler is mentioned as the key for the start of the code block? Example: Look for

```python

instead of just the code block fence?

Sort of related to this PR and ties into my proposed solution

Another problem I've been running into which in my opinion is in the same vein here is when a local model fails a snippet of code and rewrites it, or rewrites really anything. It tends to output the start of a code block with nothing throwing off the fixed code block.

I'm thinking we can use the same solution/detection for sanitizing outside the code blocks to detect and sanitize this problem too.

Example:

```\n\n```python\n (Bunch of python code) \n```

As you can imagine, what happens is the block :

```\n\n```

is detected and fails to execute when the

```python\n (Bunch of python code) \n``` 

is never referenced because it's actually detected as

python\n (Bunch of python code) \n```

This has happened with a few models, the one I mentioned earlier along with mistral's dolphin 2.1 quantified by bloke is the latest one I tried, and minstral instruct did it too. To reproduce, let a model run for awhile and you'll quickly see this happen when some code didn't generate correctly

tl;dr:

I think we can sanitize everything outside of the coding blocks and things will still run. We should look at scanning and using the compiler detection logic

eg; ```python

to detect chunks of code blocks and sterilize everything we need to sterilize outside of them, including rogue fences.

"""

# Some Local LLM models/servers output \r\n insteaf or just \n. Let's clean it up before continuing
text = re.sub(r"\r\n", "\n", text)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a parameter or environment variable to control enable or disable this replacement?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was testing this morning using detect_single_line_code=True and it worked fine with and without \r\n since it's not looking for \n's.

Perhaps we should have it set to default since it seemingly works much better overall.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @robzsaunders . My concern is that if in some case there is a line of code like print("something\r\n"), the replacement will modify the code itself while maybe the user wants to keep it.

@afourney
Copy link
Member

Here is another similar situation where code extraction fails:

#414 (comment)

@gagb
Copy link
Collaborator

gagb commented Oct 25, 2023

@gagb You raise a good point here. There are tradeoffs to implementing this for all in code_utils.py () or having each user implement their own version.

Current pr/scenario

  • local model is used by agent to generate code
  • code contains \r\n, so extract_code fails to return anything
  • agent response is empty or leads to an error
  • current pr does some sanitization in extract_code where occurrences of \r\n are replaced with \n enabling correct code extraction.

Can you provide a similar walkthrough of how custom_reply would be used to address the situation? Also, what errors will your degenerate use case lead to. Do you mean that in the scenario where the code is meant to write to a file with \r\n formating, this formatting will be changed?

Overall, we might need some benchmark that indicates how often the \r\n issue occurs and any associated side effects.

@victordibia, Yes that custom_reply will look similar to the preconfigured version but modified with whatever new extraction/code detection logic we need. See https://github.com/microsoft/autogen/blob/main/autogen/agentchat/conversable_agent.py#L127

@afourney's test suite will come in handy for the testing!

@JHamburg430
Copy link

Question: I just tried Mistral and Dolphin2.1 from TheBloke, added """ text = re.sub(r"\r\n", "\n", text) """ to code_utils.py as well as set detect_single_line_code = True.

I AM using LMStudio to host the model instead of using the Transformers library, but I believe that only affects the caching. Could this be the reason for my error?
Just to note: I've test both of these models with the math examples from the notebook without issue.
.
.
.
"role": "system"
},
{
"content": "What date is today? Compare the year-to-date gain for META and TESLA.",
"role": "user"
},
{
"content": "\n```python\n# filename: today_gain_comparison.py\nimport datetime\nimport requests\nfrom bs4 import BeautifulSoup\n\ntoday = datetime.date.today()\nprint(f"Today's date is {today}")\n\nmeta_url = "https://finance.yahoo.com/quote/FB?p=FB&.tsrc=fin-srch\"\ntesla_url = "https://finance.yahoo.com/quote/TSLA?p=TSLA&.tsrc=fin-srch\"\n\ndef get_year_to_date_gain(ticker):\n response = requests.get(ticker)\n soup = BeautifulSoup(response.content, 'html.parser')\n \n gain_element = soup.find('span', class_='Trsdu(0.3s",
"role": "assistant"
},
{
"content": "",
"role": "user"
}
],
"model": "Mistral",
"temperature": 0,
"detect_single_line_code": true
}
[2023-10-25 18:36:00.629] [ERROR] Error: 'messages' array must only contain objects with a 'content' field that is not empty.

@robzsaunders
Copy link
Author

robzsaunders commented Oct 25, 2023

@jhamburger

I think your issue is not related to this.

Notice how your python script doesn't close with ```. You should check your max tokens to ensure that you're not being capped causing premature responses that aren't complete. You have incomplete code there, and without a closing code block fence, The code extractor doesn't match.

@JHamburg430
Copy link

@robzsaunders you are correct, I see that now. Unfortunately, I confirmed token count settings and didn't resolve. I'll have to keep digging.

Related to my comment here: microsoft#399 (comment)

This is further sterilization of the incoming code blocks, filtering out bad code block fences and possible bad code block language flags.

I also flipped the detect_single_line_code to True since it just works better, and moved the /r/n/ sterilization to inside the detect_single_line_code=False mode since that's the only place it matters.
Small addition to support new sterilization in code_utils.py
Small update to support changes made in code_utils.py
@robzsaunders
Copy link
Author

@robzsaunders please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

Fixed a small overlooked variable bug
@ekzhu
Copy link
Contributor

ekzhu commented Dec 25, 2023

@robzsaunders thanks for the PR, please take a look at the current conflict with the main branch.

@gagb gagb closed this Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants