Local LLM Message Cleanup : Code Execution in code_utils.py #399
Conversation
I've noticed that there have been reports of local LLM models not properly executing generated code despite it appearing correct. It turns out that some models output "\r\n" instead of just "\n". Some people use \r\n instead of just \n because they may be working in a context where CRLF line endings are expected or required, so it makes sense that some models end up trained the same way. This is a simple fix that cleans up the message text to remove the \r so the autogen detector/compiler can detect and run the received code.
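A minimal sketch of the failure and the fix; the fence-matching pattern below is illustrative, not necessarily the exact one autogen uses:

```python
import re

# Illustrative fence pattern; extractors like this anchor on "\n" around the fences.
FENCE_PATTERN = r"```(\w*)\n(.*?)\n```"

crlf_message = '\r\n```python\r\nprint("Hello, World!")\r\n```'

# The stray "\r" before each "\n" keeps the pattern from matching at all.
print(re.findall(FENCE_PATTERN, crlf_message, flags=re.DOTALL))  # []

# The one-line cleanup proposed in this PR normalizes the endings first.
cleaned = re.sub(r"\r\n", "\n", crlf_message)
print(re.findall(FENCE_PATTERN, cleaned, flags=re.DOTALL))  # [('python', 'print("Hello, World!")')]
```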
victordibia left a comment
Thanks for this. Makes sense.
Given that the \r\n itself does not modify the behavior of the code (at least I don't see immediate examples of this, apart from perhaps string-matching logic on \r\n, which should be rare), I can see how this improves reliability for some local models.
Minor comment
Can you share some pointers to any models that exhibit the \r\n behavior, to enable reproducibility?
Codecov Report
@@ Coverage Diff @@
## main #399 +/- ##
===========================================
- Coverage 41.29% 15.18% -26.12%
===========================================
Files 20 19 -1
Lines 2448 2450 +2
Branches 548 552 +4
===========================================
- Hits 1011 372 -639
- Misses 1359 2077 +718
+ Partials 78 1 -77
Flags with carried forward coverage won't be shown.
Thanks Victor, one of the models I tested with was this one. The code would output perfectly in the console but wouldn't compile and came back as "unknown". I figured it's a common enough possibility that it may encompass other rogue local LLMs out there.
No major change here. Only formatting updates based on the autogen pre-commit hook: `pre-commit run --all-files`.
@victordibia do you think we should be modifying the core logic here instead? LMK if I misunderstood the issue.
@gagb You raise a good point here. Current PR/scenario:
Can you provide a similar walkthrough of how your scenario plays out? Overall, we might need some benchmark that indicates how often the \r\n behavior actually shows up.
You know @victordibia and @gagb, I may have stumbled across a solution by accident while exploring regex options. Originally I thought the main problem was how the pattern selection detected the beginning of the code, where it decided which compiler to use. This ties into another similar problem I've been facing with local LLMs, which I'll get to in a second.

Related to this PR directly and the above code snippet: at the time I was thinking of a full sterilization for consistency, but now that I think about it, the real problem may actually be the section of regex that detects which compiler to use. We could just sterilize everything outside of the code blocks and leave the rest untouched. Perhaps we scan and find where the compiler is mentioned as the key for the start of the code block? Example: look for the fence with the language tag instead of just the bare code block fence?

Sort of related to this PR, and it ties into my proposed solution: another problem I've been running into, which in my opinion is in the same vein, is when a local model fails a snippet of code and rewrites it (or rewrites really anything). It tends to output the start of a code block with nothing in it, throwing off the fixed code block. I'm thinking we can use the same solution/detection for sanitizing outside the code blocks to detect and sanitize this problem too. Example: as you can imagine, what happens is the empty block is detected and fails to execute, while the actual fixed block is never referenced because it comes back as unknown. This has happened with a few models: the one I mentioned earlier, Mistral's Dolphin 2.1 quantized by TheBloke (the latest one I tried), and Mistral Instruct did it too. To reproduce, let a model run for a while and you'll quickly see this happen when some code didn't generate correctly.

tl;dr: I think we can sanitize everything outside of the code blocks and things will still run. We should look at scanning and using the compiler detection logic to detect chunks of code blocks and sterilize everything we need to sterilize outside of them, including rogue fences.
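A rough sketch of the "sanitize only outside the fences" idea; the helper and the CR-tolerant pattern below are hypothetical illustrations, not code from this PR, and this does not yet handle a dangling open fence with no close:

```python
import re

# CR-tolerant fence pattern so blocks are found even in CRLF output (illustrative only).
FENCE = re.compile(r"```(\w*)[ \t]*\r?\n(.*?)\r?\n[ \t]*```", re.DOTALL)

def sanitize_outside_code_blocks(text: str) -> str:
    """Hypothetical helper: normalize line endings in the prose between fenced
    blocks, keep non-empty blocks untouched, and drop fences with no body."""
    pieces, last = [], 0
    for match in FENCE.finditer(text):
        pieces.append(text[last:match.start()].replace("\r\n", "\n"))  # clean prose only
        if match.group(2).strip():  # keep real code blocks exactly as generated
            pieces.append(match.group(0))
        last = match.end()
    pieces.append(text[last:].replace("\r\n", "\n"))
    return "".join(pieces)
```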
autogen/code_utils.py (Outdated)
# Some Local LLM models/servers output \r\n instead of just \n. Let's clean it up before continuing
text = re.sub(r"\r\n", "\n", text)
Could we add a parameter or environment variable to enable or disable this replacement?
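A minimal sketch of what such a switch could look like; the function name, parameter, and `AUTOGEN_NORMALIZE_CRLF` environment variable are hypothetical, purely for illustration:

```python
import os
import re
from typing import Optional

def normalize_line_endings(text: str, enabled: Optional[bool] = None) -> str:
    """Hypothetical wrapper: only normalize CRLF line endings when explicitly
    enabled via the argument or an (invented) AUTOGEN_NORMALIZE_CRLF env var."""
    if enabled is None:
        enabled = os.environ.get("AUTOGEN_NORMALIZE_CRLF", "1") == "1"
    return re.sub(r"\r\n", "\n", text) if enabled else text
```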
Was testing this morning using `detect_single_line_code=True` and it worked fine with and without \r\n, since it's not looking for \n's.
Perhaps we should make it the default since it seemingly works much better overall.
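For reproducibility, roughly how this can be exercised, assuming `extract_code` accepted the `detect_single_line_code` flag referenced above:

```python
from autogen.code_utils import extract_code

crlf_reply = '```python\r\nprint("hi")\r\n```'

# Compare the two modes on a CRLF-formatted reply; each call returns a list of
# (language, code) pairs.
print(extract_code(crlf_reply, detect_single_line_code=True))
print(extract_code(crlf_reply, detect_single_line_code=False))
```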
Thanks @robzsaunders. My concern is that if in some case there is a line of code like `print("something\r\n")`, the replacement will modify the code itself, while maybe the user wants to keep it.
Here is another similar situation where code extraction fails:
@victordibia, yes, that `custom_reply` will look similar to the preconfigured version but modified with whatever new extraction/code detection logic we need. See https://github.com/microsoft/autogen/blob/main/autogen/agentchat/conversable_agent.py#L127. @afourney's test suite will come in handy for the testing!
Question: I just tried Mistral and Dolphin 2.1 from TheBloke, added `text = re.sub(r"\r\n", "\n", text)` to code_utils.py, and set `detect_single_line_code = True`. I AM using LMStudio to host the model instead of using the Transformers library, but I believe that only affects the caching. Could this be the reason for my error?
I think your issue is not related to this. Notice how your Python script doesn't close with ```. You should check your max tokens to ensure that you're not being capped, causing premature responses that aren't complete. You have incomplete code there, and without a closing code block fence, the code extractor doesn't match.
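To illustrate why the truncated reply falls through; the pattern here is illustrative, not necessarily the exact one in code_utils.py:

```python
import re

# Illustrative fence pattern that requires a closing ``` fence.
FENCE_PATTERN = r"```(\w*)\n(.*?)\n```"

truncated_reply = 'Here is the fix:\n```python\nprint("step 1")\nprint("step 2")'

# No closing fence, so nothing matches and the reply falls through as plain text.
print(re.findall(FENCE_PATTERN, truncated_reply, flags=re.DOTALL))  # []
```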
@robzsaunders you are correct, I see that now. Unfortunately, I confirmed my token count settings and that didn't resolve it. I'll have to keep digging.
Related to my comment here: microsoft#399 (comment). This is further sterilization of the incoming code blocks, filtering out bad code block fences and possible bad code block language flags. I also flipped `detect_single_line_code` to True since it just works better, and moved the \r\n sterilization inside the `detect_single_line_code=False` mode since that's the only place it matters.
Small addition to support new sterilization in code_utils.py
Small update to support changes made in code_utils.py
@microsoft-github-policy-service agree
Fixed a small overlooked variable bug
@robzsaunders thanks for the PR, please take a look at the current conflict with the main branch.
Why are these changes needed?
I've noticed that there have been reports of local LLM models not properly executing generated code despite it appearing correct. It turns out that some models output "\r\n" instead of just "\n".
Some people use \r\n instead of just \n because they may be working in a context where CRLF line endings are expected or required, so it makes sense that some models end up trained the same way.
This is a simple fix that cleans up the message text to remove the \r so the autogen detector/compiler can detect and run the received code.
Example "CRLF" formatted output:
\r\n```python\r\nprint(\"Hello, World!\")\r\n```Example cleaned output:
\n```python\nprint(\"Hello, World!\")\n```Related issue number
I think this is related to #279, where no code would execute and subsequently no response would be fed back into the local LLM, causing an empty string response. Just part of the picture.
Checks