-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor parse_script function for readability and maintainability #90
base: main
Are you sure you want to change the base?
Conversation
This commit refactors the parse_script function to enhance readability and maintainability. Changes include: - Improved documentation with detailed explanations of the function's behavior, input, and output. - Renamed variable 'n_delimiters' to 'delimiter_count' for clarity. - Used specific exception handling instead of catching generic 'Exception'. - Combined two try-except blocks into one for better code organization. - Added comments to clarify complex logic and edge cases. - Enhanced function signature by specifying types of exceptions raised. These changes aim to make the codebase more understandable and easier to maintain, without altering its functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactoring effort has improved the structure and readability of the code. However, there are specific areas like error handling after JSON decoding and message construction logic, where the functionality might be prone to errors given certain inputs. It would be beneficial to add more diverse test cases to cover such scenarios comprehensively.
Butler is in closed beta. Reply with feedback or to ask Butler to review other parts of the PR. Please give feedback with emoji reacts.
try: # Make sure it's valid python | ||
|
||
# Validate the script as valid Python code | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the script
variable may have been transformed into a JSON object at line 33, this call to ast.parse(script)
might raise an error since ast.parse
expects a string as an input, not a Python object like a dictionary or list. We should ensure that ast.parse
is only called if script
is a string. Consider adding a check to verify the type of script
before parsing it as Python code, and adjust the error handling accordingly.
src/rawdog/parsing.py
Outdated
return response, "" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line attempts to concatenate a message with the last segment after splitting, which might not necessarily be always the part of the original message, especially if there are an uneven number of ''. Ensure that relevant test cases are checking for varying counts of '
' to avoid potential errors in message reconstruction.
src/rawdog/parsing.py
Outdated
if script.startswith("python"): | ||
script = script[len("python"):] # Remove 'python' prefix | ||
|
||
# Attempt to parse script as JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transformation from the prefix 'python' might not handle additional whitespace or characters that could follow the prefix, such as 'python3'. Consider using a more robust method to strip out not just 'python' but also variations to cater for possible user inputs. We could use regex for this purpose to ensure a standard and safe removal of the prefix.
Refined the `parse_script` function by streamlining parsing and validation. Changes include: 1. Consolidated error handling in a single try-except block to improve error management. 2. Enhanced prefix removal using regex for better handling of variations and whitespace. 3. Optimized script handling by decoding and re-encoding JSON, ensuring input to syntax validation is consistently a string. 4. Simplified message and script extraction by improving segment splitting and concatenation. These updates improve the function's adaptability and reliability across different input formats.
In the latest commit, I've tried to implement all the suggested improvements. |
# Attempt to interpret script as JSON, revert if it fails | ||
try: | ||
parsed_script = json.loads(script) | ||
script = json.dumps(parsed_script) # Convert back to string to validate as Python code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the original code runs json.loads(script)
which should fail because script should be python. It doesn't even propagate the error if it occurs. Do you know @granawkins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of json.loads(script)
in the script parsing logic is indeed a bit confusing initially, but it serves a specific purpose. The idea is to handle cases where the script portion might contain data in JSON format instead of Python code. This can happen if the user intends to send JSON data as part of the response for some reason. By attempting to parse it as JSON, the function can handle more types of input flexibly. If the parsing fails, it indicates the script is actually Python code or some other format that isn't JSON, and the code simply continues to process it as such. It's a case of trying to gracefully handle various input formats that might be encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at one point I was getting errors with the script and suspected it had returned json for one reason or another, so I added this just in case. If it raised an exception that just meant it wasn't json after all, and we'd be fine.
I think we ought to take this out - I expect I was mistake about it being a json error. Anyway adding json.dumps
here is a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, Gran. Since you suggest that the json.loads
and the subsequent json.dumps
might be superfluous and potentially error-prone for our typical use cases, I'll proceed with removing these parts. This simplification should avoid unnecessary confusion and make the function's intent clearer, focusing only on handling python code.
script = json.dumps(parsed_script) # Convert back to string to validate as Python code | |
# Validate as Python code | |
ast.parse(script) | |
return message, script |
By removing the JSON parsing, the function aligns more directly with its primary purpose of handling Python code. Thanks for your insights!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in reviewing. I'm not sure I agree it's an improvement. For instance the doc string is expanded but does it really make the purpose of the function any clearer than "Split the response into a message and a script."?
script = '```'.join(parts[1:-1]).strip() # Join any inner parts as the script | ||
|
||
# Remove 'python' or similar prefixes from the script | ||
script = re.sub(r"^\s*python[0-9]*\s*", "", script, flags=re.IGNORECASE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I should have said something sooner about how it's fine to ignore the bot. In my experience markdown is always prepended with just python
by the llm. I think the extra complexity to handle python3 is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate your insights, Jake. Based on your experience that markdown is typically accompanied with just the 'python' prefix by the LLM, I agree that it simplifies the implementation. Nevertheless, given that the current implementation now already includes regex to handle potential variations like 'python3', would you prefer to keep this aspect for potential extensibility and robustness? This can ideally prepare the function to handle less typical cases without immediate rework in the future. Please let me know your preference.
return message, script | ||
return message, script | ||
except SyntaxError as e: | ||
return f"Error in Python syntax: {e}", "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather return the response than e. e will just be another variation of "synax error" but the prefix already says that and the user would presumably rather see the errored code to find the syntax error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid point concerning returning the original response with the syntax error message to assist users in locating and correcting the error directly in their code. Implementing this could enhance the error handling experience by providing more context about the issue. Would integrating a portion of the problematic script into the error message be an acceptable solution? For instance, including the first few lines of the syntax-error-containing script might provide immediate clarity without overwhelming the user with the entire script. Here's an example suggestion:
return f"Error in Python syntax: {e}", "" | |
return f"Error in Python syntax: {e}. Problematic part: {script[:50]}...", "" |
This modification would give users a quick look at where the error might be without needing to recheck their whole script, assuming it's not shown elsewhere in the interface.
except SyntaxError as e: | ||
return f"Error in Python syntax: {e}", "" | ||
except Exception as e: | ||
return f"Unhandled error: {e}", "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other exceptions could we encounter? I can't think of any and if we do I'd rather propagate them and fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid point, Jake. Propagating other exceptions could indeed be beneficial for debugging purposes, especially in a production environment where error tracking and handling specifics help in quickly identifying issues. The current catch-all exception handling might obscure unexpected scenarios. Would it be better to specify the types of exceptions we expect or remove the generic exception block entirely to allow propagation?
return response, "" # Not enough parts, return original message and empty script | ||
|
||
# Clean and identify parts | ||
message = parts[0] + parts[-1] # Consider the first and last parts as the message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I preferred joining with a newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, Jake. Given that the primary way markdown is added by the LLM is with just python
, the additional complexity to handle variations may indeed be unnecessary for most scenarios. I'll make the adjustment to accommodate the more common case and simplify the code as suggested.
|
||
|
||
def parse_script(response: str) -> tuple[str, str]: | ||
"""Split the response into a message and a script. | ||
"""Split the response into a message and a script, handling variations of 'python' prefix and JSON content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the added clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood regarding simplifying the handling of prefix variations. I've noted your experience in this context, which points to a python
only prefix by the LLM typically. Yet, considering the possibility of accommodating different user inputs, this refactor aims to provide broader support and resilience against various user input formats. Balanced against the potential overhead, would you agree that maintaining the enhanced input handling via regex provides a more safeguarded approach, or would you still prefer to remove that complexity? Your insights will aid in resolving this conclusively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly I don't think any of this is an improvement, except for maybe reformatting the docstring.
# Attempt to interpret script as JSON, revert if it fails | ||
try: | ||
parsed_script = json.loads(script) | ||
script = json.dumps(parsed_script) # Convert back to string to validate as Python code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at one point I was getting errors with the script and suspected it had returned json for one reason or another, so I added this just in case. If it raised an exception that just meant it wasn't json after all, and we'd be fine.
I think we ought to take this out - I expect I was mistake about it being a json error. Anyway adding json.dumps
here is a mistake.
This commit refactors the parse_script function to enhance readability and maintainability. Changes include:
These changes aim to make the codebase more understandable and easier to maintain, without altering its functionality.