- 
                Notifications
    You must be signed in to change notification settings 
- Fork 27
Fix parsing of integer literals with base prefix #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parsing of integer literals with base prefix #106
Conversation
9452423    to
    23f8ab4      
    Compare
  
    | After merging #107 the tests now pass. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.
I guess the simplest fix here would be to just replace int(x) with int(x, 0).  That should restore the existing behaviour.  But it looks like you want to improve things further, which is great!
| Hey @wnienhaus , tested with the ulp programs on my s2, and had no issues 👍 | 
| 
 Yes,  That said, since  def parse_int(literal):
    if len(literal) > 2:
        prefix_start = 1 if literal[0] == '-' else 0  # skip over negative sign if present
        if literal[prefix_start] == '0' and literal[prefix_start+1] in '123456789':
            return int(literal, 8)
    return int(literal, 0)and all tests still pass. So it's really just the octal case that's different (and theoretically we should disallow python style octal ( Now I am starting the overthink this: 
 I think I'll keep the current approach, as it's very explicit about what we support (including explicitly supporting python style octal). I'll just remove the comment about legacy octal format, because from the GNU as perspective, it's the currently valid and only possible octal format. (Happy to get feedback on my chosen approach) | 
| Ok. Fixes pushed. Will squash-merge this once approved. | 
f7dfddc    to
    5c84d08      
    Compare
  
    | 
 IMO that's quite a bit simpler and easier to read/understand. I would vote for this approach. Or a little simpler still: def parse_int(literal):
    if len(literal) >= 2 and literal.startswith(("0", "-0")) and literal.lstrip("-0").isdigt():
        return int(literal, 8)
    return int(literal, 0)Note: I think you need the  | 
| Thanks for that feedback. Shorter is nice, so let's go with that. MicroPython's  The  I added some extra tests to show more cases that (should) work as expected, e.g.  Anyway, your proposed code handles all cases correctly, so I used it now. And now there are tests to verify they all work as intended. Thanks. | 
| 
 Ah, it will in the next release! But in the interest of backwards compatibility, best not to rely on that yet. 
 Very good! | 
MicroPython 1.25.0 introduced a breaking change, aligning the behaviour of the int() function with the behaviour of CPython (assume a decimal number, unless a base is specified. Only if a base of 0 is specified will the base be inferred from the string). This commit implements a new custom parsing function `parse_int`. It can correctly parse the following string literals: * 0x[0-9]+ -> treated as hex * 0b[0-9]+ -> treated as binary * 0o[0-9]+ -> treated as octal (Python style) * 0[0-9]+ -> treated as octal (GNU as style) * anything else parsed as decimal It only handles the GNU as style octal case directly, letting the original `int()` function handle the other cases (using base 0). In fact, the GNU as octal case was not handled correctly previously, and this commit fixes that. Some new tests for previous functionality were added to show that both new and previous cases are being handled correctly. Note: GNU as does not actually accept the octal prefix 0o..., but we accept it as a convenience, as this is accepted in Python code. This means however, that our assembler accepts code which GNU as does not accept. But the other way around, we still accept all code that GNU as accepts, which was one of our goals.
57d06d9    to
    da5d928      
    Compare
  
    | Force-pushed after squashing the commits. Will merge next. | 
MicroPython 1.25.0 introduced a breaking change, aligning the behaviour of the
int()function closer to the behaviour of CPython (something along the lines of: strings are assumed to represent a decimal number, unless a base is specified. if a base of 0 is specified, is the base is inferred from the string)This broke our parsing logic, which relied on the previous behaviour of the
int()function to automatically determine the base of the string literal, based on a base prefix present in the string. Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.Additionally, we never actually parsed octal in the format
0100correctly - even before this PR; that number would have been interpreted as 100 rather than 64.So, to fix this, and to ensure our parsing matches the GNU assembler, this PR implements a custom
parse_int()function, using the base prefix in a string to determine the correct base to pass toint(). The following are supported:The
parse_intmethod also supports the negative prefix operator for all of the above cases.This change also ensures
.int,.long,.worddirectives correctly handle the above mentioned formats. This fixes the issue described in #104.Note: GNU as does not actually accept the octal prefix
0o..., but we accept it as a convenience, as this is accepted in Python code. This means however, that our assembler accepts code which GNU as does not accept. But the other way around, we still accept all code that GNU as accepts, which was one of our goals.