Feature request: allow digit separator after 0b or 0x #12680

Closed
jskeet opened this issue Jul 22, 2016 · 8 comments
Labels: Area-Compilers, Area-Language Design, Feature Request, Resolution-External (the behavior lies outside the functionality covered by this repository)

jskeet commented Jul 22, 2016

This was previously requested in a comment on #216 (and I independently viewed that thread precisely to see if it was already valid).

With VS15 Preview 3, we have:

Valid:

var x = 0b1010_0000;
var y = 0x1234_abcd;

Not valid:

var x = 0b_1010_0000;
var y = 0x_1234_abcd;

I find the latter more readable than the former. While I can see why a digit separator in front of bare digits isn't valid (e.g. _1, which is a valid identifier), the leading 0x or 0b already prevents the token from being an identifier.
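A minimal sketch of that point, written in Python rather than taken from the compiler's lexer. The identifier pattern is a simplified, ASCII-only assumption (real C# identifiers also allow Unicode letters and an @ prefix), and `could_be_identifier` is a hypothetical helper name:

```python
import re

# Simplified ASCII-only identifier pattern (an assumption for illustration;
# the real C# rules also admit Unicode letters and @-prefixed identifiers).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def could_be_identifier(token: str) -> bool:
    """Return True if the whole token matches the simplified identifier pattern."""
    return IDENTIFIER.fullmatch(token) is not None


# "_1" is ambiguous with an identifier; the prefixed forms are not,
# because an identifier cannot start with the digit 0.
for token in ["_1", "0x_1234_abcd", "0b_1010_0000"]:
    print(token, could_be_identifier(token))
```

Only the unprefixed `_1` can collide with an identifier here, which is the asymmetry the request relies on.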

// cc @zippec

Member

gafter commented Aug 1, 2016

This is a language request, not a compiler request. The compiler is behaving per its specification.

gafter added this to the 2.1 milestone Aug 1, 2016
gafter self-assigned this Aug 1, 2016
Member

gafter commented Aug 1, 2016

/cc @khyperia FYI

Contributor

khyperia commented Aug 3, 2016

I implemented this a while back, so I figured I'd chime in with what I know. While I'm not sure where the exact spec is hiding (it likely is equivalent to the compiler), this is what the compiler does:

Any "string of digits" (e.g. 0-9 for decimals, plus fullwidth for VB) in any literal (decimal, hex, binary, float, double) can contain any number of underscores at any place between the first and last digit (i.e. it can neither start nor end with an underscore).
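As a rough illustration of that rule (a Python sketch, not the compiler's code; `is_valid_digit_run` and the digit sets are hypothetical names, and VB's fullwidth digits are left out):

```python
DECIMAL_DIGITS = set("0123456789")
HEX_DIGITS = set("0123456789abcdefABCDEF")
BINARY_DIGITS = set("01")


def is_valid_digit_run(run: str, digits: set) -> bool:
    """Check one string of digits, with prefix, '.', exponent marker and suffix already stripped."""
    if not run:
        return False
    # The run must start and end with a digit, never with an underscore.
    if run[0] not in digits or run[-1] not in digits:
        return False
    # Between the first and last digit, any mix of digits and underscores is fine.
    return all(ch in digits or ch == "_" for ch in run)


print(is_valid_digit_run("1010_0000", BINARY_DIGITS))   # run of 0b1010_0000
print(is_valid_digit_run("_1234_abcd", HEX_DIGITS))     # run of 0x_1234_abcd
```

Under this reading, the run after 0x or 0b is checked on its own, which is why a separator directly after the prefix is rejected today.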

There are additional cases that might be interesting to consider when deciding whether 0x_2 should be allowed, mostly relating to floats ("reasonable" means "easy to design without breaking changes"). I've also listed cases that cannot be parsed, or are difficult to parse, without a breaking change. (All of these are impossible with today's rules.)

  • 0x_2 -- the original proposal
  • 0b_10 -- same
  • 0x2_ -- reasonable
  • _1.2e3 -- might be technically possible, but involves lookahead to see a digit or the e (and breaks in unintuitive ways)
  • 1_.2e3 -- reasonable
  • 1._2e3 -- same as earlier, but even more unintuitive (e.g. 1._2 is impossible to be resolved in the parser, it needs the exponent syntax to be possible)
  • 1.2_e3 -- reasonable
  • 1.2e_3 -- reasonable (this is an odd one - prefixing an underscore to the digit sequence isn't simple to do in the other two cases)
  • 1.2e3_ -- reasonable

Additionally, 0_x2 might be considered, but I don't see how that makes sense at all.

Note that if any rule is changed, we would also want to update VB, as well as possibly F# - I helped out on a PR implementing digit separators in F#, and they ended up following the same rules.


Edit from half a year later (2017-02-11): Don't know what I was thinking with _1.2e3 or 1._2e3 being technically possible to be resolved in the parser, they're definitely not. My personal opinion is that 0x_2 is the only truly useful change, but I figured I'd correct the above for potential future discussion.

@orthoxerox
Contributor

1._2e3 -- same as earlier, but even more unintuitive (e.g. 1._2 is impossible to be resolved in the parser, it needs the exponent syntax to be possible)

So the compiler will have to wait until it knows whether _2e3 is a valid extension method on int or not to choose between 1 and 1.2e3? I don't know if that's worth it.

@AdamSpeight2008
Contributor

I think the proposed grammar was
Literal ::= Prefix ( Sep? Digit )* Digit

Member

gafter commented Feb 11, 2017

I'm closing this and letting the LDM decide before doing any work. See dotnet/csharplang#65

gafter closed this as completed Feb 11, 2017
gafter added the Resolution-External label Feb 11, 2017
@CyrusNajmabadi
Member

Note:

I'd be very wary about allowing a prefix of _ as _1 is already a legal identifier.

@AdamSpeight2008
Contributor

@CyrusNajmabadi The prefixes meant here are those defined by the language specification.

Prefix ::= HexPrefix | BinPrefix | ...
HexPrefix ::= "0x" | "0X"
BinPrefix ::= "0b" | "0B"

If the prefix is missing, then it is possible for the character sequence to match an identifier (possibly a legal one) when the underscore separator comes first. This is unlikely to matter, though, since the prefix is required wherever a leading digit separator would be allowed.
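One way to read the request, sketched as Python regular expressions using the 0x and 0b prefixes from the issue title. This is an assumed reading, not a grammar taken from any specification: separators are allowed directly after the prefix and between digits, but not at the end. `is_proposed_literal` is a hypothetical helper name.

```python
import re

# Assumed shape: prefix, optional separators, then digit groups joined by separators.
HEX_LITERAL = re.compile(r"0[xX]_*[0-9a-fA-F]+(?:_+[0-9a-fA-F]+)*")
BINARY_LITERAL = re.compile(r"0[bB]_*[01]+(?:_+[01]+)*")


def is_proposed_literal(text: str) -> bool:
    """Return True if text is a hex or binary literal under the assumed rule."""
    return bool(HEX_LITERAL.fullmatch(text) or BINARY_LITERAL.fullmatch(text))


for text in ["0x_1234_abcd", "0b_1010_0000", "0x1234_abcd", "0x1234_", "0_x2", "_1"]:
    print(text, is_proposed_literal(text))
```

Because the prefix is mandatory in both patterns, a bare `_1` never matches, so it stays available as an identifier.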
