allow \x00 (null bytes) and empty string in @"" identifier syntax in the language specification #14534

Open
Tracked by #14523 ...
andrewrk opened this issue Feb 3, 2023 · 2 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@andrewrk
Member

andrewrk commented Feb 3, 2023

Extracted from #14523.

This is a proposal for a partial reversal of #8262. The reasoning behind that proposal was:

  • exported symbols with null bytes may not be representable in the target object format (for example, see #12230: Any export called @"" is not exported)
  • avoid bugs with third party tools which store zig identifiers as null-terminated strings
  • allows for a more efficient representation of identifier names as strings within zig compilers

The problem we face today is the use case of serializing a key-value map into an anonymous struct literal. If any map key contains a null byte or has length zero, a syntax error occurs, which makes many kinds of maps unrepresentable. This is especially a problem with the introduction of the ZON format. Even JSON has the ability to represent null bytes in map keys. It would be a crime to introduce a new data exchange format and have it be worse than JSON at representing a mere map data structure.
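To make the JSON comparison concrete, here is a minimal sketch. Python and its standard `json` module are used purely for illustration and are not part of the proposal. It shows that a JSON object key can carry a null byte through the `\u0000` escape, can be empty, and survives a round trip.

```python
import json

# A JSON document whose keys include an escaped null byte and an empty string.
text = '{"A\\u0000Z": true, "": 1}'

parsed = json.loads(text)
keys = list(parsed.keys())

# Re-serialize and parse again to confirm the keys survive a round trip.
round_tripped = json.loads(json.dumps(parsed))
```

A ZON serializer that wants to accept the same maps needs some way to spell both of these keys as field names.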

I will address all three points above now.

> exported symbols with null bytes may not be representable in the target object format

We already need an additional layer of validation for target object formats, as noted in the original issue. This validation should be per-target. For example, if COFF rules are different than Mach-O rules, there should be a different set of compile errors for these formats.

Additionally, there could be a future object format, for a not-yet-existing operating system, that allows null bytes in symbol names and zero-length symbol names. Zig should support such a hypothetical target. Therefore it should not forbid this at the language layer.
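The per-target validation described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the rule table, the format names used as keys, the `check_export_name` helper, and the error wording are hypothetical, and the real constraints of COFF, Mach-O, or any other format would have to be taken from their specifications.

```python
# Hypothetical per-format rules. These values are NOT taken from the issue or
# from any object format specification; they only illustrate the shape of a
# per-target check.
HYPOTHETICAL_RULES = {
    "coff": {"allow_empty": False, "allow_nul": False},
    "macho": {"allow_empty": False, "allow_nul": False},
    "future-format": {"allow_empty": True, "allow_nul": True},
}


def check_export_name(name, object_format):
    """Return the list of errors for exporting `name` (bytes) to `object_format`."""
    rules = HYPOTHETICAL_RULES[object_format]
    errors = []
    if not name and not rules["allow_empty"]:
        errors.append(object_format + ": exported symbol name cannot be empty")
    if b"\x00" in name and not rules["allow_nul"]:
        errors.append(object_format + ": exported symbol name cannot contain null bytes")
    return errors
```

With rules kept per format, a name such as `A\x00Z` is rejected only where the chosen format cannot represent it, and the language layer itself stays permissive.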

> avoid bugs with third party tools which store zig identifiers as null-terminated strings

Unfortunately, while addressing this is desirable, it takes a backseat to other concerns mentioned in this proposal.

> allows for a more efficient representation of identifier names as strings within zig compilers

If the language allows empty identifier names and null bytes in identifier names, but none of the supported target object formats do, then the Zig compiler can emit a suitable compile error while lowering the AST. Just because the language allows it does not mean compilation will succeed on every target.

This way, the Zig compiler can still use null-termination in the later stages of compilation and still report errors correctly, while the language specification does not forbid these names. ZON can then take advantage of them, as can a hypothetical future object format.
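A rough sketch of that arrangement, in Python for illustration only: names are checked once, at the point where they enter a null-terminated string table, so later stages can rely on the terminator. The helper names `intern_null_terminated` and `read_null_terminated` are hypothetical and do not correspond to anything in the Zig compiler.

```python
def intern_null_terminated(table, name):
    """Append `name` (bytes) plus a 0 terminator to `table`; return its offset.

    Names with an embedded null byte are rejected here, because reading them
    back by scanning for the terminator would silently truncate them.
    """
    if b"\x00" in name:
        raise ValueError("identifier cannot contain null bytes on this target")
    offset = len(table)
    table += name + b"\x00"
    return offset


def read_null_terminated(table, offset):
    """Read the name stored at `offset` by scanning to the next 0 byte."""
    end = table.index(0, offset)
    return bytes(table[offset:end])


table = bytearray()
first = intern_null_terminated(table, b"foo")
second = intern_null_terminated(table, b"bar")
```

The check is a property of this storage choice, not of the language: a consumer such as a ZON parser that stores lengths explicitly would have no reason to reject the same name.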

If this is accepted, implementing it will likely require only slight adjustments to the wording of the relevant compile errors.

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Feb 3, 2023
@andrewrk andrewrk added this to the 0.11.0 milestone Feb 3, 2023
@nektro
Contributor

nektro commented Feb 3, 2023

> Even JSON has the ability to represent null bytes in map keys.

Only through "\u0000" or "\0", not an actual nul byte

@andrewrk
Member Author

andrewrk commented Feb 4, 2023

Based on your comment, it sounds like you think I am suggesting to allow null bytes in zig source code, as opposed to what this proposal is actually proposing, which is that the following JSON and ZON would be equivalent:

{ "A\u0000Z": true }
.{ .@"A\x00Z" = true }
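As an illustration of the serialization use case, the following Python sketch turns a map with boolean values into an anonymous struct literal of the shape shown above. It is not the ZON serializer: the helper names are hypothetical, every field name is written in `@"..."` form for simplicity, and the escaping (backslash, double quote, and `\xNN` for any byte outside printable ASCII) is an assumption about what a real implementation would emit.

```python
def zon_quote_field(key):
    """Spell `key` (str) as a `.@"..."` field name, escaping byte by byte."""
    out = []
    for byte in key.encode("utf-8"):
        ch = chr(byte)
        if ch in ('"', "\\"):
            out.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            out.append(ch)  # printable ASCII is emitted as-is
        else:
            out.append("\\x%02x" % byte)  # includes the null byte
    return '.@"' + "".join(out) + '"'


def to_zon_struct(mapping):
    """Serialize a dict of str -> bool as an anonymous struct literal."""
    fields = []
    for key, value in mapping.items():
        if not isinstance(value, bool):
            raise TypeError("this sketch only handles boolean values")
        text = "true" if value else "false"
        fields.append(zon_quote_field(key) + " = " + text)
    if not fields:
        return ".{}"
    return ".{ " + ", ".join(fields) + " }"
```

Calling `to_zon_struct({"A\x00Z": True})` yields the ZON line above, and an empty key comes out as `.@""`; both are exactly the spellings this proposal wants the language to accept.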

Status quo:

test.zig:6:12: error: identifier cannot contain null bytes
    var x: @"A\x00Z" = true;
           ^~~~~~~~~

@andrewrk andrewrk changed the title allow \x00 (null bytes) in @"" identifier syntax in the language specification allow \x00 (null bytes) and empty string in @"" identifier syntax in the language specification Mar 18, 2023
@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Jul 20, 2023
@andrewrk andrewrk added the accepted This proposal is planned. label Apr 7, 2024
@MasonRemaley MasonRemaley mentioned this issue Jun 12, 2024
4 tasks
@andrewrk andrewrk modified the milestones: 0.14.0, 0.15.0 Jan 25, 2025