Ascii #2171

shawnl · 2019-04-02T19:57:47Z

This removes the IgnoreCharacter version of Base64.

It also adds an ascii.isZig function to be used by stage2. (I was also working on a vectorized version.)

hryx · 2019-04-02T20:47:00Z

std/ascii.zig

+    Digit, // '0'...'9'
+    Lower, // 'a'...'z'
+    Upper, // 'A'...'Z'
+    Punct, // ASCII and !DEL and !AlNum


Nitpick: I recommend against abbreviations like Punctuation/Control -> Punct/Cntrl; they may make the source unnecessarily difficult for non-native English speakers to understand.

These abbreviations come from the C99 standard.

Ah, my mistake 🎩

On second thought, I still stand by my original suggestion because these are not constants which map to any C code values (from what I can tell). I feel like abbreviations like that are usually only good when you are exposing/wrapping around an already defined C value like EPERM, whereas a newly defined value should probably be ErrorNotPermitted. Anyway, not to get too sucked into a style discussion, I'm done :)

daurnimator · 2019-04-03T14:54:16Z

Why remove support for ignoring characters? Base64 text is often hard-wrapped at a fixed line width (see, e.g., PEM-encoded certificates).

shawnl · 2019-04-03T16:44:54Z

It doesn't support comptime either due to limitations in the compiler. I will save that patch for later.

jayschwa · 2019-04-06T05:04:01Z

> Why remove support for ignoring characters?

I think it would be cleaner to have that functionality separate, but in a way that could be composed with any decoder.
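A minimal sketch of the composition suggested here, written in C for illustration: a separate filter drops the ignored characters, and a strict decoder never has to know about them. The names `strip_ignored` and `b64_decode` are hypothetical and are not taken from the pull request or from Zig's standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical filter: copy src to dst, dropping every byte listed in the
 * NUL-terminated string `ignore`. dst must have room for src_len bytes.
 * Returns the number of bytes written. */
static size_t strip_ignored(const char *src, size_t src_len,
                            const char *ignore, char *dst) {
    size_t n = 0;
    for (size_t i = 0; i < src_len; i++) {
        /* strchr would match the terminator, so NUL bytes are kept as-is. */
        if (src[i] == '\0' || strchr(ignore, src[i]) == NULL) {
            dst[n++] = src[i];
        }
    }
    return n;
}

/* Value of one base64 alphabet character, or -1 if it is not in the alphabet. */
static int b64_value(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Strict decoder: rejects any byte outside the alphabet, including newlines.
 * Returns the decoded length, or -1 on malformed input. */
static long b64_decode(const char *src, size_t len, unsigned char *dst) {
    if (len % 4 != 0) return -1;
    size_t out = 0;
    for (size_t i = 0; i < len; i += 4) {
        int v[4];
        int pad = 0;
        for (int j = 0; j < 4; j++) {
            char c = src[i + j];
            if (c == '=' && i + 4 == len && j >= 2) {
                v[j] = 0; /* padding is only accepted at the very end */
                pad++;
            } else {
                if (pad) return -1; /* data after padding */
                v[j] = b64_value((unsigned char)c);
                if (v[j] < 0) return -1;
            }
        }
        unsigned long triple = ((unsigned long)v[0] << 18) |
                               ((unsigned long)v[1] << 12) |
                               ((unsigned long)v[2] << 6) |
                               (unsigned long)v[3];
        dst[out++] = (unsigned char)((triple >> 16) & 0xff);
        if (pad < 2) dst[out++] = (unsigned char)((triple >> 8) & 0xff);
        if (pad < 1) dst[out++] = (unsigned char)(triple & 0xff);
    }
    return (long)out;
}
```

With this split, hard-wrapped input is handled by calling `strip_ignored(text, len, "\r\n", buf)` first and passing the result to `b64_decode`; the same filter could sit in front of any other decoder.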

andrewrk · 2019-04-15T21:44:52Z

std/ascii.zig

+    return inTable(c, tIndex.Blank);
+}
+
+pub fn isZig(c: u8) bool {


What is this? Can you document this better?

It is documented elsewhere in the file, but I pushed some documentation in doc-comment form. I was using it in the self-hosted compiler to validate Zig source code encoding (plus other code to validate the UTF-8/Unicode).
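For readers unfamiliar with the `inTable(c, tIndex.Blank)` pattern in the diff, here is a rough C sketch of table-driven classification: one byte per character, one bit per class, so a single 256-byte table answers every class query. The bit names and the helpers `build_class_table`, `in_table`, and `count_mismatches` are assumptions made for illustration; the actual layout in the pull request is not shown in this thread. The last helper shows one way parity with the C library's `<ctype.h>` could be checked over the ASCII range.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical class bits; the names loosely mirror the enum in the diff. */
enum {
    CLASS_DIGIT = 1 << 0, /* '0'...'9' */
    CLASS_LOWER = 1 << 1, /* 'a'...'z' */
    CLASS_UPPER = 1 << 2, /* 'A'...'Z' */
    CLASS_PUNCT = 1 << 3, /* printable ASCII, not space, not alphanumeric */
    CLASS_SPACE = 1 << 4  /* ' ', \t, \n, \v, \f, \r */
};

static uint8_t class_table[256];

/* Fill the table once; a real implementation would likely use a constant
 * initializer instead of building it at run time. */
static void build_class_table(void) {
    for (int c = 0; c < 256; c++) {
        uint8_t bits = 0;
        if (c >= '0' && c <= '9') bits |= CLASS_DIGIT;
        if (c >= 'a' && c <= 'z') bits |= CLASS_LOWER;
        if (c >= 'A' && c <= 'Z') bits |= CLASS_UPPER;
        if (c > ' ' && c < 0x7f &&
            !(bits & (CLASS_DIGIT | CLASS_LOWER | CLASS_UPPER))) {
            bits |= CLASS_PUNCT;
        }
        if (c == ' ' || (c >= '\t' && c <= '\r')) bits |= CLASS_SPACE;
        class_table[c] = bits;
    }
}

/* One load and one mask test per query. */
static bool in_table(uint8_t c, uint8_t mask) {
    return (class_table[c] & mask) != 0;
}

/* Count ASCII bytes where the table disagrees with <ctype.h> in the
 * default "C" locale. */
static int count_mismatches(void) {
    int bad = 0;
    for (int c = 0; c < 128; c++) {
        uint8_t b = (uint8_t)c;
        if (in_table(b, CLASS_DIGIT) != (isdigit(c) != 0)) bad++;
        if (in_table(b, CLASS_LOWER) != (islower(c) != 0)) bad++;
        if (in_table(b, CLASS_UPPER) != (isupper(c) != 0)) bad++;
        if (in_table(b, CLASS_PUNCT) != (ispunct(c) != 0)) bad++;
        if (in_table(b, CLASS_SPACE) != (isspace(c) != 0)) bad++;
    }
    return bad;
}
```

Combining masks gives compound classes for free, for example `in_table(c, CLASS_LOWER | CLASS_UPPER)` for an alphabetic test.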

andrewrk · 2019-04-15T21:45:35Z

std/fmt.zig


+    const value = swtch[c];


Optimizations must come with tests.

#2128 (comment)

That link was in the commit description. I feel some of the problems come from GitHub's interface. I am splitting things into different commits for a reason.

Also, the way you are lowering ranged switch statements inside Zig means that LLVM can never optimize this sort of thing (even if it doesn't do it well currently).
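To make the kind of optimization under discussion concrete, here is a hedged C sketch of a table-based character-to-digit conversion in the spirit of the `swtch[c]` line in the diff. The names `digit_table`, `build_digit_table`, and `char_to_digit`, and the use of -1 as the error value, are assumptions for illustration; the real `fmt.charToDigit` signature is not shown in this thread.

```c
#include <stdint.h>

#define NOT_A_DIGIT 0xff /* larger than any digit value in radix <= 36 */

static uint8_t digit_table[256];

/* Fill the table once: '0'-'9' map to 0-9, and both 'a'-'z' and 'A'-'Z'
 * map to 10-35, so upper-case hex digits are accepted as well. */
static void build_digit_table(void) {
    for (int c = 0; c < 256; c++) digit_table[c] = NOT_A_DIGIT;
    for (int c = '0'; c <= '9'; c++) digit_table[c] = (uint8_t)(c - '0');
    for (int c = 'a'; c <= 'z'; c++) digit_table[c] = (uint8_t)(c - 'a' + 10);
    for (int c = 'A'; c <= 'Z'; c++) digit_table[c] = (uint8_t)(c - 'A' + 10);
}

/* Returns the digit value of c in the given radix (2..36), or -1 if c is
 * not a valid digit in that radix. A single comparison covers both
 * "not a digit at all" and "digit too large for this radix". */
static int char_to_digit(uint8_t c, int radix) {
    uint8_t value = digit_table[c];
    if (value >= radix) return -1;
    return value;
}
```

A unit test for a change like this can be exhaustive, since there are only 256 inputs per radix: compare the table version against the original ranged-switch version for every byte.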

andrewrk · 2019-04-25T00:59:17Z

std/ascii.zig

+/// see doc/langref.html.in online at https://ziglang.org/documentation/master/#Source-Encoding
+/// Does not validate UTF-8 or check for prohibited Unicode code-points,
+/// is why it is called isntZig() rather than isZig().
+pub fn isntZig(c: u8) bool {


What's the reasoning behind adding this? It is surprising that there would be std.ascii.isntZig. What is the intended usage? Self-hosted tokenizer? What would a different language tokenizer be expected to do, since there wouldn't be, for example, std.ascii.java, std.ascii.perl, etc.?

> Self-hosted tokenizer?

Yes.

Putting it here saves 256 bytes. It isn't much, but it is something. If it is too ugly, then so be it. Zig is unlikely to ever tokenize Java or Perl, and no other language I know of has character requirements quite like Zig's. For C, for example, you need to support trigraphs.

Also, by putting this stuff together, a future vectorized streaming version can share the same code.

I don't see a reason to expose this in the stdlib. If it's for parsing Zig, it should be a private implementation detail in std.zig.parse.
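As a sketch of what such a private, byte-level check might look like, written in C for illustration: the rule encoded below (reject ASCII control bytes other than LF, and reject DEL) is an assumption based on a reading of the Source Encoding section linked in the doc comment at the time, not something confirmed in this thread. As that doc comment says, a byte-level check does not validate UTF-8 or prohibited Unicode code points. The names `is_rejected_source_byte` and `first_rejected_offset` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED rule: control bytes other than '\n' and DEL (0x7f) are rejected.
 * Bytes >= 0x80 pass here and are left to a separate UTF-8 validator. */
static bool is_rejected_source_byte(uint8_t c) {
    if (c == '\n') return false;
    return c < 0x20 || c == 0x7f;
}

/* Returns the offset of the first rejected byte, or len if none is found.
 * Kept static so that it stays an implementation detail of one file. */
static size_t first_rejected_offset(const uint8_t *src, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (is_rejected_source_byte(src[i])) return i;
    }
    return len;
}
```

Keeping the helper file-local is the C analogue of leaving it non-`pub` inside the parser module rather than exporting it from a general-purpose ASCII module.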

andrewrk · 2019-04-29T17:40:53Z

There is too much unrelated stuff in this pull request, and it introduces this "is/isnt zig" API that doesn't seem to belong in this module. I'm starting to get a lot of pull requests to zig, and so I need them to become easier for me to review/edit/merge. This pull request is Too Hard for me to review/edit/merge, and so I'm going to close it.

You are welcome to open a new pull request with the following criteria:

- The PR description describes everything that the PR does. Nothing is in the code changes that isn't mentioned in the description.
- If it changes behavior, it comes with tests, or otherwise explains why automated tests are impractical, and explains what testing was performed.
- If it deals with performance, it comes with benchmarks & timing outputs in the description. A good example of that is here: Add Sha2 functions #687
- Before you make the pull request, review your own code for mistakes. Try to save me, and others, some time here. Try to predict what I'm going to say, and fix it ahead of time, or address it in the description.

I will also reiterate my suggestion from IRC: I think you want to go fast and write some exploratory code. That's great. I think you should maintain a fork of zig with all your experiments. Periodically you could demo some cool stuff that your fork is capable of that upstream is not, and entice me, or others, to upstream some of your code.

However it's not going to be possible for you to go as fast as you want to go, directly in upstream Zig. You're going to have to meet the criteria outlined above.

hryx reviewed Apr 2, 2019

shawnl force-pushed the ascii branch 4 times, most recently from 26281e5 to 1f97324 Compare April 3, 2019 14:30

shawnl force-pushed the ascii branch from 1f97324 to b606616 Compare April 3, 2019 16:44

andrewrk self-requested a review April 3, 2019 16:47

shawnl force-pushed the ascii branch 4 times, most recently from 3571ab3 to 35dbe50 Compare April 10, 2019 12:38

shawnl changed the title ~~Ascii and Base64~~ Ascii Apr 10, 2019

shawnl force-pushed the ascii branch 4 times, most recently from 2c8cb30 to 9bdc69e Compare April 11, 2019 02:26

andrewrk requested changes Apr 15, 2019

shawnl added 5 commits April 16, 2019 19:37

expand std.ascii, add std.ascii.isZig()

cff8658

Tested against glibc.

optimize fmt.charToDigit

228facd

benchmarks are here ziglang#2128 (comment)

use std.ascii.isSpace() in fmt

1de7517

isspace considers a few more white space characters that were not considered (and are not valid in zig code, so will have no effect).

use optimized charToDigit in bigint code

5d37635

unless I am missing something it appears that the self-hosted compiler was not compliant as it did not take upper case hex digits

Documentation for std.ascii

cb3867a

shawnl force-pushed the ascii branch from 9bdc69e to cb3867a Compare April 17, 2019 00:46

invert isZig() to isntZig()

4422f7a

andrewrk reviewed Apr 25, 2019

andrewrk added the "work in progress" label ("This pull request is not ready for review yet.") Apr 25, 2019

andrewrk closed this Apr 29, 2019

shawnl commented Apr 2, 2019

hryx Apr 2, 2019

shawnl Apr 2, 2019

hryx Apr 2, 2019

hryx Apr 2, 2019

daurnimator commented Apr 3, 2019

shawnl commented Apr 3, 2019

jayschwa commented Apr 6, 2019 (edited)

andrewrk Apr 15, 2019

shawnl Apr 17, 2019

andrewrk Apr 15, 2019

shawnl Apr 17, 2019

andrewrk Apr 25, 2019

shawnl Apr 25, 2019

shawnl Apr 25, 2019

shawnl Apr 25, 2019

emekoi Apr 25, 2019 (edited)

andrewrk commented Apr 29, 2019
