Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.json: support field aliases #8987

Closed
wants to merge 10 commits into from

Conversation

travisstaloch
Copy link
Contributor

@travisstaloch travisstaloch commented Jun 4, 2021

  • if a struct T passed to parse(T, ...) contains a pub const __tags of the form
    shown below, these aliases will be used as the json keys instead of the
    field name.
  • adds std.mem.{indexOfSliceScalar, indexOfSliceScalarPos} which support
    finding aliases. adding these was my workaround to a compiler bug
    encountered when using a for loop for the same purpose.
    const S = struct {
        value: u8,
        pub const __tags = .{
            .value = .{ .alias = "__value" },
        };
    };

The names __tags and alias were chosen by me ad-hoc and may be easily
changed.
Inspired by #1099 #1099 (comment)

- if a struct T passed to parse(T) contains a pub const __tags of the form
  shown below, these aliases will be used as the json keys instead of the
  field name.
- adds std.mem.{indexOfSliceScalar, indexOfSliceScalarPos} which support
  finding aliases. adding these was my workaround to a compiler bug
  encountered when using a for loop for the same purpose.

The names __tags and alias were chosen by me ad-hoc and may be easily
changed.
@travisstaloch
Copy link
Contributor Author

@SpexGuy suspects that this is failing due to significant comptime usage and may have to wait for stage 2.

@travisstaloch
Copy link
Contributor Author

travisstaloch commented Jun 4, 2021

I messed up my rebase in previous pr. Forgive me as I've closed it and reopened this after mikdusan suggested that the real issue was needing a rebase. Maybe this will pass ci now...

@andrewrk
Copy link
Member

andrewrk commented Jun 4, 2021

Sorry about that. With self hosted you should be able to comptime to your heart's content. I am confident we can ship it by zig 0.9.0.

Copy link
Contributor

@daurnimator daurnimator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the best way to accomplish this feature: I was thinking it would be better to provide on a per parse/stringify invocation via an option that contains a map of overrides

@travisstaloch
Copy link
Contributor Author

@andrewrk looking forward to 0.9 when comptime will go brrr 👍.

Should i leave this pr open then after everything else is resolved? Or close it and plan to re-open / make a new pr after 0.9?

@andrewrk
Copy link
Member

andrewrk commented Jun 4, 2021

I'd like to reserve the PR queue for "ready to be reviewed & merged". If something can't be merge-ready due to being blocked by other (not immediately solvable) issues, then it should be an issue, and then resurrected as a PR when the time comes.

@travisstaloch
Copy link
Contributor Author

Sounds good. I opened #8993 as a reminder to revisit after 0.9. Will close this once other issues with this pr are resolved.

@travisstaloch
Copy link
Contributor Author

@daurnimator i made some changes. i'm not sure what needs to be done to support escapes. are you saying that the escaped version of stringToken.slice(tokens.slice, tokens.i - 1) should be escaped before checking for a matching alias? any advice how i could do that?

@daurnimator
Copy link
Contributor

i'm not sure what needs to be done to support escapes. are you saying that the escaped version of stringToken.slice(tokens.slice, tokens.i - 1) should be escaped before checking for a matching alias? any advice how i could do that?

I'm saying you need to make sure {"foo": "bar"} and {"f\x6fo":"bar"} are treated the same.
See the line just below where you edited for how we do that:

(stringToken.escapes == .None and mem.eql(u8, field.name, key_source_slice)) or (stringToken.escapes == .Some and (field.name.len == stringToken.decodedLength() and encodesTo(field.name, key_source_slice)))

i.e. if stringToken.escapes == .None we can match against the field name directly; otherwise we need to use a different equality check. We have the encodesTo for such situations.

Note that to do this you'll want to change your implementation to do most of the work inside the inline for (structInfo.fields) |field, i| { loop


Also note typo in your commit message: s/PaseOptions/ParseOptions

- introduces `ParseOptions.field_aliases` as a `[]const [2][]const u8`.
  usage becomes: `.{ .field_aliases = &.{.{ "value", "__value"  }}  }`
- buildFieldAliases() now returns a `StringHashMapUnmanaged([]const u8)` and
  no longer requires excessive comptime memory.
@travisstaloch
Copy link
Contributor Author

travisstaloch commented Jun 8, 2021

thank you 👍. so i added this test which i expected to fail, but its passing:

test "field alias escapes" {
    const S = struct { foo: u8 };
    const text =
        "{\"__f\x6fo\": 42}";

    const s = try parse(S, &TokenStream.init(text), .{
        .field_aliases = &.{.{ "foo", "__foo" }},
        .allocator = std.testing.allocator,
    });
    try testing.expectEqual(@as(u8, 42), s.foo);
}

not sure what to make of this... 🤔

am i testing the wrong thing? i tried this which also passed:

test "field alias escapes" {
    const S = struct { foo: u8 };
    const text =
        "{\"__foo\": 42}";

    const s = try parse(S, &TokenStream.init(text), .{
        .field_aliases = &.{.{ "foo", "__f\x6fo" }},
        .allocator = std.testing.allocator,
    });
    try testing.expectEqual(@as(u8, 42), s.foo);
}

can you think of a test case which i should include to make this fail?

@daurnimator
Copy link
Contributor

@travisstaloch you have 2 layers of escaping: \x6f is already understood by zig, you need to use zig \\ strings or write "\\x6f"

@travisstaloch
Copy link
Contributor Author

travisstaloch commented Jun 8, 2021

@travisstaloch you have 2 layers of escaping: \x6f is already understood by zig, you need to use zig \\ strings or write "\\x6f"

Ok, well I was initially using multiline escapes, but ran into this problem. Turns out std.json can't even parse this and is giving the same error i was seeing:

const std = @import("std");

test "field alias escapes" {
    const S = struct { foo: u8 };
    const text =
        \\{"f\x6fo": 42}
    ;

    const s = try std.json.parse(S, &std.json.TokenStream.init(text), .{});
    try std.testing.expectEqual(@as(u8, 42), s.foo);
}

gives error:

$ zig version
0.9.0-dev.21+7462b0e5b

$ zig test /tmp/test.zig 
test "field alias escapes"... FAIL (InvalidEscapeCharacter)
~/Documents/Code/zig/zig/build/lib/zig/std/json.zig:753:21: 0x211714 in std.json.StreamingParser.transition (test)
                    return error.InvalidEscapeCharacter;
                    ^
~/Documents/Code/zig/zig/build/lib/zig/std/json.zig:284:13: 0x20bc86 in std.json.StreamingParser.feed (test)
        if (try p.transition(c, token1)) {
            ^
~/Documents/Code/zig/zig/build/lib/zig/std/json.zig:1122:13: 0x20abf0 in std.json.TokenStream.next (test)
            try self.parser.feed(self.slice[self.i], &t1, &t2);
            ^
~/Documents/Code/zig/zig/build/lib/zig/std/json.zig:1628:26: 0x20b089 in std.json.parseInternal (test)
                switch ((try tokens.next()) orelse return error.UnexpectedEndOfJson) {
                         ^
~/Documents/Code/zig/zig/build/lib/zig/std/json.zig:1795:15: 0x20985e in std.json.parse (test)
    const r = try parseInternal(T, token, tokens, options);
              ^
/tmp/test.zig:9:15: 0x207d31 in test "field alias escapes" (test)
    const s = try std.json.parse(S, &std.json.TokenStream.init(text), .{});

is this an issue with std.json or an invalid input? maybe i'm missing something?

@daurnimator
Copy link
Contributor

daurnimator commented Jun 8, 2021

is this an issue with std.json or an invalid input? maybe i'm missing something?

was just me being wrong :P json doesn't support \x escapes. Instead of \x6f try \u006f

@travisstaloch
Copy link
Contributor Author

travisstaloch commented Jun 8, 2021

Ok good. So does this look like the test i'm trying to make pass? I want to be sure i'm testing the right thing.

test "field alias escapes" {
    const S = struct { foo: u8 };
    const text =
        \\{"__f\u006fo": 42}
    ;
    const s = try parse(S, &TokenStream.init(text), .{
        .field_aliases = &.{.{ .field = "foo", .alias = 
        \\__f\u006fo 
        }},
        .allocator = std.testing.allocator,
    });
    try testing.expectEqual(@as(u8, 42), s.foo);
}

@daurnimator
Copy link
Contributor

no you want to set the alias to just __foo. This should pass:

test "field alias escapes" {
    const S = struct { foo: u8 };
    const text =
        \\{"__f\u006fo": 42}
    ;
    const s = try parse(S, &TokenStream.init(text), .{
        .field_aliases = &.{.{ .field = "foo", .alias =  "__foo" }},
        .allocator = std.testing.allocator,
    });
    try testing.expectEqual(@as(u8, 42), s.foo);
}

@daurnimator daurnimator added the standard library This issue involves writing Zig code for the standard library. label Jun 8, 2021
- now using correct escape sequence and key
- attempt to reformat and add  a few comments to
  `inline for (structInfo.fields) |field, i|` loop body
@daurnimator daurnimator dismissed their stale review June 8, 2021 11:44

comments addressed

- now maps from field_name => alias_name
- this unifies and improves searching for aliases.
- this also fixes missing check for field.name equality.
- add test coverage: false positive matches
Copy link
Contributor

@daurnimator daurnimator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need an allocator: you can bound the size of the field aliases by the number of fields in the type, which means it can be on the stack.

lib/std/json.zig Outdated
@@ -1466,6 +1466,7 @@ pub const ParseOptions = struct {
ignore_unknown_fields: bool = false,

allow_trailing_data: bool = false,
field_aliases: ?[]const struct { field: []const u8, alias: []const u8 } = null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be null-able: just use the empty list.

lib/std/json.zig Outdated
@@ -1466,6 +1466,7 @@ pub const ParseOptions = struct {
ignore_unknown_fields: bool = false,

allow_trailing_data: bool = false,
field_aliases: ?[]const struct { field: []const u8, alias: []const u8 } = null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an empty line between fields here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all of these in latest commit.

@travisstaloch
Copy link
Contributor Author

I don't think you need an allocator: you can bound the size of the field aliases by the number of fields in the type, which means it can be on the stack.

the allocator is only needed for the StringHashMapUnmanaged returned by buildFieldAliases(). how could i initialize this on the stack? just pass in a fixedBufferAllocator?

@travisstaloch
Copy link
Contributor Author

this makes the tests pass. not sure if structInfo.fields.len * 512 its a good hueristic. its the smallest power of 2 that passes. does this seem acceptable?

            var buf: [structInfo.fields.len * (1 << 9)]u8 = undefined;
            var fba = std.heap.FixedBufferAllocator.init(&buf);
            var field_alias_map = try buildFieldAliasMap(T, options, &fba.allocator);

@travisstaloch
Copy link
Contributor Author

continuous-integration/drone/pr and ziglang.zig ci jobs both passed before i got tired waiting to see if 3rd would pass. going to push my latest commit now.

- use heap.fixedBufferAllocator with size structInfo.fields.len * 512 rather then user allocator
- get rid of tests expecting error.AllocatorRequired
- add test "many field aliases"
- make FieldAlias struct pub and add some doc comments
lib/std/json.zig Outdated
// build a map of aliases from field_name => alias_name
fn buildFieldAliasMap(comptime T: type, options: ParseOptions, allocator: *mem.Allocator) !std.StringHashMapUnmanaged([]const u8) {
var result = std.StringHashMapUnmanaged([]const u8){};
if (options.field_aliases.len == 0) return result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for this line I think?

lib/std/json.zig Outdated
@@ -1623,6 +1710,9 @@ fn parseInternal(comptime T: type, token: Token, tokens: *TokenStream, options:
}
}
}
var buf: [structInfo.fields.len * (1 << 9)]u8 = undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the limit of 512 is appropriate; I was more thinking something like iterating over the fields (with an inline for) and calculating the actual max size of the hash table possible (if there is e.g. 5 fields, then the hash table can't have more than 5 entries).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok thats what i was thinking too initially. but i'm not sure how to do it. can you provide an example of how to initialize a hash table with only N entries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i found a solution. will push a commit soon.

- allocates backing data on the stack
- much smaller memory requirement (around 40 bytes per field)
- removes FixedBufferAllocator
- adds error.TooManyFieldAliases check
- 8 is ArrayHashMap's linear_scan_max.  if there are more than 8
  entries, an index is allocated and used.  otherwise a get() does
  a linear scan and no allocations occur.
- use FixedBufferAllocator with capacity 9 * field_names.len bytes.
  9 is the lowest multiple which doesn't OOM.  i tested this with
  up to 2048 struct fields.
@travisstaloch
Copy link
Contributor Author

no idea why the freebsd build failed. i can't tell from the logs.

@Vexu
Copy link
Member

Vexu commented Jun 12, 2021

Timed out, but it seems like all the relevant tests were run.

@travisstaloch
Copy link
Contributor Author

is there anything left to do on this or is it just waiting in line? just wanted to ping @daurnimator as there has been no action in a couple weeks. note that the most recent freebsd ci job seems to have run all relevent tests but just timed out for some reason. also know that it shouldn't be a problem for me to rebase as happened previously.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the idea behind this to do the aliasing at compile-time instead of runtime?

is there anything left to do on this or is it just waiting in line?
* if a struct T passed to parse(T, ...) contains a pub const __tags of the form
shown below, these aliases will be used as the json keys instead of the
field name.

I don't see this in the code

@travisstaloch
Copy link
Contributor Author

In response to @andrewrk questions:

Isn't the idea behind this to do the aliasing at compile-time instead of runtime?

I'm not sure if this is currently possible. In this PR's first commit i had created the field aliases as a list of from_field, to_field pairs at comptime. However if memory serves, this caused an OOM error.

The aliases are now stored as an ArrayHashMapUnmanaged which is created with stack memory. This was done in an attempt to avoid allocations.

I don't see this in the code

The field_aliases were moved from a __tags struct decl to ParseOptions.field_aliases as requested by @daurnimator i believe.

travisstaloch added a commit to travisstaloch/zig that referenced this pull request Nov 21, 2021
- this is a rework of ziglang#8987.  rebasing that old pr proved difficult so it
  was easier to just start over.
- this patch has the same api and tests as ziglang#8987 but uses an enums.EnumMap
  rather than StringArrayHashMap.  this allows `field_alias_map` to be
  made entirely on the stack and requires no allocator (not even a
  FixedBufferAllocator which the previous used for re-indexing its map).
@travisstaloch
Copy link
Contributor Author

Closing in favor of #10193

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants