Add EBNF #199
👍 |
I want this to be in before TOML hits 1.0. I would like to go through and try to verify it myself first. I'll probably save this until we've settled on other changes (multiline/raw strings, tuples, string keys, etc.). |
👍 |
This looks like a good start on the low-level parts. But IIUC For instance the README provides examples that it says are supposed to be errors (such as defining tables in a certain order) that aren't reflected in this grammar, yet. |
@greghendershott |
Although I don't normally use EBNF, I did understand about
I only just started to learn about toml today. If I already understood toml crisply, I'd just go ahead and propose the exact additional rules. Unfortunately I was kind of hoping to find the grammar already done to help me understand toml in the first place. So it's a bit chicken-and-egg, and I'm sorry for not being more specific and helpful. |
@greghendershott I haven't looked at the grammar too closely yet, but I understand it to be pretty old. I imagine it will require some attention. I don't think it includes anything that was added in
That's not right. This is valid TOML:

    [a.b]
    k1 = 5
    [a]
    k2 = 5

This is invalid:

    [a.b]
    k1 = 5
    [a]
    b = 5

So is this:

    [a]
    b = 5
    [a.b]
    k1 = 5

In both invalid cases, the This part of TOML won't be captured in a grammar. It's an environment issue. If you want to help out with the EBNF, that'd be greatly appreciated. :-) |
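To make the "environment issue" concrete, here is a minimal Python sketch of a check that runs after parsing rather than inside the grammar. The `sections` input shape, `build_document`, and `DuplicateKeyError` are hypothetical names invented for this illustration; they are not part of any TOML library. The sketch covers only the three cases shown above and does not try to detect other errors, such as the same table header appearing twice.

```python
class DuplicateKeyError(ValueError):
    """Raised when a name is used both as a table and as a plain value."""


def build_document(sections):
    """Merge pre-parsed sections into one nested dict.

    `sections` is an assumed intermediate form: a list of
    (path, pairs) tuples, where `path` is a tuple of table-name parts
    (("a", "b") for [a.b]) and `pairs` is a list of (key, value) tuples.
    """
    root = {}
    for path, pairs in sections:
        table = root
        for part in path:
            # Create intermediate tables on demand, so [a.b] may come before [a].
            child = table.setdefault(part, {})
            if not isinstance(child, dict):
                raise DuplicateKeyError(
                    f"[{'.'.join(path)}]: '{part}' is already defined as a value"
                )
            table = child
        for key, value in pairs:
            if key in table:
                raise DuplicateKeyError(
                    f"[{'.'.join(path)}]: '{key}' is already defined"
                )
            table[key] = value
    return root


# The valid example: [a.b] first, then [a] with a different key.
valid = [(("a", "b"), [("k1", 5)]), (("a",), [("k2", 5)])]
document = build_document(valid)
```

Both invalid orderings raise `DuplicateKeyError` here: in the first, `b` already exists as a table when `[a]` tries to assign it; in the second, `b` already exists as a value when `[a.b]` tries to descend into it. A grammar sees each of those files as well-formed line by line, which is why the check has to live in the code that builds the table.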
Well, I'd be happy to help with the grammar once I understand what it is. :) I spent some time today writing a TOML parser in Racket. The main thing I found confusing was nested tables and (nested) arrays-of-tables. I misunderstood that the README was talking about the order of I do think the grammar could describe the following stuff. (Not EBNF, just using
I think the |
I'm running out of time. At a glance, that looks OK, so long as The only ordering in TOML that matters is that

    [table]
    key = 5
    [[table.array]]
    a = 1
    b = 2
    [another table]
    key = 10
    [[table.array]]
    a = 2
    b = 4

Works just fine and maps roughly to this JSON (sorry about the type annotations, it's an artifact of toml-test):

    {
        "another table": {
            "key": {
                "type": "integer",
                "value": "10"
            }
        },
        "table": {
            "array": [
                {
                    "a": {
                        "type": "integer",
                        "value": "1"
                    },
                    "b": {
                        "type": "integer",
                        "value": "2"
                    }
                },
                {
                    "a": {
                        "type": "integer",
                        "value": "2"
                    },
                    "b": {
                        "type": "integer",
                        "value": "4"
                    }
                }
            ],
            "key": {
                "type": "integer",
                "value": "5"
            }
        }
    }

|
If you want to play around with converting TOML to JSON, then I think this will do the trick (assuming you have Go installed):

    [andrew@Liger ~] export GOPATH="$HOME/gotmp"
    [andrew@Liger ~] go get github.com/BurntSushi/toml/cmd/toml-test-decoder
    [andrew@Liger ~] $HOME/gotmp/bin/toml-test-decoder < test.toml | python -m json.tool

Fill in your own |
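For readers without Go installed, the type-annotated shape of that JSON can be imitated with a short Python sketch. The `tag` function below is a hypothetical helper, not part of toml-test, and it handles only what the output above actually shows: nested tables, arrays of tables, and integers. Anything else raises `TypeError` instead of guessing at an encoding.

```python
import json


def tag(value):
    """Wrap already-decoded values in {"type": ..., "value": ...} records.

    Only the cases visible in the thread's JSON are handled:
    dicts (tables), lists (arrays of tables), and integers.
    """
    if isinstance(value, dict):
        return {key: tag(item) for key, item in value.items()}
    if isinstance(value, list):
        return [tag(item) for item in value]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return {"type": "integer", "value": str(value)}
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


decoded = {
    "another table": {"key": 10},
    "table": {"key": 5, "array": [{"a": 1, "b": 2}, {"a": 2, "b": 4}]},
}
print(json.dumps(tag(decoded), indent=4, sort_keys=True))
```

Note that integer values are emitted as strings, matching the `"value": "10"` entries in the decoder output above.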
Ah, OK. I thought the order was significant. That's important to know. (My parsing model was trying to neatly accumulate and functionally merge immutable hash tables, which depends on an array-of-tables having any kids grouped "in" it. Instead, I think from what you're saying, I'll need to mutate the heck out of a global hash table, as items may appear in whatever order. Oh well. Sleep, then back to the drawing board.) Sorry to take so much of your time! Your guidance has been really helpful. Thank you! Hopefully I will be able to repay it in the form of a Racket parser, and something for the grammar that may help other folks. |
No worries, thanks for taking a look. :-) Mutating a global hash table sounds like you're on the right track. I think there is a Haskell parser somewhere (not sure if it is maintained) that might be useful to study for building a Racket parser. |
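The "mutate a global hash table" approach can be sketched in Python as follows. The `(kind, path, pairs)` section format, `merge_sections`, and `_descend` are all hypothetical names for this illustration, and the sketch assumes the headers have already been tokenized. It treats a path that passes through an array of tables as referring to that array's most recent element, which is an assumption about the intended semantics rather than something this thread spells out.

```python
def _descend(table, part):
    """Return the sub-table named `part`, creating it if it is missing."""
    child = table.setdefault(part, {})
    if isinstance(child, list) and child:
        # Assumption: a path through an array of tables means its last element.
        child = child[-1]
    if not isinstance(child, dict):
        raise ValueError(f"'{part}' is not a table")
    return child


def merge_sections(sections):
    """Fold sections, in whatever order they appear, into one mutable dict.

    Each section is (kind, path, pairs): kind is "table" for [x] or
    "array" for [[x]], path is a tuple of name parts, and pairs is a
    list of (key, value) tuples.
    """
    root = {}
    for kind, path, pairs in sections:
        parent = root
        for part in path[:-1]:
            parent = _descend(parent, part)
        last = path[-1]
        if kind == "array":
            items = parent.setdefault(last, [])
            if not isinstance(items, list):
                raise ValueError(f"'{last}' is not an array of tables")
            target = {}
            items.append(target)  # every [[x]] header starts a new element
        else:
            target = _descend(parent, last)
        for key, value in pairs:
            if key in target:
                raise ValueError(f"duplicate key '{key}'")
            target[key] = value
    return root


# The interleaved example from earlier in the thread.
sections = [
    ("table", ("table",), [("key", 5)]),
    ("array", ("table", "array"), [("a", 1), ("b", 2)]),
    ("table", ("another table",), [("key", 10)]),
    ("array", ("table", "array"), [("a", 2), ("b", 4)]),
]
result = merge_sections(sections)
```

Because each section looks up its destination from the root every time, the two `[[table.array]]` sections do not need to be adjacent; the second one simply appends to the list the first one created.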
Although I'm still working on this, one update/comment: Turns out the grammar needs to talk about some ordering with respect to nested array-of-tables. From the README (emphasis mine):
Example from the README:
My feeling at the moment: 0.1.0 was in the spirit of INI files; great, easy. Adding nested arrays in 0.2.0 takes this to another level -- still reasonable to parse with a clear grammar (like what I'd wrongly assumed above). But without a clear grammar... it doesn't fit so neatly on top of the "things in whatever order" spirit of 0.1.0. Bit of tension between the two ideas. Or maybe I'm just being dense; certainly I'm no parsing expert. I wonder how many TOML parsers and tests actually implement nested arrays. |
That's a good point, but it is part of my test suite: table-array-nest.toml. And the equivalent JSON. So any parser that passes my test suite should also handle that case correctly. Again, I don't think this has to be represented in the grammar. A grammar doesn't need to be strong enough to guarantee that every production is a valid TOML document. A grammar is supposed to specify the syntax and not much more. For example, a grammar would probably not catch this as an invalid value: |
See toml-lang/toml#199 (comment) Tables and arrays of tables may come in ANY order. Will need to redesign. But committing this as a reference checkpoint.
Closing in favor of the more complete ABNF proposal in #236. |
I believe it is essential for TOML to have a formal EBNF specification to avoid ambiguities.