Add EBNF #199

Closed
@88Alex 88Alex commented Sep 7, 2013

I believe it is essential for TOML to have a formal EBNF specification to avoid ambiguities.

@BurntSushi
Member

👍

@BurntSushi BurntSushi mentioned this pull request Jun 22, 2014
@BurntSushi
Member

I want this to be in before TOML hits 1.0, but I would like to go through it and try to verify it myself first. I'll probably save this until we've settled on other changes (multiline/raw strings, tuples, string keys, etc.).

@pnathan

pnathan commented Jun 26, 2014

👍

@greghendershott

This looks like a good start on the low-level parts. But IIUC toml isn't a complete "document", it seems to be one "expression"? How about the grammar for a series of key/value pairs, tables, and arrays of tables?

For instance the README provides examples that it says are supposed to be errors (such as defining tables in a certain order) that aren't reflected in this grammar, yet.

@88Alex
Author

88Alex commented Jun 26, 2014

@greghendershott {} actually mean that whatever is inside them repeats. This may be the source of this misunderstanding.

@greghendershott

Although I don't normally use EBNF, I did understand about {} for repetition. What I meant is, should there also be rules for things like:

  • a complete toml_document?
  • higher-level items like table, array_of_tables?
  • keys for both table and array_of_tables items -- which can be . separated nested keys e.g. [grandparent.parent.child] or [[grandparent.parent.child]]?
  • also, the README has some examples like [a.b] must follow [a], which I don't completely understand the rationale for, but a full grammar might help clear up?

I only just started to learn about toml today. If I already understood toml crisply, I'd just go ahead and propose the exact additional rules. Unfortunately I was kind of hoping to find the grammar already done to help me understand toml in the first place. So it's a bit chicken-and-egg, and I'm sorry for not being more specific and helpful.

@BurntSushi
Member

@greghendershott I haven't looked at the grammar too closely yet, but I understand it to be pretty old. I imagine it will require some attention. I don't think it includes anything that was added in 0.2 (e.g., table arrays).

> also, the README has some examples like [a.b] must follow [a], which I don't completely understand the rationale for, but a full grammar might help clear up?

That's not right. This is valid TOML:

[a.b]
k1 = 5

[a]
k2 = 5

This is invalid:

[a.b]
k1 = 5

[a]
b = 5

So is this:

[a]
b = 5

[a.b]
k1 = 5

In both invalid cases, the b key is defined twice.

This part of TOML won't be captured in a grammar. It's an environment issue.
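To illustrate why this check lives outside the grammar, here is a minimal Python sketch of it as a separate pass. It assumes the parser has already reduced the input to `(table_path, key, value)` triples; `define`, `load`, and `DuplicateKeyError` are hypothetical names, not part of any TOML library, and redefining a whole `[table]` header is not covered.

```python
class DuplicateKeyError(ValueError):
    """Raised when a key is defined twice in the same table."""


def define(root, table_path, key, value):
    """Set root[table_path...][key] = value, rejecting redefinitions."""
    table = root
    for name in table_path:
        # Intermediate tables are created implicitly, as [a.b] does for a.
        child = table.setdefault(name, {})
        if not isinstance(child, dict):
            raise DuplicateKeyError(f"{name!r} is already defined as a value")
        table = child
    if key in table:
        raise DuplicateKeyError(f"{key!r} is defined twice")
    table[key] = value


def load(assignments):
    """Apply (table_path, key, value) triples to a fresh root table."""
    root = {}
    for table_path, key, value in assignments:
        define(root, table_path, key, value)
    return root


# The valid example: [a.b] before [a], with no conflicting keys.
valid = load([(["a", "b"], "k1", 5), (["a"], "k2", 5)])
```

Feeding either invalid example through `load` raises `DuplicateKeyError`, because `b` ends up defined both as a table and as a plain value. The order of the two headers makes no difference.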

If you want to help out with the EBNF, that'd be greatly appreciated. :-)

@greghendershott

@BurntSushi

Well, I'd be happy to help with the grammar once I understand what it is. :)

I spent some time today writing a TOML parser in Racket. The main thing I found confusing was nested tables and (nested) arrays-of-tables.

I misunderstood that the README was talking about the order of [a] and [a.b] generally. It's not. Instead it's about giving b conflicting definitions in the examples. Got it now -- thanks!


I do think the grammar could describe the following stuff. (Not EBNF, just using ... for zero-or-more).

toml-document = group ...
group = key/value ... table ... array-of-tables ...
table = [keys] key/value ...
array-of-tables = aot-item ...
aot-item = [[keys]] group ...

I think the group rule is correct? If correct, helpful to state that. But I'm not sure it's correct.
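As a rough illustration of the three line-level forms these rules refer to, here is a hedged Python sketch that classifies a single line as a `[keys]` header, a `[[keys]]` header, or a key/value pair. The regular expressions and the `classify` helper are assumptions made for this example. They ignore quoted keys, trailing comments, and multi-line values, so this is not a full lexer.

```python
import re

# Simplified patterns: assumptions for illustration only.
AOT_HEADER = re.compile(r"^\[\[([^\[\]]+)\]\]$")
TABLE_HEADER = re.compile(r"^\[([^\[\]]+)\]$")
KEY_VALUE = re.compile(r"^([^=\s]+)\s*=\s*(.+)$")


def split_keys(text):
    """Split a dotted header such as grandparent.parent.child into names."""
    return [name.strip() for name in text.split(".")]


def classify(line):
    """Return (kind, payload) for one line of input."""
    text = line.strip()
    if not text or text.startswith("#"):
        return ("blank", None)
    match = AOT_HEADER.match(text)
    if match:
        return ("array-of-tables", split_keys(match.group(1)))
    match = TABLE_HEADER.match(text)
    if match:
        return ("table", split_keys(match.group(1)))
    match = KEY_VALUE.match(text)
    if match:
        return ("key/value", (match.group(1), match.group(2)))
    raise ValueError(f"unrecognized line: {line!r}")


header = classify("[[grandparent.parent.child]]")
```

The double-bracket pattern is tried first so that `[[keys]]` is never mistaken for a `[keys]` header with extra brackets.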

@BurntSushi
Member

I'm running out of time. At a glance, that looks OK, so long as a ... b ... c ... doesn't imply anything about the order of a, b and c.

The only ordering in TOML that matters is that key = value forms always belong to the previous [table] or [[aot]]. But otherwise, [...] and [[...]] can appear anywhere. e.g.,

[table]
key = 5

[[table.array]]
a = 1
b = 2

[another table]
key = 10

[[table.array]]
a = 2
b = 4

Works just fine and maps roughly to this JSON (sorry about the type annotations, it's an artifact of toml-test):

{
    "another table": {
        "key": {
            "type": "integer",
            "value": "10"
        }
    },
    "table": {
        "array": [
            {
                "a": {
                    "type": "integer",
                    "value": "1"
                },
                "b": {
                    "type": "integer",
                    "value": "2"
                }
            },
            {
                "a": {
                    "type": "integer",
                    "value": "2"
                },
                "b": {
                    "type": "integer",
                    "value": "4"
                }
            }
        ],
        "key": {
            "type": "integer",
            "value": "5"
        }
        }
    }
}
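The rule that key/value pairs attach to the previous header, while headers themselves may appear in any order, can be sketched in Python as a loop that mutates a single root table. This assumes the input has already been split into `(kind, payload)` events. The event names and the `build` function are hypothetical, and the sketch leaves out duplicate detection and nested arrays of tables.

```python
def build(events):
    """Fold (kind, payload) events into one nested dict."""
    root = {}
    current = root  # key/value pairs before any header go in the root
    for kind, payload in events:
        if kind == "table":
            # [a.b]: walk from the root, creating tables as needed.
            current = root
            for name in payload:
                current = current.setdefault(name, {})
        elif kind == "aot":
            # [[a.b]]: append a fresh table to the array at a.b.
            parent = root
            for name in payload[:-1]:
                parent = parent.setdefault(name, {})
            current = {}
            parent.setdefault(payload[-1], []).append(current)
        else:
            # key = value belongs to whichever header came last.
            key, value = payload
            current[key] = value
    return root


# The TOML example above, written as events.
result = build([
    ("table", ["table"]),
    ("kv", ("key", 5)),
    ("aot", ["table", "array"]),
    ("kv", ("a", 1)),
    ("kv", ("b", 2)),
    ("table", ["another table"]),
    ("kv", ("key", 10)),
    ("aot", ["table", "array"]),
    ("kv", ("a", 2)),
    ("kv", ("b", 4)),
])
```

Here `result` has the same shape as the JSON above without the type annotations. The second `[[table.array]]` element lands in the right array even though `[another table]` sits between the two.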

@BurntSushi
Member

If you want to play around with converting TOML to JSON, then I think this will do the trick (assuming you have Go installed):

export GOPATH="$HOME/gotmp"
go get github.com/BurntSushi/toml/cmd/toml-test-decoder
$HOME/gotmp/bin/toml-test-decoder < test.toml | python -m json.tool

Fill in your own test.toml with whatever TOML you want.

@greghendershott

@BurntSushi

> At a glance, that looks OK, so long as a ... b ... c ... doesn't imply anything about the order of a, b and c.
>
> The only ordering in TOML that matters is that key = value forms always belong to their nearest [table] or [[aot]]. But otherwise, [...] and [[...]] can appear anywhere

Ah, OK. I thought the order was significant. That's important to know. (My parsing model was trying to neatly accumulate and functionally merge immutable hash tables, which depends on an array-of-tables having any kids grouped "in" it. Instead, I think from what you're saying, I'll need to mutate the heck out of a global hash table, as items may appear in whatever order. Oh well. Sleep, then back to the drawing board.)

Sorry to take so much of your time! Your guidance has been really helpful. Thank you! Hopefully I will be able to repay it in the form of a Racket parser, and something for the grammar that may help other folks.

@BurntSushi
Member

No worries, thanks for taking a look. :-)

Mutating a global hash table sounds like you're on the right track. I think there is a Haskell parser somewhere (not sure if it is maintained) that might be useful to study for building a Racket parser.

@greghendershott

Although I'm still working on this, one update/comment: Turns out the grammar needs to talk about some ordering with respect to nested array-of-tables. From the README (emphasis mine):

> You can create nested arrays of tables as well. Just use the same double bracket syntax on sub-tables. Each double-bracketed sub-table will belong to the most recently defined table element above it.

Example from the README:

[[fruit]]
  name = "apple"

  [fruit.physical]
    color = "red"
    shape = "round"

  [[fruit.variety]]
    name = "red delicious"

  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"
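One possible reading of the "most recently defined table element" rule, sketched in Python: when a header path passes through an array of tables, resolution continues from that array's last element. The helper names `resolve`, `open_table`, and `append_array_table` are hypothetical, and error handling (for example, a path that runs into a plain value) is omitted.

```python
def resolve(root, path):
    """Walk path from root, stepping into the last element of any array."""
    node = root
    for name in path:
        child = node.setdefault(name, {})
        # An array of tables stands for its most recently added element.
        node = child[-1] if isinstance(child, list) else child
    return node


def open_table(root, path):
    """Handle a [path] header and return the table it selects."""
    return resolve(root, path)


def append_array_table(root, path):
    """Handle a [[path]] header: append a new element and return it."""
    parent = resolve(root, path[:-1])
    element = {}
    parent.setdefault(path[-1], []).append(element)
    return element


# The README example, one header at a time.
root = {}
append_array_table(root, ["fruit"])["name"] = "apple"
open_table(root, ["fruit", "physical"]).update(color="red", shape="round")
append_array_table(root, ["fruit", "variety"])["name"] = "red delicious"
append_array_table(root, ["fruit", "variety"])["name"] = "granny smith"
append_array_table(root, ["fruit"])["name"] = "banana"
append_array_table(root, ["fruit", "variety"])["name"] = "plantain"
```

After these calls, `plantain` is attached to the banana element rather than to apple, because banana was the last `[[fruit]]` element when that header was processed. This is the one place where header order affects the result.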

My feeling at the moment: 0.1.0 was in the spirit of INI files; great, easy. Adding nested arrays in 0.2.0 takes this to another level -- still reasonable to parse with a clear grammar (like what I'd wrongly assumed above). But without a clear grammar... it doesn't fit so neatly on top of the "things in whatever order" spirit of 0.1.0. Bit of tension between the two ideas. Or maybe I'm just being dense; certainly I'm no parsing expert. I wonder how many TOML parsers and tests actually implement nested arrays.

@BurntSushi
Member

That's a good point, but it is part of my test suite: table-array-nest.toml. And the equivalent JSON. So any parser that passes my test suite should also handle that case correctly.

Again, I don't think this has to be represented in the grammar. A grammar doesn't need to be strong enough to guarantee that every production is a valid TOML document. A grammar is supposed to specify the syntax and not much more.

For example, a grammar would probably not catch this as an invalid value: [1, "hi"]. (It's a type error not a syntax error.)
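A hedged Python sketch of how such a type check could run as a pass after parsing, separate from the grammar. It assumes the document has already been parsed into nested dicts and lists, and `check_array_types` is a hypothetical helper, not part of any TOML library.

```python
def check_array_types(value, where="document"):
    """Raise TypeError if any array under value mixes element types."""
    if isinstance(value, dict):
        for key, item in value.items():
            check_array_types(item, f"{where}.{key}")
    elif isinstance(value, list):
        kinds = {type(item).__name__ for item in value}
        if len(kinds) > 1:
            raise TypeError(f"{where}: mixed array types {sorted(kinds)}")
        for index, item in enumerate(value):
            check_array_types(item, f"{where}[{index}]")


# Accepted: each array is homogeneous; nested arrays count as one type.
check_array_types({"ports": [8001, 8002], "nested": [[1, 2], ["a", "b"]]})
```

Calling it on a table containing `[1, "hi"]` raises `TypeError`, even though the text parses without any syntax error.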

greghendershott pushed a commit to greghendershott/toml that referenced this pull request Jun 29, 2014
See toml-lang/toml#199 (comment)

Tables and arrays of tables may come in ANY order. Will need to
redesign. But committing this as a reference checkpoint.
@mojombo
Member

mojombo commented Jul 18, 2014

Closing in favor of the more complete ABNF proposal in #236.

@mojombo mojombo closed this Jul 18, 2014