Add ABNF description of TOML. #236

Merged: 23 commits, merged on Jan 4, 2017.

Diff view: changes from 9 commits.

Commits:
fcd0229 Add ABNF description of TOML. (mojombo, Jul 17, 2014)
cb50c72 Properly specify unquoted-key. (mojombo, Jul 18, 2014)
7b439f6 Fix whitespace in array-table-close. (mojombo, Jul 18, 2014)
bd4fe79 Update ABNF with v0.3.0 compliant integer/float rules. (mojombo, Nov 10, 2014)
4d15ea5 Update ABNF with RFC 3339 datetime spec to be v0.3.0 compliant. (mojombo, Nov 10, 2014)
52dc64a Update for newline clarifications. (mojombo, Dec 17, 2014)
3808902 Add \UXXXXXXXX escape sequence. (mojombo, Jan 7, 2015)
0cee090 Allow underscores in int/float ABNF. (mojombo, Feb 7, 2015)
532a466 Small style fixes. (mojombo, Feb 10, 2015)
bb00057 Added grouping to ambiguous alternatives (joelself, Dec 31, 2015)
863f5cc Allow leading newlines inside arrays (joelself, Jan 2, 2016)
d6c49b4 Don't need extra newlines before closing brace (joelself, Jan 2, 2016)
216f642 Allow ws after newline and before element (joelself, Jan 2, 2016)
64c0a6a Allow newline and whitespace after , before comment (joelself, Jan 2, 2016)
75f6ba3 Adjusted the abnf to allow for whitespaces (joelself, Jan 30, 2016)
9882e88 Merge pull request #378 from joelself/abnf (mojombo, Jan 4, 2017)
92f20e5 Add ABNF work-in-progress warning. (mojombo, Jan 4, 2017)
eb703d2 Fix array grouping. (mojombo, Jan 4, 2017)
a2c74ce Add ABNF for Date and Time. (mojombo, Jan 4, 2017)
7c0db2c RFC 5234 is latest for ABNF. (mojombo, Jan 4, 2017)
f9d4429 Make ABNF compatible with a real ABNF parser. (mojombo, Jan 4, 2017)
514037d Reorder ABNF rules for better clarity. (mojombo, Jan 4, 2017)
ecb8274 Delete unused rule. (mojombo, Jan 4, 2017)
toml.abnf: 187 changes (187 additions, 0 deletions)
@@ -0,0 +1,187 @@
;; This is an attempt to define TOML in ABNF according to the grammar defined
;; in RFC 4234 (http://www.ietf.org/rfc/rfc4234.txt).

;; TOML

toml = expression *( newline expression )
expression = (
ws /
ws comment /
ws keyval ws [ comment ] /
ws table ws [ comment ]
)

;; Newline

newline = (
%x0A / ; LF
%x0D.0A ; CRLF
)

newlines = 1*newline

;; Whitespace

ws = *(
%x20 / ; Space
%x09 ; Horizontal tab
)

;; Comment

comment-start-symbol = %x23 ; #
non-eol = %x09 / %x20-10FFFF
comment = comment-start-symbol *non-eol

;; Key-Value pairs

keyval-sep = ws %x3D ws ; =
keyval = key keyval-sep val

key = unquoted-key / quoted-key
unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _
quoted-key = quotation-mark 1*basic-char quotation-mark ; See Basic Strings

val = integer / float / string / boolean / date-time / array / inline-table
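Here is a minimal Python sketch of the key rule, hand-transcribed into a regular expression. It assumes quoted keys contain no escape sequences: only basic-unescaped is modeled and the escaped alternative is left out. The names KEY and is_key are hypothetical. The sketch shows that unquoted keys are ASCII-only and that "" is not a valid quoted key, because quoted-key requires 1*basic-char.

```python
import re

# unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ); ALPHA and DIGIT are ASCII-only.
UNQUOTED_KEY = r"[A-Za-z0-9_-]+"
# basic-unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
# (simplification: the escaped alternative of basic-char is not modeled).
BASIC_UNESCAPED = r"[\x20-\x21\x23-\x5B\x5D-\U0010FFFF]"
# quoted-key = quotation-mark 1*basic-char quotation-mark
QUOTED_KEY = '"' + BASIC_UNESCAPED + '+"'
KEY = re.compile(UNQUOTED_KEY + "|" + QUOTED_KEY)


def is_key(text: str) -> bool:
    """Return True if the whole text matches the simplified key rule."""
    return KEY.fullmatch(text) is not None
```

For example, is_key("bare_key-1") and is_key('"quoted key"') hold, while is_key('""') and is_key("a.b") do not.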

;; Table

table = std-table / array-table

;; Standard Table

std-table-open = %x5B ws ; [ Left square bracket
std-table-close = ws %x5D ; ] Right square bracket
table-key-sep = ws %x2E ws ; . Period

std-table = std-table-open key *( table-key-sep key) std-table-close
Review comment: This rule will match [], which is expressly forbidden in the spec.

Review comment: Using <key> in this way does not follow the spec, which makes no mention of quoted table names.

Review comment (Member Author): @flaviut I don't see how it will match []. The <key> rule mandates that at least 1 character be present. Also, see #283, which clarifies key names to match the rules present here.

Review comment: My bad then, sorry. It seems like I read over the first <key> without noticing it.
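To check the point raised in this thread, here is a minimal Python sketch that transcribes std-table into a regular expression. It assumes unquoted keys only (quoted-key is left out for brevity), and the names are hypothetical. Because key requires at least one character, [] cannot match.

```python
import re

# Hypothetical regex transcription of std-table, restricted to unquoted keys.
WS = r"[ \t]*"                       # ws = *( %x20 / %x09 )
UNQUOTED_KEY = r"[A-Za-z0-9_-]+"     # 1*( ALPHA / DIGIT / - / _ )
TABLE_KEY_SEP = WS + r"\." + WS      # table-key-sep = ws %x2E ws
STD_TABLE = re.compile(
    r"\[" + WS                       # std-table-open
    + UNQUOTED_KEY
    + "(?:" + TABLE_KEY_SEP + UNQUOTED_KEY + ")*"
    + WS + r"\]"                     # std-table-close
)


def is_std_table(line: str) -> bool:
    """Return True if the whole line matches the simplified std-table rule."""
    return STD_TABLE.fullmatch(line) is not None


print(is_std_table("[]"))         # False: key needs at least one character
print(is_std_table("[ a . b ]"))  # True
```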


;; Array Table

array-table-open = %x5B.5B ws ; [[ Double left square bracket
array-table-close = ws %x5D.5D ; ]] Double right square bracket

array-table = array-table-open key *( table-key-sep key) array-table-close
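The same transcription adapted to array-table is sketched below. It again assumes unquoted keys only, and the names are hypothetical. Since array-table-open is %x5B.5B, the two brackets must be adjacent, so "[ [fruit] ]" is not an array table under this rule.

```python
import re

WS = r"[ \t]*"                       # ws
UNQUOTED_KEY = r"[A-Za-z0-9_-]+"     # unquoted-key (quoted-key omitted)
TABLE_KEY_SEP = WS + r"\." + WS      # table-key-sep
ARRAY_TABLE = re.compile(
    r"\[\[" + WS                     # array-table-open = %x5B.5B ws
    + UNQUOTED_KEY
    + "(?:" + TABLE_KEY_SEP + UNQUOTED_KEY + ")*"
    + WS + r"\]\]"                   # array-table-close = ws %x5D.5D
)


def is_array_table(line: str) -> bool:
    """Return True if the whole line matches the simplified array-table rule."""
    return ARRAY_TABLE.fullmatch(line) is not None
```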

;; Integer

integer = [ minus / plus ] int
minus = %x2D ; -
plus = %x2B ; +
digit1-9 = %x31-39 ; 1-9
underscore = %x5F ; _
int = DIGIT / digit1-9 1*( DIGIT / underscore DIGIT )
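The integer rule maps directly onto a regular expression, as in the minimal Python sketch below. The names INTEGER and parse_integer are hypothetical. Python's int() accepts "007", so validating against the rule first is what enforces "no leading zeros". It also rejects doubled, leading, or trailing underscores.

```python
import re

# integer = [ minus / plus ] int
# int     = DIGIT / digit1-9 1*( DIGIT / underscore DIGIT )
INTEGER = re.compile(r"[+-]?(?:[0-9]|[1-9](?:[0-9]|_[0-9])+)")


def parse_integer(text: str) -> int:
    """Validate text against the integer rule, then convert it."""
    if INTEGER.fullmatch(text) is None:
        raise ValueError(f"not a TOML integer: {text!r}")
    # int() accepts a sign and single underscores between digits; the regex
    # above is what rejects leading zeros such as "007".
    return int(text)
```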

;; Float

float = integer ( frac / frac exp / exp )
zero-prefixable-int = DIGIT *( DIGIT / underscore DIGIT )
frac = decimal-point zero-prefixable-int
decimal-point = %x2E ; .
exp = e integer
e = %x65 / %x45 ; e E
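A minimal Python sketch of the float rule follows, built from the same integer pattern. The names are hypothetical. Note that exp reuses integer, so an exponent cannot have a leading zero ("1e06" is rejected). Only the fraction uses zero-prefixable-int.

```python
import re

INTEGER = r"[+-]?(?:[0-9]|[1-9](?:[0-9]|_[0-9])+)"  # integer rule
ZERO_PREFIXABLE_INT = r"[0-9](?:[0-9]|_[0-9])*"     # DIGIT *( DIGIT / underscore DIGIT )
FRAC = r"\." + ZERO_PREFIXABLE_INT                  # frac = decimal-point zero-prefixable-int
EXP = r"[eE]" + INTEGER                             # exp = e integer
# float = integer ( frac / frac exp / exp )
FLOAT = re.compile(INTEGER + "(?:" + FRAC + "(?:" + EXP + ")?|" + EXP + ")")


def parse_float(text: str) -> float:
    """Validate text against the float rule, then convert it."""
    if FLOAT.fullmatch(text) is None:
        raise ValueError(f"not a TOML float: {text!r}")
    # float() also accepts single underscores between digits.
    return float(text)
```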

;; String

string = basic-string / ml-basic-string / literal-string / ml-literal-string

;; Basic String

basic-string = quotation-mark *basic-char quotation-mark

quotation-mark = %x22 ; "

basic-char = basic-unescaped / escaped
escaped = escape ( %x22 / ; " quotation mark U+0022
%x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 / ; t tab U+0009
%x75 4HEXDIG / ; uXXXX U+XXXX
%x55 8HEXDIG ) ; UXXXXXXXX U+XXXXXXXX

Review comment (on the %x2F line): Why \/?

EDIT: I see that v0.4.0 had removed this rule.

basic-unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

escape = %x5C ; \
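Decoding the escaped alternatives of a basic string can be sketched in Python as follows. It is a minimal sketch that follows this draft, including the "\/" escape that the review comment says v0.4.0 removed. It assumes the surrounding quotes have already been stripped. It does not check the basic-unescaped ranges (control characters) or reject surrogate code points. The names are hypothetical.

```python
import string

# Single-character escapes from the escaped rule in this draft.
SIMPLE_ESCAPES = {
    '"': '"',
    "\\": "\\",
    "/": "/",     # %x2F, present in this draft only
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
}


def unescape_basic(body: str) -> str:
    """Decode the escapes in the body of a basic string (quotes removed)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("escape character at end of input")
        code = body[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code in ("u", "U"):
            # %x75 4HEXDIG / %x55 8HEXDIG
            width = 4 if code == "u" else 8
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or not all(c in string.hexdigits for c in digits):
                raise ValueError(f"bad \\{code} escape at index {i}")
            # chr() raises ValueError for values above 0x10FFFF.
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError(f"unknown escape \\{code} at index {i}")
    return "".join(out)
```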

;; Multiline Basic String

ml-basic-string-delim = quotation-mark quotation-mark quotation-mark
ml-basic-string = ml-basic-string-delim ml-basic-body ml-basic-string-delim
ml-basic-body = *( ml-basic-char / newline / ( escape newline ))

ml-basic-char = ml-basic-unescaped / escaped
ml-basic-unescaped = %x20-5B / %x5D-10FFFF

;; Literal String

literal-string = apostraphe *literal-char apostraphe
Review comment: This should be spelled "apostrophe".


apostraphe = %x27 ; ' Apostraphe

literal-char = %x09 / %x20-26 / %x28-10FFFF
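Here is a small Python check for literal-string, written directly from the code-point ranges in literal-char. The function names are hypothetical. Backslashes pass through untouched because literal strings have no escape rule. An apostrophe or a raw line feed inside the quotes is rejected.

```python
def is_literal_char(ch: str) -> bool:
    """literal-char = %x09 / %x20-26 / %x28-10FFFF"""
    cp = ord(ch)
    return cp == 0x09 or 0x20 <= cp <= 0x26 or 0x28 <= cp <= 0x10FFFF


def is_literal_string(text: str) -> bool:
    """literal-string = apostraphe *literal-char apostraphe"""
    return (
        len(text) >= 2
        and text[0] == "'"
        and text[-1] == "'"
        and all(is_literal_char(ch) for ch in text[1:-1])
    )
```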

;; Multiline Literal String

ml-literal-string-delim = apostraphe apostraphe apostraphe
ml-literal-string = ml-literal-string-delim ml-literal-body ml-literal-string-delim

ml-literal-body = *( ml-literal-char / newline )
ml-literal-char = %x09 / %x20-10FFFF

;; Boolean

boolean = true / false
true = %x74.72.75.65 ; true
false = %x66.61.6C.73.65 ; false

;; Datetime (as defined in RFC 3339)

date-fullyear = 4DIGIT
date-month = 2DIGIT ; 01-12
date-mday = 2DIGIT ; 01-28, 01-29, 01-30, 01-31 based on month/year
time-hour = 2DIGIT ; 00-23
time-minute = 2DIGIT ; 00-59
time-second = 2DIGIT ; 00-58, 00-59, 00-60 based on leap second rules
time-secfrac = "." 1*DIGIT
time-numoffset = ( "+" / "-" ) time-hour ":" time-minute
time-offset = "Z" / time-numoffset
Review comment: It should be clarified that Z can be uppercase or lowercase:

NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
syntax may alternatively be lower case "t" or "z" respectively.

Same with <exp> and <date-time>.

Review comment (Member): Agreed.


partial-time = time-hour ":" time-minute ":" time-second [time-secfrac]
full-date = date-fullyear "-" date-month "-" date-mday
full-time = partial-time time-offset

date-time = full-date "T" full-time
Review comment:

NOTE: ISO 8601 defines date and time separated by "T".
Applications using this syntax may choose, for the sake of
readability, to specify a full-date and full-time separated by
(say) a space character.

Since non-technical users are part of the target audience, the improved readability might be appreciated. Should I make a separate issue for this (the specification is somewhat ambiguous here)?

Review comment (Member): Yes, please make a new issue.
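Here is a minimal Python sketch of the date-time rule as a regular expression. ABNF quoted strings such as "T" and "Z" are case-insensitive (RFC 5234, section 2.3). The rule as written therefore already accepts t and z, and the sketch mirrors that. It checks syntax only. Like the grammar, it does not enforce the value ranges given in the comments, and it does not accept a space separator. The names are hypothetical.

```python
import re

# date-time = full-date "T" full-time, with case-insensitive "T" and "Z".
DATE_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"      # full-date
    r"[Tt]"                            # "T" (case-insensitive in ABNF)
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}"      # partial-time without secfrac
    r"(?:\.[0-9]+)?"                   # [time-secfrac]
    r"(?:[Zz]|[+-][0-9]{2}:[0-9]{2})"  # time-offset
)


def is_date_time(text: str) -> bool:
    # Syntax only: "1979-13-45T07:32:00Z" matches, as it does in the grammar.
    return DATE_TIME.fullmatch(text) is not None
```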


;; Array

array-open = %x5B ws ; [
array-close = ws %x5D ; ]

array = array-open array-values array-close

array-values = [ val [ array-sep ] [ ( comment newlines) / newlines ] /
val array-sep [ ( comment newlines) / newlines ] array-values ]
Review comment: This disallows newline before the first element, e.g.

# this is not allowed
key = [
    1]

but allows it after:

# this is ok
key = [1
    ]

Is that intended?

Review comment: Hopefully not. Could this be fixed? (It's surprising behavior.)

Review comment (Contributor): It also disallows newlines between values and commas. Is it ok?

Review comment: @mojombo Ping...

Review comment: Any news on this?
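The behavior discussed in this thread can be made concrete with a rough Python sketch. It unrolls the recursive array-values rule into a regular expression. The sketch assumes integer-only values, LF-only newlines, and no comments; these simplifications and the names are hypothetical. Under those assumptions it reproduces the reviewers' observations. A newline is rejected right after the opening bracket but accepted before the closing one, and a newline between a value and its comma is rejected.

```python
import re

WS = r"[ \t]*"              # ws
VAL = r"[0-9]+"             # assumption: bare integers stand in for val
SEP = WS + "," + WS         # array-sep = ws %x2C ws
NEWLINES = r"\n+"           # newlines (LF only in this sketch)

# array-values = [ val [ array-sep ] [ newlines ] /
#                  val array-sep [ newlines ] array-values ]
# unrolls to: optionally, zero or more ITEMs followed by one LAST.
ITEM = VAL + SEP + "(?:" + NEWLINES + ")?"              # val array-sep [ newlines ]
LAST = VAL + "(?:" + SEP + ")?(?:" + NEWLINES + ")?"    # val [ array-sep ] [ newlines ]
VALUES = "(?:(?:" + ITEM + ")*" + LAST + ")?"
# array = array-open array-values array-close
# Note: newlines is never followed by ws, so indenting the next value after a
# line break is rejected as well.
ARRAY = re.compile(r"\[" + WS + VALUES + WS + r"\]")


def is_array(text: str) -> bool:
    return ARRAY.fullmatch(text) is not None
```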


array-sep = ws %x2C ws ; , Comma

;; Inline Table

inline-table-open = %x7B ws ; {
inline-table-close = ws %x7D ; }
inline-table-sep = ws %x2C ws ; , Comma

inline-table = inline-table-open inline-table-keyvals inline-table-close

inline-table-keyvals = [ inline-table-keyvals-non-empty ]
inline-table-keyvals-non-empty = key keyval-sep val /
key keyval-sep val inline-table-sep inline-table-keyvals-non-empty
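Here is a rough Python sketch of inline-table. It unrolls inline-table-keyvals-non-empty into keyval *( inline-table-sep keyval ). It assumes unquoted keys and integer values only, and the names are hypothetical. It shows that this draft allows an empty table but no trailing comma and no line breaks inside the braces.

```python
import re

WS = r"[ \t]*"                        # ws
KEY = r"[A-Za-z0-9_-]+"               # assumption: unquoted-key only
VAL = r"[0-9]+"                       # assumption: integers only
KEYVAL = KEY + WS + "=" + WS + VAL    # key keyval-sep val
SEP = WS + "," + WS                   # inline-table-sep
# inline-table = inline-table-open [ keyval *( sep keyval ) ] inline-table-close
INLINE_TABLE = re.compile(
    r"\{" + WS + "(?:" + KEYVAL + "(?:" + SEP + KEYVAL + ")*)?" + WS + r"\}"
)


def is_inline_table(text: str) -> bool:
    return INLINE_TABLE.fullmatch(text) is not None
```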

;; Built-in ABNF terms, reproduced here for clarity

; ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
; DIGIT = %x30-39 ; 0-9
; HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

Review comment (on the ALPHA line): Why not use Unicode ALPHA? That would allow characters like 'Æ', 'ø', 'å', 'ß' in keys, and would allow, for example, Arabic or Hebrew to be used in keys.
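The built-in ALPHA covers only ASCII letters, so the reviewer's examples are rejected as unquoted keys under this draft. The small Python sketch below contrasts the rule as written with a hypothetical Unicode-aware variant. The variant uses str.isalpha(), an assumption made for illustration that is not part of the draft. Both function names are hypothetical.

```python
import string

ASCII_KEY_CHARS = set(string.ascii_letters + string.digits + "-_")


def is_unquoted_key(text: str) -> bool:
    """unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ), ASCII ALPHA and DIGIT."""
    return len(text) > 0 and all(ch in ASCII_KEY_CHARS for ch in text)


def is_unquoted_key_unicode(text: str) -> bool:
    """Hypothetical variant in which any Unicode letter counts as ALPHA."""
    return len(text) > 0 and all(ch.isalpha() or ch in ASCII_KEY_CHARS for ch in text)
```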