Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow non-English scripts for unquoted keys #891

Merged
merged 24 commits into from
Sep 11, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
cc0a59e
Allow non-English scripts for unquoted keys
abelbraaksma Mar 15, 2022
aacda84
Update changelog and toml.md with non-English bare key examples
abelbraaksma Mar 15, 2022
6c7e62d
Simplify abnf slightly
abelbraaksma Mar 16, 2022
ddf3fa5
Add full Unicode ranges to toml.md
abelbraaksma Mar 16, 2022
40e6c13
remove trailing spaces
abelbraaksma Mar 16, 2022
8e78ee9
improve wording
abelbraaksma Mar 16, 2022
0387d87
Fix formatting in Unicode summary of toml.md
abelbraaksma Mar 24, 2022
d01782f
Specifically include Enclosed Alphanumerics U+2460-24FF
abelbraaksma Mar 24, 2022
44a1706
Include currency, superscript and fractions, as they are allowed else…
abelbraaksma Mar 24, 2022
92f1e13
Remove currency signs, they shouldn't have been there.
abelbraaksma Mar 24, 2022
0216928
Reflow to fit the 80 character line length limit in toml.md
abelbraaksma Mar 27, 2022
2a3c7f5
Include U+FFFE/FFFF in the text to match the ABNF
abelbraaksma Mar 29, 2022
7052d5c
Fix PUP range in toml.md
abelbraaksma Mar 30, 2022
c086c79
Exclude U+B1, typo, shouldn't have been included
abelbraaksma Apr 2, 2022
68c0ad8
Update toml.md
abelbraaksma Apr 22, 2022
2bbe92d
Update toml.md, clearer wording
abelbraaksma Apr 22, 2022
e075327
Fix end-of-range D999 to DFFF for surrogate blocks
abelbraaksma Apr 28, 2022
4193be3
Merge branch 'main' into update-changelog
abelbraaksma Jul 30, 2022
91533ed
Merge branch 'main' into update-changelog
abelbraaksma Jul 31, 2022
2e239be
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 31, 2022
0c4f76e
Update toml.md to contained proposed text for 'bare keys' explanation
abelbraaksma Aug 15, 2022
9c3e8c9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2022
19b3ff8
Fix use of emoji variant of info icon character
abelbraaksma Aug 16, 2022
e44e10c
Fix link to suggested format by @marzer, @pradyunsg
abelbraaksma Aug 16, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
- Add new `\e` shorthand for the escape character.
- Add \x00 notation to basic strings.
- Seconds in Date-Time and Time values are now optional.
- Allow non-English scripts in unquoted (bare) keys

## 1.0.0 / 2021-01-11

Expand Down
21 changes: 17 additions & 4 deletions toml.abnf
Original file line number Diff line number Diff line change
Expand Up @@ -46,19 +46,32 @@ comment = comment-start-symbol *non-eol
;; Key-Value pairs

keyval = key keyval-sep val

key = simple-key / dotted-key
val = string / boolean / array / inline-table / date-time / float / integer

simple-key = quoted-key / unquoted-key

unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _
;; Unquoted key

unquoted-key = 1*unquoted-key-char
unquoted-key-char = ALPHA / DIGIT / %x2D / %x5F ; a-z A-Z 0-9 - _
unquoted-key-char =/ %xB2 / %xB3 / %xB9 / %xBC-BE ; superscript digits, fractions
unquoted-key-char =/ %xC0-D6 / %xD8-F6 / %xF8-37D ; non-symbol chars in Latin block
unquoted-key-char =/ %x37F-1FFF ; exclude GREEK QUESTION MARK, which is basically a semi-colon
unquoted-key-char =/ %x200C-200D / %x203F-2040 ; from General Punctuation Block, include the two tie symbols and ZWNJ, ZWJ
unquoted-key-char =/ %x2070-218F / %x2460-24FF ; include super-/subscripts, letterlike/numberlike forms, enclosed alphanumerics
unquoted-key-char =/ %x2C00-2FEF / %x3001-D7FF ; skip arrows, math, box drawing etc, skip 2FF0-3000 ideographic up/down markers and spaces
unquoted-key-char =/ %xF900-FDCF / %xFDF0-FFFD ; skip D800-DFFF surrogate block, E000-F8FF Private Use area, FDD0-FDEF intended for process-internal use (unicode)
unquoted-key-char =/ %x10000-EFFFF ; all chars outside BMP range, excluding Private Use planes (F0000-10FFFF)

;; Quoted and dotted key

quoted-key = basic-string / literal-string
dotted-key = simple-key 1*( dot-sep simple-key )

dot-sep = ws %x2E ws ; . Period
keyval-sep = ws %x3D ws ; =

val = string / boolean / array / inline-table / date-time / float / integer

;; String

string = ml-basic-string / basic-string / ml-literal-string / literal-string
Expand Down
28 changes: 21 additions & 7 deletions toml.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,27 +97,37 @@ first = "Tom" last = "Preston-Werner" # INVALID

A key may be either bare, quoted, or dotted.

**Bare keys** may only contain ASCII letters, ASCII digits, underscores, and
dashes (`A-Za-z0-9_-`). Note that bare keys are allowed to be composed of only
ASCII digits, e.g. `1234`, but are always interpreted as strings.
**Bare keys** may contain any letter-like or number-like Unicode character from
any Unicode script, as well as ASCII digits, dashes and underscores.
Punctuation, spaces, arrows, box drawing and private use characters are not
allowed. Note that bare keys are allowed to be composed of only ASCII digits,
e.g. 1234, but are always interpreted as strings.

ℹ️ The exact ranges of allowed code points can be found in the
[ABNF grammar file][abnf].

```toml
key = "value"
bare_key = "value"
bare-key = "value"
1234 = "value"
Fuß = "value"
😂 = "value"
汉语大字典 = "value"
辭源 = "value"
பெண்டிரேம் = "value"
```

**Quoted keys** follow the exact same rules as either basic strings or literal
strings and allow you to use a much broader set of key names. Best practice is
to use bare keys except when absolutely necessary.
strings and allow you to use any Unicode character in a key name, including
spaces. Best practice is to use bare keys except when absolutely necessary.

```toml
"127.0.0.1" = "value"
"character encoding" = "value"
"ʎǝʞ" = "value"
'key2' = "value"
'quoted "value"' = "value"
"╠═╣" = "value"
"⋰∫∬∭⋱" = "value"
```

A bare key must be non-empty, but an empty quoted key is allowed (though
Expand All @@ -138,6 +148,7 @@ name = "Orange"
physical.color = "orange"
physical.shape = "round"
site."google.com" = true
பெண்.டிரேம் = "we are women"
```

In JSON land, that would give you the following structure:
Expand All @@ -151,6 +162,9 @@ In JSON land, that would give you the following structure:
},
"site": {
"google.com": true
},
"பெண்": {
"டிரேம்": "we are women"
}
}
```
Expand Down