get rid of syntactically significant unicode equals signs (#400)
Fixes: #399
zkat authored Nov 29, 2024
1 parent fa3050c commit 1588b1f
Showing 7 changed files with 15 additions and 37 deletions.
4 changes: 0 additions & 4 deletions CHANGELOG.md
@@ -59,10 +59,6 @@
 whitespace matching the whitespace prefix of the closing line. Multiline
 strings and raw strings now must have a newline immediately following their
 opening `"`, and a final newline plus whitespace preceding the closing `"`.
-* SMALL EQUALS SIGN (`U+FE66`), FULLWIDTH EQUALS SIGN (`U+FF1D`), and HEAVY
-EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for
-properties (e.g. `お名前=☜(゚ヮ゚☜)`). They are also no longer valid in bare
-identifiers.
 * `.1`, `+.1` etc are no longer valid identifiers, to prevent confusion and
 conflicts with numbers.
 * Multi-line strings' literal Newline sequences are now normalized to single
17 changes: 8 additions & 9 deletions README.md
@@ -158,11 +158,10 @@ node3 #"C:\Users\zkat\raw\string"#

 You don't have to quote strings unless any the following apply:
 * The string contains whitespace.
-* The string contains any of `[]{}()\/#";`.
-* The string is one of `true`, `false`, or `null`.
+* The string contains any of `[]{}()\/#";=`.
+* The string is one of `true`, `false`, `null`, `inf`, `-inf`, or `nan`.
 * The strings starts with a digit, or `+`/`-`/`.`/`-.`,`+.` and a digit.
-* The string contains an equals sign (including unicode equals signs `﹦`,
-`＝`, and `🟰`).
+(aka "looks like a number")

 In essence, if it can get confused for other KDL or KQL syntax, it needs
 quotes.
@@ -296,8 +295,8 @@ smile 😁
 // Identifiers are very flexible. The following is a legal bare identifier:
 <@foo123~!$%^&*.:'|?+>
-// And you can also use unicode, even for the equals sign!
-ノード お名前=☜(゚ヮ゚☜)
+// And you can also use unicode!
+ノード お名前=ฅ^•ﻌ•^ฅ
 // kdl specifically allows properties and values to be
 // interspersed with each other, much like CLI commands.
@@ -335,9 +334,9 @@ SDLang, but that had some design choices I disagreed with.

 #### Ok, then, why not SDLang?

-SDLang is designed for use cases that are not interesting to me, but are very
-relevant to the D-lang community. KDL is very similar in many ways, but is
-different in the following ways:
+SDLang is an excellent base, but I wanted some details ironed out, and some
+things removed that only really made sense for SDLang's current use-cases, including
+some restrictions about data representation. KDL is very similar in many ways, except:

 * The grammar and expected semantics are [well-defined and specified](SPEC.md).
 * There is only one "number" type. KDL does not prescribe representations.
24 changes: 5 additions & 19 deletions SPEC.md
@@ -112,8 +112,8 @@ my-node 1 2 \ // comments are ok after \
 ### Property

 A Property is a key/value pair attached to a [Node](#node). A Property is
-composed of a [String](#string), followed immediately by an [equals
-sign](#equals-sign), and then a [Value](#value).
+composed of a [String](#string), followed immediately by an equals sign (`=`, `U+003D`),
+and then a [Value](#value).

 Properties should be interpreted left-to-right, with rightmost properties with
 identical names overriding earlier properties. That is:
@@ -131,17 +131,6 @@ still be spec-compliant.
 Properties _MAY_ be prefixed with `/-` to "comment out" the entire token and
 make it act as plain whitespace, even if it spreads across multiple lines.

-#### Equals Sign
-
-Any of the following characters may be used as equals signs in properties:
-
-| Name | Character | Code Point |
-|----|-----|----|
-| EQUALS SIGN | `=` | `U+003D` |
-| SMALL EQUALS SIGN | `﹦` | `U+FE66` |
-| FULLWIDTH EQUALS SIGN | `＝` | `U+FF1D` |
-| HEAVY EQUALS SIGN | `🟰` | `U+1F7F0` |
-
 ### Argument

 An Argument is a bare [Value](#value) attached to a [Node](#node), with no
@@ -334,8 +323,7 @@ negative number.

 The following characters cannot be used anywhere in a [Identifier String](#identifier-string):

-* Any of `(){}[]/\"#;`
-* Any [Equals Sign](#equals-sign)
+* Any of `(){}[]/\"#;=`
 * Any [Whitespace](#whitespace) or [Newline](#newline).
 * Any [disallowed literal code points](#disallowed-literal-code-points) in KDL
 documents.
@@ -780,19 +768,17 @@ node-prop-or-arg := prop | value
 node-children := '{' nodes final-node? '}'
 node-terminator := single-line-comment | newline | ';' | eof
-prop := string optional-node-space equals-sign optional-node-space value
+prop := string optional-node-space '=' optional-node-space value
 value := type? optional-node-space (string | number | keyword)
 type := '(' optional-node-space string optional-node-space ')'
-equals-sign := See Table ([Equals Sign](#equals-sign))
 string := identifier-string | quoted-string | raw-string
 identifier-string := unambiguous-ident | signed-ident | dotted-ident
 unambiguous-ident := ((identifier-char - digit - sign - '.') identifier-char*) - 'true' - 'false' - 'null' - 'inf' - '-inf' - 'nan'
 signed-ident := sign ((identifier-char - digit - '.') identifier-char*)?
 dotted-ident := sign? '.' ((identifier-char - digit) identifier-char*)?
-identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#] - disallowed-literal-code-points - equals-sign
+identifier-char := unicode - unicode-space - newline - [\\/(){};\[\]"#=] - disallowed-literal-code-points
 quoted-string := '"' (single-line-string-body | newline multi-line-string-body newline unicode-space*) '"'
 single-line-string-body := (string-character - newline)*
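
Reviewer note: the updated `identifier-char` rule in the SPEC.md hunk can be approximated in a few lines of Python. This is only a rough sketch, not part of the commit. It assumes `str.isspace()` is an acceptable stand-in for the spec's `unicode-space` and `newline` sets, and it ignores `disallowed-literal-code-points` entirely, since neither list appears in this diff. The function name `is_identifier_char` is hypothetical.

```python
# Characters excluded by the bracketed class in the updated rule:
# [\\/(){};\[\]"#=]
RESERVED = set('\\/(){};[]"#=')


def is_identifier_char(ch: str) -> bool:
    """Approximate the updated identifier-char rule for one character.

    Assumptions (not taken from the spec text in this diff):
    - str.isspace() stands in for unicode-space and newline
    - disallowed-literal-code-points are not checked at all
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch.isspace():
        return False
    return ch not in RESERVED


if __name__ == "__main__":
    # Only the ASCII equals sign is reserved after this change; the three
    # Unicode look-alikes fall through as ordinary identifier characters.
    for ch in ["=", "\uFE66", "\uFF1D", "\U0001F7F0"]:
        print(f"U+{ord(ch):04X}", is_identifier_char(ch))
```

Under these assumptions, only `U+003D` is rejected, which matches the intent stated in the commit title: the Unicode equals signs stop being syntactically significant.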
1 change: 0 additions & 1 deletion tests/test_cases/expected_kdl/unicode_equals_signs.kdl

This file was deleted.

1 change: 1 addition & 0 deletions tests/test_cases/expected_kdl/unicode_silly.kdl
@@ -0,0 +1 @@
+ノード お名前=ฅ^•ﻌ•^ฅ
4 changes: 0 additions & 4 deletions tests/test_cases/input/unicode_equals_signs.kdl

This file was deleted.

1 change: 1 addition & 0 deletions tests/test_cases/input/unicode_silly.kd
@@ -0,0 +1 @@
+ノード お名前=ฅ^•ﻌ•^ฅ
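
Reviewer note: to illustrate what the new `unicode_silly` test case exercises, here is a deliberately simplified Python sketch that splits a one-line node into a name, arguments, and properties using only the ASCII `=`. It is not a KDL parser and not taken from any implementation: it assumes every entry is a bare identifier string separated by whitespace (no quoting, type annotations, comments, or children), and the helper name `split_simple_node` is hypothetical.

```python
def split_simple_node(line: str):
    """Split a whitespace-separated node line into (name, args, props).

    Simplifying assumptions: bare identifier strings only, and only the
    ASCII equals sign (U+003D) separates a property key from its value.
    """
    name, *entries = line.split()
    args = []
    props = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if sep:
            # Rightmost property with the same name overrides earlier ones.
            props[key] = value
        else:
            args.append(entry)
    return name, args, props


if __name__ == "__main__":
    print(split_simple_node("ノード お名前=ฅ^•ﻌ•^ฅ"))
    # With a FULLWIDTH EQUALS SIGN the entry is no longer a property;
    # in this sketch it is treated as a plain argument.
    print(split_simple_node("ノード お名前\uFF1Dvalue"))
```

The dictionary assignment mirrors the SPEC.md statement that rightmost properties with identical names override earlier ones.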
