Merge KDL v2 #286

Merged
merged 106 commits into from
Nov 29, 2024
Changes from 87 commits
910f6e9
Do not escape / (Solidus, Forwardslash) (#197)
danini-the-panini Aug 28, 2022
69ac280
KQL: require operator and change operator grammar a bit (#221)
zkat Aug 28, 2022
2d5e543
KQL: remove map operator and accessors (#222)
zkat Aug 28, 2022
1bf4d74
Allow "empty" single line comments in the spec (#234)
basile-henry Aug 28, 2022
78a2d5f
Draft changelog
zkat Aug 28, 2022
f38edc7
add failing test for removed solidus escape
zkat Aug 28, 2022
ffeea8e
Use forward slash in solidus-escape test (#288)
bgotink Aug 30, 2022
337bd1b
Update expected output of test with changed input (#289)
bgotink Aug 30, 2022
825ff2c
Add escaped whitespace to KDL strings (#290)
Lucretiel Sep 1, 2022
0a4a14d
Add escaped whitespace note to v2 changelog (#291)
hkolbeck Sep 1, 2022
d437cf2
Add test for empty single-line comment (#292)
bgotink Sep 2, 2022
06d1d67
Add draft grammar for KQL 1.0.0 (#303)
larsgw Oct 9, 2022
3b39e29
Add vertical tab to whitespace. Closes #331
tabatkins Oct 6, 2023
568c096
Document the vertical tab addition.
tabatkins Oct 6, 2023
0836df1
Restrict idents from looking like raw strings. Closes #200, closes #2…
tabatkins Oct 6, 2023
eb55930
Update formal grammar for KDL 2.0 (#285)
CAD97 Dec 11, 2023
99abeef
fix some confusion in grammar syntax, and actually specify the syntax…
zkat Dec 13, 2023
e6356d5
allow ,<> as identifier characters since they no longer need to be re…
zkat Dec 13, 2023
85aa3a0
treat bare identifiers and strings in value locations (#358)
zkat Dec 13, 2023
2694146
# is just plain illegal now
zkat Dec 13, 2023
5e89c45
Update all examples to use most changes
zkat Dec 13, 2023
fada1fc
Update KQL text, too
zkat Dec 13, 2023
63feef7
Update schema spec
zkat Dec 13, 2023
31fd7bd
Update JiK and XiK too
zkat Dec 13, 2023
b42b6c8
Clarify that multiline comments are allowed after line continuations,…
zkat Dec 13, 2023
5a7b339
Constrain code points to unicode scalar values
zkat Dec 13, 2023
c8488db
Make last semicolon optional for inline nodes
zkat Dec 13, 2023
13799de
Allow whitespace in more places
zkat Dec 13, 2023
49402cc
allow BOM only in the first unicode scalar in a document
zkat Dec 13, 2023
fc1b594
add support for dedented multi-line strings and raw strings
zkat Dec 13, 2023
7790505
Merge branch 'main' into kdl-v2
zkat Dec 13, 2023
8de7df6
formatting
zkat Dec 13, 2023
a0d5030
Release 2.0 draft 1
zkat Dec 13, 2023
54df7f0
Update README
zkat Dec 13, 2023
817a7dc
fixes from review
zkat Dec 15, 2023
9f06153
Add explicit attribution for logo
zkat Dec 15, 2023
56f399b
Add \s to the list of escapes
zkat Dec 15, 2023
b51859e
update tests
zkat Dec 16, 2023
50d378f
update readme a bit
zkat Dec 16, 2023
90cd0b1
make unicodey equals signs valid property assignment characters
zkat Dec 17, 2023
0022536
small rewording
zkat Dec 17, 2023
39b9fac
fix stray quote
zkat Dec 17, 2023
055de4e
better organization of how we talk about identifiers/strings and comm…
zkat Dec 17, 2023
511ab6b
missed a spot
zkat Dec 17, 2023
d433332
Add LRM/RLM to the direction control char list
zkat Dec 17, 2023
d53d99f
test fixes
zkat Dec 17, 2023
057e8c8
Rewrite intro paragraph for strings to make their usage clearer.
tabatkins Dec 26, 2023
419995f
typos
tabatkins Dec 26, 2023
6d359d2
Remove now-irrelevant comment about idents acting like strings (they …
tabatkins Dec 26, 2023
b635470
be more specific
tabatkins Dec 26, 2023
491cc46
Fix the disallowed low ASCIIs
tabatkins Dec 26, 2023
6d091fd
Use consistent codepoint spelling
tabatkins Dec 26, 2023
f02ba59
Make multi-line ws prefix determined by the last line.
tabatkins Dec 26, 2023
935d054
Fix more multiline tests
tabatkins Dec 26, 2023
1294f97
Fix tests about # in an ident string
tabatkins Dec 26, 2023
094a615
Tests are invalid (contained U+FFFD, not surrogates) and are in gener…
tabatkins Dec 26, 2023
c273d24
Dang it, forgot to save README when fixing multiline earlier.
tabatkins Dec 26, 2023
de37e11
Comments are now allowed in and around types (along with other types …
tabatkins Dec 26, 2023
24cd214
Disallow idents like '.1' to avoid footguns
tabatkins Jan 4, 2024
bc2b995
Rename/rearrange the string productions to match the spec text better.
tabatkins Jan 4, 2024
1f28fb0
[editorial] Move keyword production to a better spot. Rephrase bool/k…
tabatkins Jan 4, 2024
1d6809e
Whoops, missed allowing '+.'
tabatkins Jan 4, 2024
af91cc6
Add tests for .1 and general 'ident ambiguous with a number' cases.
tabatkins Jan 4, 2024
2949500
KDL V2 Test Fixes (#368)
IceDragon200 Jan 6, 2024
c15b5c2
make note of .1/+.1 illegality in the changelog
zkat Feb 6, 2024
172c67b
Release 2.0.0 draft 2
zkat Feb 6, 2024
522ce85
clarify multi-line strings further
zkat Feb 7, 2024
35ac19b
fix stray legacy bool in example
zkat Feb 7, 2024
2d4bcd0
Release 2.0.0 draft 3
zkat Feb 7, 2024
f767472
small readme improvements
zkat Feb 7, 2024
40d8c83
unicode character support clarifications
zkat Feb 8, 2024
b1163e1
more small fixes
zkat Feb 8, 2024
f81fcfa
minor reword
zkat Feb 8, 2024
f0f9589
example tweaks
zkat Feb 8, 2024
793a9d4
normalize literal newlines in multiline strings
zkat Feb 8, 2024
abae1f9
more fixes
zkat Feb 9, 2024
7ab8658
iterate a bit on KQL
zkat Feb 12, 2024
ec7880d
Fix broken formatting in grammar language example (#375)
wackbyte Feb 12, 2024
9212117
Remove extra indent in CI example (#376)
wackbyte Feb 12, 2024
631ec14
allow /- at the very beginning of a document
zkat Feb 13, 2024
fa816ca
add floats
zkat Feb 13, 2024
e773747
Release 2.0 draft 4
zkat Feb 13, 2024
2710c90
facepalm: forgot the full grammar change for float keywords
zkat Feb 13, 2024
2fcf6d4
Update tests/test_cases/expected_kdl/multiline_string_indented.kdl
zkat Feb 15, 2024
dadcfdf
Update tests/test_cases/expected_kdl/multiline_raw_string_indented.kdl
zkat Feb 15, 2024
9132a96
Quote identifiers that contain an equals sign (#381)
bgotink Feb 18, 2024
9e7b958
Ensure spec allows slashdash right after node separator (#382)
bgotink Feb 18, 2024
b294e9c
Update README.md
zkat Mar 5, 2024
2de2ddc
Update README.md
zkat Mar 5, 2024
aeb41cc
Update examples/ci.kdl
zkat Mar 5, 2024
d0b30c3
Update SPEC.md
zkat Mar 5, 2024
281de7e
review fixes
zkat Apr 1, 2024
d064bc9
clarify multi-line strings and escapes interaction
zkat Apr 1, 2024
fa9d303
remove duplication of keyword-number
zkat Feb 15, 2024
bea0f67
turn it around: escapes should be resolved _before_ dedenting
zkat Apr 1, 2024
c9134e3
change escape resolution order again
zkat Apr 2, 2024
fa204ce
unicode was not defined in grammar
zkat Apr 3, 2024
6a77436
kql: only allow top() at start of selector (#388)
alightgoesout Apr 17, 2024
bcfb332
Tweak rules for escaped whitespace in multi-line strings (#392)
tjol Jun 13, 2024
1e924bc
clarifications around multiline prefixes
zkat Oct 4, 2024
93c4400
clarify that numbers don't need to be IEEE 754 floats
zkat Nov 27, 2024
fa3050c
add 128-bit ints
zkat Nov 28, 2024
1588b1f
get rid of syntactically significant unicode equals signs (#400)
zkat Nov 29, 2024
90e22bc
[v2] more predictable slashdash (#407)
zkat Nov 29, 2024
76a1de5
Release 2.0.0 draft 5
zkat Nov 29, 2024
8aa4c15
prep readme for merging to main
zkat Nov 29, 2024
85 changes: 85 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,85 @@
# KDL Changelog

## 2.0.0 (2024-02-07)

### Grammar

* Solidus/Forward slash (`/`) is no longer an escaped character.
* Space (`U+0020`) can now be written into quoted strings with the `\s`
escape.
* Single line comments (`//`) can now be immediately followed by a newline.
* All literal whitespace following a `\` in a string is now discarded.
* Vertical tabs (`U+000B`) are now considered to be whitespace.
* The grammar syntax itself has been described, and some confusing definitions
in the grammar have been fixed accordingly (mostly related to escaped
characters).
* `,`, `<`, and `>` are now legal identifier characters. They were previously
reserved for KQL but this is no longer necessary.
* Code points under `0x20` (except newline and whitespace code points), code
points above `0x10FFFF`, Delete control character (`0x7F`), and the [unicode
"direction control"
characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls)
are now completely banned from appearing literally in KDL documents. They
can now only be represented in regular strings, and there are no facilities to
represent them in raw strings. This should be considered a security
improvement.
* Raw strings no longer require an `r` prefix: they are now specified by using
`#""#`.
* Line continuations can be followed by an EOF now, instead of requiring a
newline (or comment). `node \<EOF>` is now a legal KDL document.
* `#` is no longer a legal identifier character.
* `null`, `true`, and `false` are now `#null`, `#true`, and `#false`. Using
the unprefixed versions of these values is a syntax error.
Comment on lines +43 to +44

@chitoyuu chitoyuu Mar 1, 2024

I'm sorry if this has been explained somewhere before, a 100+ comment discussion can be hard to follow.

Speaking purely from a user's perspective, this specific change feels a bit unnecessary. I assume it's being done to prevent ambiguity, but null, true and false are keywords common enough, that I don't think anyone with the slightest experience writing code would be surprised if they have special meanings unlike normal identifiers. If anything, being a Rust user, I would come into this expecting the exact opposite: that true means the boolean and #true means the raw identifier, similar to how it works in Rust sans the r.

Is there something I'm missing here?

Member Author

We decided this would be enough of a potential footgun since the change to allow unquoted strings. Programming languages like Rust have other defenses against this kind of confusion, but kdl would need to do something different (like this prefixing) to prevent, say, the kind of things you see happen in plain JavaScript

Thanks for the explanation, and fair enough, if that's the point of balance you've decided on for your project.

* The spec prose now more explicitly states that whitespace and newlines are
not valid identifier characters, even though the grammar already expressed
this.
* Bare identifiers can now be used as values in Arguments and Properties, and are interpreted as string values.
* The spec prose now more explicitly states that strings and raw strings can
be used as type annotations.
* A statement in the spec prose that said "It is reasonable for an
implementation to ignore null values altogether when deserializing" has been
removed. This is no longer encouraged or desired.
* Code points have been constrained to [Unicode Scalar
Values](https://unicode.org/glossary/#unicode_scalar_value) only, including
values used in string escapes (`\u{}`). All KDL documents and string values
should be valid UTF-8 now, as was intended.
* The last node in a child block no longer needs to be terminated with `;`,
even if the closing `}` is on the same line, so this is now a legal node:
`node {foo;bar;baz}`
* More places allow whitespace (node-spaces, specifically) now. With great
power comes great responsibility:
  * Inside `(foo)` annotations: `( foo )` is legal, but `( f oo )` is not,
    since it contains two identifiers.
  * Between annotations and the thing they're annotating (`(blah) node (thing)
    1 y= (who) 2`).
  * Around `=` for props (`x = 1`).
* The BOM is now only allowed as the first character in a document. It was
previously treated as generic whitespace.
* Multi-line strings are now automatically dedented, according to the common
whitespace matching the whitespace prefix of the closing line. Multiline
strings and raw strings now must have a newline immediately following their
opening `"`, and a final newline plus whitespace preceding the closing `"`.
* SMALL EQUALS SIGN (`U+FE66`), FULLWIDTH EQUALS SIGN (`U+FF1D`), and HEAVY
EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for
properties (e.g. `お名前=☜(゚ヮ゚☜)`). They are also no longer valid in bare
identifiers.

Contributor

This example doesn't look right: surely ( and ) aren't allowed here?

* `.1`, `+.1`, etc. are no longer valid identifiers, to prevent confusion and
conflicts with numbers.
* Multi-line strings' literal Newline sequences are now normalized to single
`LF`s.
* `#inf`, `#-inf`, and `#nan` have been added in order to properly support
IEEE floats for implementations that choose to represent their decimals that
way.
* Correspondingly, the identifiers `inf`, `-inf`, and `nan` are now syntax
errors.
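
A few of the grammar changes listed above can be seen side by side in one small document. This is an illustrative sketch written against the changelog text as it appears in this diff, not an excerpt from the spec or its test suite; all node, property, and annotation names are made up.

```kdl
// Keywords carry a `#` prefix; per the changelog, a bare `true` or `null`
// here would be a syntax error.
enabled #true
fallback #null

// Bare identifiers in argument and property positions are read as strings.
name foo platform=windows

// Raw strings no longer take an `r` prefix.
path #"C:\no\escapes\here"#

// Whitespace is allowed inside type annotations and around `=`.
( u8 ) count x = 1

// The last node of an inline child block needs no trailing `;`.
parent { foo; bar; baz }

// Float keywords for implementations that use IEEE floats.
limits #inf #-inf #nan
```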

### KQL

* There's now a _required_ descendant selector (`>>`), instead of using plain
spaces for that purpose.
* The "any sibling" selector is now `++` instead of `~`, for consistency with
the new descendant selector.
* Some parsing logic around the grammar has changed.
* Multi- and single-line comments are now supported, as well as line
continuations with `\`.
* Map operators have been removed entirely.
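
To make the selector changes concrete, the hypothetical document below is annotated with the queries that would match each node, assuming the selector semantics described in the query spec changed by this PR (`>` direct child, `>>` descendant, `+` adjacent sibling, `++` any later sibling). The node names are placeholders.

```kdl
a {
    b        // matched by `a > b` and by `a >> b`
    c {
        b    // matched by `a >> b`, but not by `a > b`
    }
}
x
y            // matched by `x + y` and by `x ++ y`
z            // matched by `x ++ z`, but not by `x + z`
```

Under the old syntax, the descendant query would have been written `a b` and the sibling query `x ~ z`.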
13 changes: 7 additions & 6 deletions JSON-IN-KDL.md
Contributor

Should the JiK spec mention that #inf, #-inf, and #nan are invalid values in a JiK context?
It is already implied, as JSON can't represent these values, but I think it would make sense to make this explicit.

Contributor

Good for a warning, yeah.

@@ -3,15 +3,15 @@ JSON-in-KDL (JiK)

This specification describes a canonical way to losslessly encode [JSON](https://json.org) in [KDL](https://kdl.dev). While this isn't a very useful thing to want to do on its own, it's occasionally useful when using a KDL toolchain while speaking with a JSON-consuming or -emitting service.

This is version 3.0.1 of JiK.
This is version 4.0.0 of JiK.

JSON-in-KDL (JiK from now on) is a kdl microsyntax consisting of named nodes that represent objects, arrays, or literal values.

----

JSON literals are, luckily, a subset of KDL's literals. There are two ways to write a JSON literal into JiK:
There are two ways to write a JSON literal into JiK:

* As a node with any nodename and a single argument, like `- true` (for the JSON `true`) or `foo 5` (for the JSON `5`).
* As a node with any nodename and a single argument, like `- #true` (for the JSON `true`) or `foo 5` (for the JSON `5`).
* When nested in arrays or objects, literals can usually be written as arguments (for array nodes) or properties (for object nodes). See below for details.
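
As a rough illustration of the first form under the v2 keyword syntax, each node below would encode a single JSON literal. The first two lines restate the examples from the text; the `#null` and string lines are extrapolated from the v2 keyword change rather than quoted from the JiK document.

```kdl
// JSON true
- #true
// JSON 5
foo 5
// JSON null (assumed to follow the same `#` prefix rule)
- #null
// JSON "hello"
- "hello"
```

As noted in the review thread above, `#inf`, `#-inf`, and `#nan` have no JSON counterpart, so they would not be valid literals in a JiK context.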

----
@@ -25,7 +25,7 @@ Children can encode literals and/or nested arrays and objects. For example, the
```kdl
- {
- 1
- true false
- #true #false
- 3
}
```
@@ -36,7 +36,7 @@ Arguments and children can be mixed, if desired. The preceding example could als

```kdl
- 1 {
- true false
- #true #false
- 3
}
```
@@ -54,10 +54,11 @@ The `(array)` type annotation can be used on any other valid array node if desir

JSON objects are represented in JiK as a node with any nodename, with zero or more properties and/or zero or more children with any nodenames.

Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=true`.
Properties can encode literals - for example, the JSON `{"foo": 1, "bar": true}` can be written in JiK as `- foo=1 bar=#true`.

Children can encode literals and/or nested arrays and objects,
using the nodename for the item's key.

For example, the JSON `{"foo": 1, "bar": [2, {"baz": 3}], "qux":4}` can be written in JiK as:

```kdl
Expand Down
89 changes: 36 additions & 53 deletions QUERY-SPEC.md
@@ -5,20 +5,20 @@ documents to extract nodes and even specific data. It is loosely based on CSS
selectors for familiarity and ease of use. Think of it as CSS Selectors or
XPath, but for KDL!

This document describes KQL `1.0.0`. It was released on September 11, 2021.
This document describes KQL `next`. It is unreleased.

## Selectors

Selectors use selection operators to filter nodes that will be returned by an
API using KQL. The main differences between this and CSS selectors are the
lack of `*` (use `[]` instead), and the specific syntax for
lack of `*` (use `[]` instead), the specific syntax for descendants and siblings, and the specific syntax for
[matchers](#matchers) (the stuff between `[` and `]`), which is similar, but not identical to CSS.

* `a > b`: Selects any `b` element that is a direct child of an `a` element.
* `a b`: Selects any `b` element that is a _descendant_ of an `a` element.
* `a b || a c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
* `a >> b`: Selects any `b` element that is a _descendant_ of an `a` element.
* `a >> b || a >> c`: Selects all `b` and `c` elements that are descendants of an `a` element. Any selector may be on either side of the `||`. Multiple `||` are supported.
* `a + b`: Selects any `b` element that is placed immediately after a sibling `a` element.
* `a ~ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
* `a ++ b`: Selects any `b` element that follows an `a` element as a sibling, either immediately or later.
* `[accessor()]`: Selects any element, filtered by [an accessor](#accessors). (`accessor()` is a placeholder, not an actual accessor)
* `a[accessor()]`: Selects any `a` element, filtered by an accessor.
* `[]`: Selects any element.
@@ -44,8 +44,8 @@ Attribute matchers support certain binary operators:
* `[val() = 1]`: Selects any element whose first value is 1.
* `[prop(name) = 1]`: Selects any element with a property `name` whose value is 1.
* `[name = 1]`: Equivalent to the above.
* `[name() = "hi"]`: Selects any element whose _node name_ is `"hi"`. Equivalent to just `hi`, but more useful when using string operators.
* `[tag() = "hi"]`: Selects any element whose type annotation is `"hi"`. Equivalent to just `(hi)`, but more useful when using string operators.
* `[name() = hi]`: Selects any element whose _node name_ is "hi". Equivalent to just `hi`, but more useful when using string operators.
* `[tag() = hi]`: Selects any element whose tag is "hi". Equivalent to just `(hi)`, but more useful when using string operators.
* `[val() != 1]`: Selects any element whose first value exists, and is not 1.

The following operators work with any `val()` or `prop()` values.
@@ -60,64 +60,37 @@ never coerced to 1, and there is no "universal" ordering across all types.):
The following operators work only with string `val()`, `prop()`, `tag()`, or `name()` values.
If the value is not a string, the matcher will always fail:

* `[val() ^= "foo"]`: Selects any element whose first value starts with "foo".
* `[val() $= "foo"]`: Selects any element whose first value ends with "foo".
* `[val() *= "foo"]`: Selects any element whose first value contains "foo".
* `[val() ^= foo]`: Selects any element whose first value starts with "foo".
* `[val() $= foo]`: Selects any element whose first value ends with "foo".
* `[val() *= foo]`: Selects any element whose first value contains "foo".

The following operators work only with `val()` or `prop()` values. If the value
is not one of those, the matcher will always fail:

* `[val() = (foo)]`: Selects any element whose type annotation is `foo`.

## Map Operator

KQL implementations MAY support a "map operator", `=>`, that allows selection
of specific parts of the selected nodes, essentially "mapping" over a
selector's result set.

Only a single map operator may be used, and it must be the last element in a
selector string.

The map operator's right hand side is either an [`accessor`](#accessors) on
its own, or a tuple of accessors, denoted by a comma-separated list wrapped in
`()` (for example, `(a, b, c)`).

## Accessors

Accessors access/extract specific parts of a node. They are used with the [map
operator](#map-operator), and have syntactic overlap with some
[matchers](#matchers).

* `name()`: Returns the name of the node itself.
* `val(2)`: Returns the third value in a node.
* `val()`: Equivalent to `val(0)`.
* `prop(foo)`: Returns the value of the property `foo` in the node.
* `foo`: Equivalent to `prop(foo)`.
* `props()`: Returns all properties of the node as an object.
* `values()`: Returns all values of the node as an array.

## Examples

Given this document:

```kdl
package {
name "foo"
name foo
version "1.0.0"
dependencies platform="windows" {
dependencies platform=windows {
winapi "1.0.0" path="./crates/my-winapi-fork"
}
dependencies {
miette "2.0.0" dev=true
miette "2.0.0" dev=#true integrity=(sri)sha512-deadbeef
}
}
```

Then the following queries are valid:

* `package name`
* `package >> name`
* -> fetches the `name` node itself
* `top() > package name`
* `top() > package >> name`
* -> fetches the `name` node, guaranteeing that `package` is in the document root.
* `dependencies`
* -> deep-fetches both `dependencies` nodes
@@ -129,14 +102,24 @@ Then the following queries are valid:
* -> fetches all direct-child nodes of any `dependencies` nodes in the
document. In this case, it will fetch both `miette` and `winapi` nodes.

If using an API that supports the [map operator](#map-operator), the following
are valid queries:

* `package name => val()`
* -> `["foo"]`.
* `dependencies[platform] => platform`
* -> `["windows"]`
* `dependencies > [] => (name(), val(), path)`
* -> `[("winapi", "1.0.0", "./crates/my-winapi-fork"), ("miette", "2.0.0", None)]`
* `dependencies > [] => (name(), values(), props())`
* -> `[("winapi", ["1.0.0"], {"platform": "windows"}), ("miette", ["2.0.0"], {"dev": true})]`
## Full Grammar

Rules that are not defined in this grammar are prefixed with `$`, see [the KDL
grammar](https://github.com/kdl-org/kdl/blob/main/SPEC.md#full-grammar) for
what they expand to.

```
query-str := $bom? query
query := selector q-ws* "||" q-ws* query | selector
selector := filter q-ws* selector-operator q-ws* selector | filter
selector-operator := ">>" | ">" | "++" | "+"
filter := "top(" q-ws* ")" | matchers
matchers := type-matcher $string? accessor-matcher* | $string accessor-matcher* | accessor-matcher+
type-matcher := "(" q-ws* ")" | $type
accessor-matcher := "[" q-ws* (comparison | accessor)? q-ws* "]"
comparison := accessor q-ws* matcher-operator q-ws* ($type | $string | $number | $keyword)
accessor := "val(" q-ws* $integer q-ws* ")" | "prop(" q-ws* $string q-ws* ")" | "name(" q-ws* ")" | "tag(" q-ws* ")" | "values(" q-ws* ")" | "props(" q-ws* ")" | $string
matcher-operator := "=" | "!=" | ">" | "<" | ">=" | "<=" | "^=" | "$=" | "*="

q-ws := $plain-node-space
```