Precise location reporting for sub-parsers #286

dmarcotte · 2026-01-05T21:12:37Z

Enable precise error/message reporting by mapping positions in processed string content back to original source locations.

Motivation

When a sub-parser processes embedded content (like SQL in an embed block or escape sequences in quoted strings), its errors should point to the exact location in the original KSON document. For example, if there is a syntax error at character 15 of the processed SQL, we need to trace that position back through indent trimming and escape processing to find where character 15 sits in the raw source.

Implementation approach

We maintain source maps from processed KSON content to the original KSON source, so a location reported on processed content (e.g. an embed block with its minimum indent stripped and escapes evaluated) can always be translated back to a location in the original KSON document.

The core challenge this change needed to solve was that tokens were being transformed into values too early in the parsing pipeline, losing the connection to their source text. Solving it involved several steps:

Token handling refactor: AST nodes now hold references to their sourceTokens rather than pre-computed values. Each node owns the transformation from raw tokens to processed values, which naturally positions them to track the mapping between the two.
Content transformers: Introduced transformer classes (QuotedStringContentTransformer, EmbedContentTransformer) that handle the conversion from raw to processed content while maintaining a source map. These map each character position in the processed output back to its position in the raw input.
Sub-location API: Extended KsonValue and KsonString with sublocation() methods that compose these mappings, allowing sub-parsers to report errors at precise locations in the original document.
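The steps above compose roughly as sketched here: a processed offset is mapped to a raw offset, which is then shifted by the content's start in the document and converted to a line/column position. Every name in this sketch (`Position`, `SourceMap`, `positionOf`, `subLocation` and its parameters) is hypothetical and chosen for illustration; the actual KsonValue/KsonString API may look quite different.

```kotlin
// Hypothetical types for illustration only; not the actual KSON API.

/** A zero-based line/column position in the original document. */
data class Position(val line: Int, val column: Int)

/**
 * Maps each processed character to the offset of the raw character it came
 * from. Offsets are relative to the start of the raw content. A trailing
 * extra entry may be supplied to represent the end position.
 */
class SourceMap(private val rawOffsets: IntArray) {
    fun toRaw(processedOffset: Int): Int = rawOffsets[processedOffset]
}

/** Converts an absolute document offset to a line/column position. */
fun positionOf(document: String, offset: Int): Position {
    var line = 0
    var lineStart = 0
    for (i in 0 until offset) {
        if (document[i] == '\n') {
            line++
            lineStart = i + 1
        }
    }
    return Position(line, offset - lineStart)
}

/**
 * Composes the mappings: processed offset -> raw offset -> document position.
 * contentStart is the document offset where the raw content begins.
 */
fun subLocation(
    document: String,
    contentStart: Int,
    map: SourceMap,
    processedOffset: Int
): Position = positionOf(document, contentStart + map.toRaw(processedOffset))
```

A sub-parser that only sees the processed text would report a plain offset, and the owner of the value would translate it with something like `subLocation` before surfacing the message.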

Benefits

Sub-parsers can now provide error messages that point to exact positions in source KSON
Cleaner code organization with transformations owned by the AST nodes they belong to
Foundation for future tooling improvements (LSP, better diagnostics, etc.)

TODO

Refactor the errors reported by NumberParser to report their precise location rather than highlighting the whole number (Precise locations for number errors #287)
Consider refactoring the errors reported on bad string escapes to use this mechanism (they are currently lexed carefully to give us a precise token to highlight) (Consider refactoring string escape errors #288)

This refactor removes the too-early (and sometimes clumsy) evaluation of what the "value" for a token's raw text must be. We now instead pass the `sourceTokens` into the AstNode class to which the parse assigned them, and have those classes own the transformation into a "value". This refactor not only streamlines code and organizes transformations into our Ast classes, where they seem to belong, it also positions us to hopefully implement "source maps" for KSON values, which will allow tracing an error on a portion of a KsonValue's value all the way back to its location in the raw source text (accounting for things like processed escapes and trimmed embed indents).

One final benefit: this refactor highlights that our embedTag and embedMeta are improperly hijacking the QuotedString class. I have a planned improvement that will address this by making embedTag and embedMeta fully compatible with JSON strings (and hence accommodating fully idempotent round-trips to the JSON object notation version of an embed). In the meantime, it's cool to have this refactor extra-validated by how it helps make subtle problems in the code obvious.
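As a rough illustration of a node owning its own transformation, here is a minimal sketch. `Token` and `StringNode` are simplified, hypothetical stand-ins for the real classes, and the single `\n` replacement stands in for whatever processing the real node performs.

```kotlin
// Hypothetical, simplified types; not the actual KSON AST.

/** A lexed token: its raw text and where that text starts in the document. */
data class Token(val text: String, val startOffset: Int)

/** A node that keeps its source tokens and derives its value on demand. */
class StringNode(val sourceTokens: List<Token>) {
    /** Raw text exactly as it appeared in the source. */
    val rawText: String
        get() = sourceTokens.joinToString("") { it.text }

    /** Document offset of the first token (assumes at least one token). */
    val startOffset: Int
        get() = sourceTokens.first().startOffset

    /** Processed value, computed by the node itself rather than by the parser. */
    val value: String by lazy { rawText.replace("\\n", "\n") }
}
```

Because the node still has both the raw tokens and the processed value, it is the natural place to record how one maps onto the other.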

Build the core functionality for mapping from transformed sub-locations in Embed Block processed values back to their raw position in the source KSON document, organizing it all into the new EmbedContentTransformer class
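A minimal sketch of the indent-trimming half of this idea follows. It assumes indentation is made of spaces only and ignores escape processing; `TrimmedEmbed` and `trimMinimumIndent` are hypothetical names, and the real EmbedContentTransformer may be organized differently.

```kotlin
// Hypothetical sketch; the real EmbedContentTransformer may differ.

/** Processed text plus, for each processed char, the offset of its raw source char. */
class TrimmedEmbed(val processed: String, val rawOffsets: List<Int>)

/** Strips the minimum common indent from every line, recording where each kept char came from. */
fun trimMinimumIndent(raw: String): TrimmedEmbed {
    val lines = raw.split("\n")
    // Blank lines do not constrain the indent.
    val minIndent = lines.filter { it.isNotBlank() }
        .minOfOrNull { line -> line.length - line.trimStart(' ').length } ?: 0

    val processed = StringBuilder()
    val rawOffsets = mutableListOf<Int>()
    var lineStart = 0 // raw offset of the current line's first char
    for ((index, line) in lines.withIndex()) {
        val skip = minOf(minIndent, line.length)
        for (i in skip until line.length) {
            processed.append(line[i])
            rawOffsets.add(lineStart + i)
        }
        if (index < lines.lastIndex) {
            processed.append('\n')
            rawOffsets.add(lineStart + line.length) // the raw newline
        }
        lineStart += line.length + 1
    }
    return TrimmedEmbed(processed.toString(), rawOffsets)
}
```

With this shape, an error at processed offset `i` can be traced to `rawOffsets[i]` in the raw embed content, even though the stripped indent shifts every line by a different cumulative amount.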

Extend the source map design implemented in EmbedContentTransformer to our quoted strings, being careful to refactor the escaping logic that used to be owned by Escaping.kt into this new home. From here we should be able to extend out to implementing general `subLocation` support across all KsonValues.
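The same idea applied to escapes can be sketched as below. This handles only a small, assumed set of backslash escapes and keeps unrecognized ones verbatim; `UnescapedString` and `unescape` are hypothetical names, not the contents of QuotedStringContentTransformer.

```kotlin
// Hypothetical sketch; the real QuotedStringContentTransformer may differ.

/** Processed text plus, for each processed char, the raw offset where its source begins. */
class UnescapedString(val processed: String, val rawOffsets: List<Int>)

/** Evaluates a few backslash escapes while recording a processed-to-raw offset map. */
fun unescape(raw: String): UnescapedString {
    val simpleEscapes = mapOf('n' to '\n', 't' to '\t', '\\' to '\\', '"' to '"')
    val processed = StringBuilder()
    val rawOffsets = mutableListOf<Int>()
    var i = 0
    while (i < raw.length) {
        val c = raw[i]
        val escaped = if (c == '\\' && i + 1 < raw.length) simpleEscapes[raw[i + 1]] else null
        if (escaped != null) {
            // Two raw chars collapse into one processed char.
            processed.append(escaped)
            rawOffsets.add(i)
            i += 2
        } else {
            // Ordinary char, or an unrecognized escape kept verbatim.
            processed.append(c)
            rawOffsets.add(i)
            i += 1
        }
    }
    return UnescapedString(processed.toString(), rawOffsets)
}
```

Each evaluated escape shortens the processed text by one character, so processed and raw offsets drift apart; the recorded offsets are what let a later position be traced back correctly.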

KsonValue is the public/friendly interface on a valid KSON AST tree: the AST API, if you will. Having these objects hold a pointer to their source AstNode ensures they have the context they need to provide a rich interface. The short-term motivation for this refactor is the upcoming "subLocation" feature we've been laying the groundwork for in e13178b and 5ceaa3f.

The KsonString and KsonValue classes now support calculating sub-locations to support sub-parsers contributing errors/warnings/info back up to the original source KSON document at precise locations.

Conflicts: kson-lib/src/commonMain/kotlin/org/kson/Kson.kt src/commonMain/kotlin/org/kson/ast/Ast.kt

The merge in f2219a1 changed how text is looked up on Tokens and how strings are represented in the AST (quotes are no longer part of `StringNode`s).

dmarcotte added 8 commits November 26, 2025 18:12

Sub-location mapping for embed blocks

e13178b

Add subLocation support to KsonValue

f7ee48f

Merge branch 'main' into subparser-support

f2219a1

Fix some merge issues

4e97158

Merge branch 'main' into subparser-support

9460a2e

This was referenced Jan 5, 2026

Precise locations for number errors #287

Open

Consider refactoring string escape errors #288

Closed

Merge branch 'main' into subparser-support

329ca98

dmarcotte merged commit 357dcc1 into kson-org:main Jan 6, 2026
1 check passed

dmarcotte deleted the subparser-support branch January 6, 2026 00:50
