@dmarcotte dmarcotte commented Jan 5, 2026

Enable precise error/message reporting by mapping positions in processed string content back to original source locations.

Motivation

When a sub-parser processes embedded content (like SQL in an embed block or escape sequences in quoted strings), its errors should point to the exact location in the original KSON document. For example, if there's a syntax error at character 15 of the processed SQL, we need to trace that position back through indent trimming and escape processing to find where it sits in the raw source.

Implementation approach

We maintain source maps from processed KSON content to the original KSON source, so we can always translate a location reported on processed content (e.g. an embed block with its minimum indent stripped and escapes evaluated) back to a location in the original KSON document.

The core challenge this change needed to solve was that tokens were being transformed into values too early in the parsing pipeline, losing the connection to their source text. Solving it involved several steps:

  • Token handling refactor: AST nodes now hold references to their sourceTokens rather than pre-computed values. Each node owns the transformation from raw tokens to processed values, which naturally positions them to track the mapping between the two.

  • Content transformers: Introduced transformer classes (QuotedStringContentTransformer, EmbedContentTransformer) that handle the conversion from raw to processed content while maintaining a source map. The source map links each character position in the processed output back to its position in the raw input.

  • Sub-location API: Extended KsonValue and KsonString with sublocation() methods that compose these mappings, allowing sub-parsers to report errors at precise locations in the original document.
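
The token handling step above can be pictured with a minimal sketch. The names `SketchToken` and `SketchStringNode` are hypothetical and are not the classes in this PR, and the single `\n` escape stands in for whatever processing a real node performs. The point is only that a node which keeps its tokens can still answer "where did this come from?" after its value has been computed.

```kotlin
// Hypothetical names throughout; not the actual KSON classes.

/** A lexed token that remembers where its raw text starts in the source. */
data class SketchToken(val rawText: String, val startOffset: Int)

/** A node that keeps its tokens and derives its value on demand. */
class SketchStringNode(val sourceTokens: List<SketchToken>) {
    /** Raw text exactly as written in the source. */
    val rawContent: String
        get() = sourceTokens.joinToString("") { it.rawText }

    /** Processed value, computed lazily; only the `\n` escape is handled here. */
    val value: String by lazy { rawContent.replace("\\n", "\n") }

    /** Offset of the node's first character in the source document. */
    val startOffset: Int
        get() = sourceTokens.first().startOffset
}
```

Because the raw text and its offset stay reachable from the node, a transformer attached to the node has both sides of the mapping available when it builds a source map.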

Benefits

  • Sub-parsers can now provide error messages that point to exact positions in source KSON
  • Cleaner code organization with transformations owned by the AST nodes they belong to
  • Foundation for future tooling improvements (LSP, better diagnostics, etc.)

TODO

This refactor removes the too-early (and sometimes clumsy) evaluation
of what the "value" for a token's raw text must be.  We now instead
pass the `sourceTokens` into the AstNode class the parser says
they belong to and have those classes own the transformation into a
"value".

This refactor not only streamlines code and organizes transformations
into our Ast class, where it definitely seems they belong; it also
positions us to hopefully implement "source maps" for KSON values, which
will allow tracing an error on a portion of a KsonValue's value all
the way back to its location in the raw source text (accounting for
things like processed escapes and trimmed embed indents).

One final benefit: this refactor highlights that our embedTag and
embedMeta are improperly hijacking the QuotedString class. I have a
planned improvement that will address this by making embedTag and
embedMeta fully compatible with JSON strings (and hence accommodating
fully idempotent round-trips to the JSON object notation version of an
embed). In the meantime, it's cool to have this refactor
extra-validated by how it helps make subtle problems in the code
obvious.

Build the core functionality for mapping from transformed sub-locations
in Embed Block processed values back to their raw positions in the
source KSON document, organizing it all into the new
EmbedContentTransformer class.
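
As a rough illustration of the embed mapping described in this commit, the sketch below strips the minimum indent from embed content and records, for each processed character, its offset in the raw content. `IndentStripResult` and `stripMinIndent` are hypothetical names, and the sketch assumes space-only indentation and `\n` line endings; the real EmbedContentTransformer may differ in all of these respects.

```kotlin
/** Processed text plus, for each processed char, its offset in the raw text. */
data class IndentStripResult(val processed: String, val rawOffsets: List<Int>)

// Hypothetical helper: assumes space indentation and "\n" line endings.
fun stripMinIndent(raw: String): IndentStripResult {
    val lines = raw.split("\n")
    // Minimum indent across non-blank lines.
    val minIndent = lines.filter { it.isNotBlank() }
        .minOfOrNull { line -> line.takeWhile { it == ' ' }.length } ?: 0

    val processed = StringBuilder()
    val rawOffsets = mutableListOf<Int>()
    var lineStart = 0 // raw offset of the current line's first character
    for ((index, line) in lines.withIndex()) {
        // Blank lines may be shorter than the minimum indent.
        val skip = minOf(minIndent, line.length)
        for (i in skip until line.length) {
            processed.append(line[i])
            rawOffsets.add(lineStart + i)
        }
        if (index < lines.lastIndex) {
            processed.append('\n')
            rawOffsets.add(lineStart + line.length) // the raw newline
        }
        lineStart += line.length + 1
    }
    return IndentStripResult(processed.toString(), rawOffsets)
}
```

A sub-parser that reports an error at processed offset `i` can then look up `rawOffsets[i]` to find the matching character in the raw embed content.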

Extend the source map design implemented in EmbedContentTransformer to
our quoted strings, being careful to move the escaping logic that
used to be owned by Escaping.kt into this new home.  From here we should
be able to extend out to implementing general `subLocation` support
across all KsonValues.
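
The same idea applied to quoted strings might look like the sketch below, where each decoded character is mapped to the raw offset at which it (or its escape sequence) starts. `UnescapeResult` and `unescapeWithMap` are hypothetical names, and only a small subset of escapes is handled (no unicode escapes), so this is an assumption-laden illustration rather than the logic that moved out of Escaping.kt.

```kotlin
/** Decoded text plus, for each decoded char, its offset in the raw text. */
data class UnescapeResult(val processed: String, val rawOffsets: List<Int>)

// Hypothetical helper handling only a few single-character escapes.
fun unescapeWithMap(raw: String): UnescapeResult {
    val processed = StringBuilder()
    val rawOffsets = mutableListOf<Int>()
    var i = 0
    while (i < raw.length) {
        val c = raw[i]
        if (c == '\\' && i + 1 < raw.length) {
            // Unknown escapes simply yield the escaped character itself.
            val decoded = when (val next = raw[i + 1]) {
                'n' -> '\n'
                't' -> '\t'
                else -> next
            }
            processed.append(decoded)
            rawOffsets.add(i) // map to the backslash that starts the escape
            i += 2
        } else {
            processed.append(c)
            rawOffsets.add(i)
            i += 1
        }
    }
    return UnescapeResult(processed.toString(), rawOffsets)
}
```

Mapping a decoded character to the start of its escape sequence is one reasonable choice; a design that tracks the full raw range of each escape would allow highlighting the whole sequence instead.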

KsonValue is the public/friendly interface on a valid KSON AST:
the AST API, if you will. Having these objects hold a pointer to their
source AstNode ensures they have the context they need to provide a
rich interface.  The short-term motivation for this refactor is
the upcoming "subLocation" feature we've been laying the groundwork for
in e13178b and 5ceaa3f.

The KsonString and KsonValue classes now support calculating
sub-locations, letting sub-parsers contribute errors/warnings/info
back up to the original source KSON document at precise locations.
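
To make the composition concrete, here is a minimal sketch of how a processed offset could be turned into a document position: add the document offset where the value's raw content begins to the raw offset recorded for the processed character, then convert to line and column. `Position`, `positionOf`, and `subLocationSketch` are hypothetical names with zero-based lines and columns assumed; the actual sub-location methods on KsonValue and KsonString are not shown in this PR text and may have a different shape.

```kotlin
/** Zero-based line/column position in the source document. */
data class Position(val line: Int, val column: Int)

/** Converts an absolute document offset to a line/column position. */
fun positionOf(document: String, offset: Int): Position {
    val before = document.substring(0, offset)
    val line = before.count { it == '\n' }
    val column = offset - (before.lastIndexOf('\n') + 1)
    return Position(line, column)
}

/**
 * Maps an offset in processed content back to the document.
 * contentStart: document offset where the value's raw content begins.
 * rawOffsets: for each processed char, its offset within the raw content.
 */
fun subLocationSketch(
    document: String,
    contentStart: Int,
    rawOffsets: List<Int>,
    processedOffset: Int
): Position = positionOf(document, contentStart + rawOffsets[processedOffset])
```

With a per-value offset table like this, a sub-parser only needs to know offsets in the processed string it was handed; the value supplies the rest of the chain back to the document.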

Conflicts:
	kson-lib/src/commonMain/kotlin/org/kson/Kson.kt
	src/commonMain/kotlin/org/kson/ast/Ast.kt

The merge in f2219a1 changed how text is looked up on Tokens
and how strings are represented in the AST (quotes are no longer part
of `StringNode`s).
@dmarcotte dmarcotte merged commit 357dcc1 into kson-org:main Jan 6, 2026
1 check passed
@dmarcotte dmarcotte deleted the subparser-support branch January 6, 2026 00:50