Precise location reporting for sub-parsers #286
Merged
Enable precise error/message reporting by mapping positions in processed string content back to original source locations.
Motivation
When a sub-parser processes embedded content (such as SQL in an embed block, or a quoted string containing escape sequences), errors should point to the exact location in the original KSON document. For example, if there's a syntax error at character 15 of the processed SQL, we need to trace that back through indent trimming and escape processing to find where character 15 maps to in the raw source.
Implementation approach
We maintain source maps from processed KSON content to the original KSON source, so we can always translate a location reported on processed content (e.g., an embed block with its minimum indent stripped and escapes evaluated) back to a location in the original KSON document.
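To make the source-map idea concrete, here is a minimal sketch. It assumes a two-stage pipeline (minimum-indent stripping, then escape evaluation) where each stage records, for every character it emits, the offset of the input character it came from. Tracing a location back to the raw source is then a lookup through each stage in reverse order. `ProcessedText`, `stripMinIndent`, `evaluateEscapes`, and `traceToRaw` are hypothetical names, not the transformer classes added in this PR. The stage order and the escape set (`\n`, `\t`, `\\`, `\"`) are also assumptions for illustration, not KSON's actual rules.

```kotlin
// Hypothetical sketch: these names, the stage order, and the escape set are
// illustrative assumptions, not the transformer classes added in this PR.

/** One processing stage's output, plus a map from each output char to its input offset. */
class ProcessedText(val text: String, private val toInput: IntArray, private val inputLength: Int) {
    /** Maps an offset in [text] (0..text.length) back to an offset in this stage's input. */
    fun inputOffset(offset: Int): Int {
        require(offset in 0..text.length) { "offset out of range: $offset" }
        // The one-past-the-end offset (e.g. an "unexpected end" error) maps to the end of the input.
        return if (offset == text.length) inputLength else toInput[offset]
    }
}

/** Strips the minimum leading-space indent shared by all non-blank lines. */
fun stripMinIndent(raw: String): ProcessedText {
    val lines = raw.split('\n')
    val minIndent = lines.filter { it.isNotBlank() }
        .minOfOrNull { line -> line.takeWhile { it == ' ' }.length } ?: 0
    val out = StringBuilder()
    val map = ArrayList<Int>()
    var lineStart = 0 // raw offset of the current line's first char
    for ((i, line) in lines.withIndex()) {
        // Blank lines may have fewer than minIndent spaces, so drop only what they have.
        val drop = line.takeWhile { it == ' ' }.length.coerceAtMost(minIndent)
        for (col in drop until line.length) {
            out.append(line[col])
            map.add(lineStart + col)
        }
        if (i < lines.lastIndex) { // re-emit the '\n' that split() consumed
            out.append('\n')
            map.add(lineStart + line.length)
        }
        lineStart += line.length + 1
    }
    return ProcessedText(out.toString(), map.toIntArray(), raw.length)
}

/** Evaluates a small, assumed set of backslash escapes. */
fun evaluateEscapes(raw: String): ProcessedText {
    val out = StringBuilder()
    val map = ArrayList<Int>()
    var i = 0
    while (i < raw.length) {
        if (raw[i] == '\\' && i + 1 < raw.length) {
            val decoded = when (raw[i + 1]) {
                'n' -> '\n'
                't' -> '\t'
                '\\' -> '\\'
                '"' -> '"'
                else -> null
            }
            if (decoded != null) {
                out.append(decoded)
                map.add(i) // a decoded char maps to the backslash that starts its escape
                i += 2
                continue
            }
        }
        out.append(raw[i]) // ordinary char (or unknown escape) passes through unchanged
        map.add(i)
        i++
    }
    return ProcessedText(out.toString(), map.toIntArray(), raw.length)
}

/** Traces an offset in the final output back to the raw input; [stages] are in pipeline order. */
fun traceToRaw(offset: Int, vararg stages: ProcessedText): Int =
    stages.foldRight(offset) { stage, acc -> stage.inputOffset(acc) }
```

For example, with raw content `  a\tb` on one line and `  c` on the next, stripping the indent and then decoding the escape yields `a`, a tab, `b`, a newline, and `c`. Offset 4 (`c`) traces back through both stages to raw offset 9, and the decoded tab at offset 1 traces to the backslash at raw offset 3. Mapping each decoded character to the start of its escape sequence means an error on that character points at the sequence the author actually typed.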
The core challenge this change needed to solve was that tokens were being transformed into values too early in the parsing pipeline, losing the connection to their source text. This involved several steps:
- Token handling refactor: AST nodes now hold references to their `sourceTokens` rather than pre-computed values. Each node owns the transformation from raw tokens to processed values, which naturally positions them to track the mapping between the two.
- Content transformers: Introduced transformer classes (`QuotedStringContentTransformer`, `EmbedContentTransformer`) that handle the conversion from raw to processed content while maintaining a source map. These map each character position in the processed output back to its position in the raw input.
- Sub-location API: Extended `KsonValue` and `KsonString` with `sublocation()` methods that compose these mappings, allowing sub-parsers to report errors at precise locations in the original document.

Benefits
TODO