Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error range includes leading trivia, instead of just the offending token #841

Closed
OmarTawfik opened this issue Feb 21, 2024 · 7 comments · Fixed by #1172
Closed

error range includes leading trivia, instead of just the offending token #841

OmarTawfik opened this issue Feb 21, 2024 · 7 comments · Fixed by #1172
Assignees

Comments

@OmarTawfik
Copy link
Contributor

Originally discovered in #840:

Error: Expected OpenParen.
╭─[crates/solidity/testing/snapshots/cst_output/TupleDeconstructionStatement/with_var/input.sol:1:1]
1 │ ╭─▶ // "var" should be disabled in "0.5.0":
2 │ ├─▶ var (foo, bar) = rhs;
│ │
│ ╰─────────────────────────── Error occurred here.
───╯

@beta-ziliani
Copy link
Contributor

IIUC the problem stems from it being an unrecognized node at the top-most position: the trivia is not put anywhere.

The real question is how to attach trivia to an unrecognized node.

Otherwise, the solution is simply to advance the text index according to the trailing trivia, but that means the node text will contain all the trivia unparsed. Not sure we want that.

Here's the output of a run with the solution mentioned above
# This file is generated automatically by infrastructure scripts. Please don't edit by hand.

Source: >
  1  │ // "var" should be disabled in "0.5.0":                                          │ 0..39
  2  │ var (foo, bar) = rhs;                                                            │ 40..61

Errors: # 1 total
  - >
    Error: Expected OpenParen.
       ╭─[crates/solidity/testing/snapshots/cst_output/TupleDeconstructionStatement/with_var/input.sol:2:1]
       │
     2 │ var (foo, bar) = rhs;
       │ ───────────┬──────────
       │            ╰──────────── Error occurred here.
    ───╯

Tree:
  - (UNRECOGNIZED): '// "var" should be disabled in "0.5.0":\nvar (foo, ...' # (0..62)

@beta-ziliani
Copy link
Contributor

Right now something like the following (noting raise is also deprecated in 0.5.0):

// trivia
raise boom

produces just one Terminal node with type UNRECOGNIZED and all the text inside.

Instead, I propose to change the parser to produce a Nonterminal node with that type, attaching three nodes: the comment, the \n and the UNRECOGNIZED terminal raise boom. Something like:

Proposed parsing output
Nonterminal(NonterminalNode {
    kind: UNRECOGNIZED,
    text_len: TextIndex { ... },
    children: [
      Edge {
        label: Some(LeadingTrivia),
        node: Terminal(TerminalNode {
            kind: SingleLineComment,
            text: "// trivia ",
        }),
      },
      Edge {
          label: Some(LeadingTrivia),
          node: Terminal(TerminalNode {
              kind: EndOfLine,
              text: "\n",
          }),
      },
      Edge {
          label: None,
          node: Terminal(TerminalNode {
              kind: UNRECOGNIZED,
              text: "raise pum",
          })
      }
    ]
  })

I'm investigating this line.

@beta-ziliani
Copy link
Contributor

Ah, changing the way UNRECOGNIZED is processed is a big change. Partial matches with separators and any other occurrence of UNRECOGNIZED is affected. For instance, for VersionExpressionSets the text 1.0.0 || /* trivia */ boom || 2.0.0 also places /* trivia */ boom in an unrecongized terminal node.

@beta-ziliani
Copy link
Contributor

@OmarTawfik if you get the chance to read my comments above, I'll appreciate your input.

@OmarTawfik
Copy link
Contributor Author

OmarTawfik commented Nov 24, 2024

The real question is how to attach trivia to an unrecognized node.

produce a Nonterminal node with that type, attaching three nodes: the comment, the \n and the UNRECOGNIZED terminal raise boom.

Ah, changing the way UNRECOGNIZED is processed is a big change.

Thanks for the investigation. We do attach leading trivia to individual tokens. For example, when parsing something like this, trivia would be attached to foo without a separate nonterminal parent:

contract /* 1 */ foo {  }

I wonder if we can do the same here? modifying how trivia are stored, and adding the new nonterminal everywhere as you already guessed has many impacts, including increasing the tree size/depth, and changing how users process the results.

So, instead of just creating a giant token (with both trivia and content) to attach the error to, I wonder if the parse result can include both tokens separately, with the error attached to the content only.

@beta-ziliani
Copy link
Contributor

So, instead of just creating a giant token (with both trivia and content) to attach the error to, I wonder if the parse result can include both tokens separately, with the error attached to the content only.

But the problem is that the parser doesn't know what raise boom is, so it can't attach a meaningful type to an outer non-terminal. Or am I misunderstanding? Because in the example contract /* 1 */ foo { } the trivia is part of the children of ContractDefinition, along with the identifier foo.

@OmarTawfik
Copy link
Contributor Author

OmarTawfik commented Nov 26, 2024

so it can't attach a meaningful type to an outer non-terminal.

The specific combinator parsing the children of ContractDefinition doesn't know about the parent kind. It just users a SequenceHelper to collect the children (trivia and non-trivia), and the NonterminalKind::ContractDefinition is added later.

github-merge-queue bot pushed a commit that referenced this issue Dec 12, 2024
…ected nonterminal (#1172)

Fixes #841

In examples like the following:

```
// some trivia
unexpected
```

The parsing function was returning a terminal with `UNRECOGNIZED` kind
and the entire text inside. Since a terminal cannot have trivia
attached, we have to change the returning value to be a nonterminal. But
which one? For a moment I thought of a new specific `UNRECOGNIZED` kind,
but after discussing with Omar, it was clear that the better option was
to return the expected one: if we are parsing a `Statement`, then return
a `Statement` with the trivia and the `UNRECOGNIZED` terminal at the
end. This is realized in the `NoMatch` structure with a new field for
the optional expected nonterminal kind, which is essentially attached in
the `ParserResult::with_kind` method.

The commit history contains a previous attempt in which the trivia was
cached within the `NoMatch` struct. This added unnecessary complexity,
so I decided against it.

**Note1:** In a couple of places the code will use a
`ParserResult::no_match(<default values>)`. Noting there was a
`ParseResult::default()` function, I decided to use it instead, but I'm
not sure if this is the expected use of the default function.

**Note2:** I see that the error is one off at the end, probably
accounting EOF? But was already the case before, so there's no real
change there.
@github-project-automation github-project-automation bot moved this from ⏳ Todo to ✅ Done in Slang - 2024 H2 Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
4 participants