Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add node helpers to scanner #15

Merged
merged 4 commits into from
Dec 24, 2020
Merged

Conversation

byCedric
Copy link
Contributor

@byCedric byCedric commented Dec 24, 2020

fixes #7

This adds a really simple node builder to the scanner.

How it works

  1. scanner.enter(<type>, <content>): <node>
    Creates a node, using content as leading factor to determine if this is a literal or parent node. It adds the current position as start to the node.

  2. scanner.exit(<node>): <node>
    Adds the current position as end and returns the node for single line usage like return scanner.exit(node). It mutates the node, so you can exit the node and return it later.

  3. scanner.abort(<node>, <expected-tokens>?): <Error>
    This is a replacement for our invalidToken method from index.js. It creates an error to return or throw. It also rewinds the scanner position to the start of the node, to support optional tokens.

Other thoughts

  • I removed some of the expected tokens from the stateful methods. They were containing the actual node type that was expected, not the token characters. For replacement, I added a fallback to scanner.abort that uses <type> notation as expected/valid token.

  • I had another idea to create a dedicated node builder instance. The reason main reason I didn't go for this route is related to abstraction of the node content. If we do this, we have to provide an (unnecessary) abstraction for adding children or value. I really like the flexibility that we have right now, e.g. node = scanner.enter(); node.customProp = '('; return node.

  • We could split up the enter method into enterLiteral/enterParent, that makes it more explicit. But, looking at unist-builder, they provide a similar interface to what we have now. (with an exception for void nodes, but I don't think we will use them)

  • We can also move these methods to other helpers. But, because we were doing a lot of invalidToken(scanner, ... statements, I think it's better to have them in the same context.

  • Adding debug statements are also pretty easy, we could add debug(type)('enter') and debug(type)('exit') to these helpers to "visualize" the codepath of the parser. But for now, I don't think this is necessary.

@byCedric byCedric requested review from bcoe and wesleytodd December 24, 2020 16:45
return node
}

abort (node, expectedTokens) {
Copy link
Contributor Author

@byCedric byCedric Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea: if this is too much for every abort, we can also add fail or something like that. We can move the error to that method, and only use abort for rewinding to the starting position of the node.

Note that aborting a failed parsing attempt of a token, always rewinds the position back to node's start. Right now, that seems to work pretty good but I'm not sure how good that works with @bcoe's new recursive body parsing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that calling abort would cause any issues, let's hold off on adding an additional method until we know we need to differentiate between an abort and failure.

}

abort (node, expectedTokens) {
const position = `${this.pos.line}:${this.pos.column}`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also use unist-util-stringify-position instead of the ${pos.line}:${pos.column}, because we have the attempted node information?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there will be tooling value in having this as as well { line: <line>, column: <column> } on the error. Maybe before line 80 we could add error.position = this.pos?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another helpful pattern I like to follow (used in node, which is good precedence I think) is to have distinct .code properties for these errors. If we were tooling around errors, checking err.code === 'EOF_ERROR is much better than err.message.startsWith('unexpected token EOF at').

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there would be value in adding additional context to the error object. I also wonder if we should consider returning an error rather than ever throwing? or perhaps have a "best effort" mode, and a mode that throws?

@@ -29,18 +29,18 @@ describe('<message>', () => {
it('throws error when ":" token is missing', () => {
expect(() => {
parser('feat add support for scopes')
}).to.throw("unexpected token ' ' at position 1:5 valid tokens [(, !, :]")
}).to.throw("unexpected token ' ' at 1:5, valid tokens [(, !, :]")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Position itself is a pretty distinct format, keeping this short and descriptive is important. But, feel free to roll this rewording back!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm supportive of this rewording.

Copy link

@wesleytodd wesleytodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

abort (node, expectedTokens) {
const position = `${this.pos.line}:${this.pos.column}`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there will be tooling value in having this as as well { line: <line>, column: <column> } on the error. Maybe before line 80 we could add error.position = this.pos?

}

abort (node, expectedTokens) {
const position = `${this.pos.line}:${this.pos.column}`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another helpful pattern I like to follow (used in node, which is good precedence I think) is to have distinct .code properties for these errors. If we were tooling around errors, checking err.code === 'EOF_ERROR is much better than err.message.startsWith('unexpected token EOF at').

Copy link
Member

@bcoe bcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really solid.

}
node.children.push(bodyFooter(scanner))
node.position = { start, end: scanner.position() }
return node
return scanner.exit(node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much nicer API 😄

return node
}

abort (node, expectedTokens) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that calling abort would cause any issues, let's hold off on adding an additional method until we know we need to differentiate between an abort and failure.

}

abort (node, expectedTokens) {
const position = `${this.pos.line}:${this.pos.column}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there would be value in adding additional context to the error object. I also wonder if we should consider returning an error rather than ever throwing? or perhaps have a "best effort" mode, and a mode that throws?

@@ -29,18 +29,18 @@ describe('<message>', () => {
it('throws error when ":" token is missing', () => {
expect(() => {
parser('feat add support for scopes')
}).to.throw("unexpected token ' ' at position 1:5 valid tokens [(, !, :]")
}).to.throw("unexpected token ' ' at 1:5, valid tokens [(, !, :]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm supportive of this rewording.

@byCedric byCedric force-pushed the @bycedric/refactor/scanner branch from f4aff15 to ad0ae52 Compare December 24, 2020 18:14
@byCedric byCedric merged commit ef8a6ca into main Dec 24, 2020
@byCedric byCedric deleted the @bycedric/refactor/scanner branch December 24, 2020 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor: Introduce Node builder
3 participants