Lexer.has does not find error token. #76

Closed
JoshuaGrams opened this issue Sep 28, 2017 · 2 comments

@JoshuaGrams
Contributor

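var moo = require('moo');

// Two ordinary rules plus an error token named somethingElse.
var lexer = moo.compile({
	word: /\w+/,
	ws: { match: /\s+/, lineBreaks: true },
	somethingElse: moo.error
});

// Ask the lexer whether it knows about each token type.
console.log('has word?', lexer.has('word'));
console.log('has ws?', lexer.has('ws'));
// The error token is the one that is not found.
console.log('has somethingElse?', lexer.has('somethingElse'));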

Does this mean that you can't define an error token and use it with Nearley? AFAICT it calls Lexer.has on every token that you use. At any rate, if you're going to claim that you can define an error token instead of throwing an error, then it should behave just like any other token in all respects.

@JoshuaGrams
Contributor Author

Ah, shoot. I was expecting that an error token would have no contents and let you continue parsing, instead of taking the rest of the input. I'm trying to do a thing with indentation and markdown-style lists. So I thought I could lex with newlines pushing a line-marker state which would recognize whitespace as indentation, and then * or + or - would give list marker tokens which would pop the state, and an error would return an unmarked token and pop the state. Is there a better way to do this?

@tjvr
Collaborator

tjvr commented Sep 28, 2017

Yes, Nearley uses Lexer.has to work out whether a %token is exposed by Moo, or a custom token matcher. You're right, has() should return true for error tokens.
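
For a Moo version that predates the fix, one possible stopgap is to patch `has()` on the compiled lexer so that it also reports the error token's name. This is only a sketch: `exposeErrorType` is a hypothetical helper, not part of Moo, and it assumes nothing about the lexer beyond a `has(name)` method like the one called in the snippet above.

```javascript
// Hypothetical helper (not part of Moo): make has() also report the
// name of the error token, delegating to the original has() otherwise.
function exposeErrorType(lexer, errorType) {
  // Keep a bound reference so the original method still sees its lexer.
  const originalHas = lexer.has.bind(lexer);
  lexer.has = function (tokenType) {
    return tokenType === errorType || originalHas(tokenType);
  };
  return lexer;
}
```

With the lexer from the original report, this would be called as `exposeErrorType(lexer, 'somethingElse')` before handing the lexer to Nearley.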

> I was expecting that an error token would have no contents and let you continue parsing, instead of taking the rest of the input

When none of your rules match, Moo doesn't know what to do. So you can either have it throw an error, or return an error token with the whole of the rest of the input. (I've updated the README to clarify this.)

I think error tokens are the wrong thing here. Generally, tokenizers work best when your tokens are small atomic units, so I would separate your newline rule from your rule for leading whitespace, for example. You probably want something like Nathan's transformer to turn indentation into INDENT and DEDENT tokens.
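
To illustrate the general idea (this is not Nathan's transformer, just a minimal sketch), a post-processing pass over the token stream can turn leading whitespace into indent and dedent tokens. It assumes tokens are plain objects with `type` and `value`, and that the lexer emits newlines as `nl` and runs of spaces as `ws`; those type names, along with `indent`, `dedent`, and the function `indentTokens`, are made up for this example.

```javascript
// Sketch: wrap any iterable of { type, value } tokens and emit
// 'indent' / 'dedent' tokens based on leading whitespace width.
// Token type names ('nl', 'ws', 'indent', 'dedent') are assumptions.
function* indentTokens(tokens) {
  const stack = [0];        // indentation widths that are currently open
  let atLineStart = true;   // true until the first non-whitespace token of a line
  let pendingWidth = 0;     // leading whitespace seen on the current line

  for (const tok of tokens) {
    if (tok.type === 'nl') {
      // End of line; whitespace-only lines never change the indentation.
      atLineStart = true;
      pendingWidth = 0;
      yield tok;
      continue;
    }
    if (atLineStart && tok.type === 'ws') {
      // Leading whitespace is measured but not passed through.
      pendingWidth += tok.value.length;
      continue;
    }
    if (atLineStart) {
      atLineStart = false;
      if (pendingWidth > stack[stack.length - 1]) {
        stack.push(pendingWidth);
        yield { type: 'indent', value: '' };
      } else {
        // Close every level deeper than the current line.
        // Note: a width that matches no open level is not reported as an error.
        while (pendingWidth < stack[stack.length - 1]) {
          stack.pop();
          yield { type: 'dedent', value: '' };
        }
      }
    }
    yield tok;
  }

  // Close any levels still open at end of input.
  while (stack.length > 1) {
    stack.pop();
    yield { type: 'dedent', value: '' };
  }
}
```

Because the pass only needs an iterable of tokens, it can sit between the lexer and the parser, and list markers such as `*`, `+` or `-` can stay ordinary tokens that the grammar handles after an indent.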

EDIT: note that if you want this behaviour (error tokens having no contents), you can always implement it yourself on top of Moo's existing API. :-)

@tjvr tjvr added the question label Sep 28, 2017
@tjvr tjvr closed this as completed in d3fbaff Oct 5, 2017
tjvr added a commit that referenced this issue Oct 5, 2017
Lexer.has: return true for error token, if any (fixes #76).