
Peek(1) affects next token. #28

Open · tiagolr opened this issue Dec 26, 2014 · 24 comments

tiagolr commented Dec 26, 2014

I'm using different rulesets in a custom parser for the erazor lib.

I frequently use peek(1) with a ruleset to check whether it should process what comes next or leave it to another ruleset.

The problem is that if a function using ruleset A decides, after peek(1), not to process what comes next, and another function with ruleset B processes it instead, the first token B sees is the one already produced by function A's peek(1).

Example:

function A() {
    stream.ruleset = MyLexer.ASomeRules;

    var tok = peek(1); // lexes (and caches) the next token using ASomeRules
    switch (tok) {
        case SOME_RULE: B(); // let B process what comes next
    }
}

function B() {
    stream.ruleset = MyLexer.BOtherRules;

    // Throws: SOME_RULE has no match in this switch, because the token
    // cached by A's peek(1) is still the current token, even though
    // BOtherRules is now active.
    switch stream {
        case OTHER_RULE: null;
    }
}
tiagolr commented Dec 26, 2014

It would be great if peek did not change state at all. I also have problems with consecutive peek(1) calls at other times: calling peek(1) twice behaves the same as calling peek(2).

Maybe a reject() call to revert the effects of peek(), or a peek that does not affect the state and just returns the token at a given position, would be perfect. Do you think that's possible?
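In other words, a minimal illustration of the behavior I'm seeing:

var a = peek(1); // lexes and caches one token ahead
var b = peek(1); // expected: the same token again;
                 // observed: the token after it, as if peek(2) was called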

Simn commented Dec 26, 2014

Yes, this has been a problem ever since rulesets were added. Originally the problem was that the lexer was based on a data representation which did not support rewinding. I think we can fix this now by resetting the position upon changing rulesets, as well as clearing the token cache in the parser.

This might require some architectural thought though. We have to be careful not to make a mess of the Lexer/Parser/LexerTokenSource inter-dependencies.

tiagolr commented Dec 26, 2014

That would be great, thanks. Meanwhile I'll see if I can work around it in this parser.

By the way, if peek just fetched the token without changing the state, things would work great IMO (even at some performance cost). I don't know whether that's feasible or what it involves compared to resetting the position when changing rulesets, but it sounds more robust: peek(1) followed by peek(1) would even return the same token. Just a suggestion.

Simn commented Dec 26, 2014

This would mean disabling the token cache completely, which I don't think is viable performance-wise. The lexer shouldn't have to tokenize the same thing twice if not necessary.

tiagolr commented Dec 26, 2014

Well, peek(0) is sometimes used internally to retrieve the current token, but if peek() were only ever called by the user and retrieved the token through some method other than Lexer.token(), it might be possible to skip the cache and let the user be responsible for the performance lost by calling peek() too often.

I'm not sure how the state transitions are generated or how the cache works; if it's state = state.trans.get(i) in Lexer.token() that alters the state permanently, it could work to skip that for peek(). But I'm just guessing.

tiagolr commented Dec 26, 2014

OK, what I did to avoid unwanted tokens being parsed by peek(1) was to wrap the case in a try/catch and have no rule/match for the specific case I want another function/rule to process.

For now it works. I guess I was lucky that I didn't rely on peek(1) too much, and that in most cases the resulting token could be discarded with junk() after reading it with peek(1). Anyway, it would be nice to have a rewind, or to have the cache reverted after ruleset changes.
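Roughly like this, reusing the hypothetical names from my example above (an unmatched parser switch throws hxparse.NoMatch):

function A() {
    stream.ruleset = MyLexer.ASomeRules;
    try {
        switch stream {
            // deliberately no case for SOME_RULE here
            case SOME_OTHER_A_RULE: handleA(); // hypothetical rule/handler
        }
    } catch (e:hxparse.NoMatch<Dynamic>) {
        // anything A has no case for, including SOME_RULE, ends up here;
        // hand the input over to the function with the other ruleset
        B();
    }
}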

Thanks for this lib. I did some benchmarks and the performance is about 2x faster than the previous parser, which used string comparison, not to mention it's much easier to modify and read. With some luck it will be ready in a couple of days.

Simn commented Dec 26, 2014

I'll see what I can do about the caching issue soon. Out of curiosity, what are you trying to parse?

tiagolr commented Dec 27, 2014

https://github.com/ciscoheat/erazor/blob/master/src/demo/Main.hx

You can see the templating language there; it uses string comparison to process the templates. While trying to extend the templating language I ended up rewriting the parser to make it easier to extend and to add some nice features. Now I'm tweaking it to pass all the tests.

The tests are probably more readable:
https://github.com/ciscoheat/erazor/blob/master/test/erazor/TestParser.hx

fussybeaver commented
I don't know if it's relevant to the discussion, but for stipple I'm using a forward-matching regular expression and then rewinding the lexer position.
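A loose sketch of the idea (not stipple's actual code; the pattern and names are made up). This variant tests the raw input directly, so there is no lexer state to rewind afterwards:

// Decide which ruleset should lex next by testing the unconsumed
// input with an anchored regex, instead of going through peek().
static var blockStart = ~/^\{\{/; // made-up pattern

static function nextIsBlock(src:String, pos:Int):Bool {
    return blockStart.match(src.substr(pos));
}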

Simn commented Jan 7, 2015

The problem is this:

  • You fetch a token with ruleset A.
  • The parser caches it.
  • You don't consume it.
  • Instead you try to fetch a token with ruleset B.
  • The parser still has the previous token cached.

This shouldn't be hard to solve because it's just a matter of resetting the state upon ruleset changes. The problem is finding a good architecture between lexer, parser and token source.

tiagolr commented Jan 7, 2015

Stipple looks awesome; I'll have to spend some time looking at it. Funny, I was inspired by handlebars and was trying to implement some of its features in erazor (that's how I got here).

With some luck it may fit perfectly what I was trying to do, which is to use a good templating engine with ufront. Thanks a lot!

tiagolr commented Jan 11, 2015

Hey @fussybeaver, stipple looks really great. When will it be released as a haxelib? I guess it will have to wait until new hxparse and Haxe versions are released?

Anyway, this makes my work unnecessary, thanks a lot. Looking at the tests it seems close to perfect; I can go on building my site, perhaps using this beta version of stipple.

I'll let you know if I manage to add it to ufront.

fussybeaver commented
As you say, I am waiting for the new Haxe version to be released, and I hope @Simn will release a haxelib version of hxparse too.

Regarding hxparse, I would like to see improvements in the non-macro-based API. I am using macros to improve runtime performance, but without build macros, using hxparse doesn't seem quite as idiomatic.

Simn commented Jan 12, 2015

I'm not sure what that would look like unless you want to call peek and junk manually all the time, which I think you already can anyway. One thing we could do is add a flag which writes the parser output to a .hx file so it can be used as a preprocessor.

fussybeaver commented
You can call peek and junk manually, and it works well, though it feels like the API could be made nicer. Would a preprocessor require compiling twice? Maybe simply having non-build macros would also work.

Simn commented Jan 12, 2015

I don't really trust macro-in-macro anyway, but that should be possible. It would look a bit weird because you would have to pass the switch expression as an argument to a macro call, but that should be fine too.

fussybeaver commented
Yeah, perhaps it's better to just leave it as is. I just wanted to make you aware of that use case.

Simn commented Jan 12, 2015

It's an easy change because all the code is already there, so I have added hxparse.Parser.parse, which expects a switch expression as an argument. Please check if that works for you.
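A sketch of how that would presumably be used (token names are hypothetical; case syntax assumed to match ParserBuilder's):

// Instead of implementing ParserBuilder and writing `switch stream`
// directly, pass the switch expression to the macro:
function parseElement() {
    return hxparse.Parser.parse(switch stream {
        case [TWord(s)]: s;
        case [TOpen, e = parseElement(), TClose]: e;
    });
}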

fussybeaver commented
It looks good: the code is more maintainable, and the compile-time performance impact seems minimal on the test suite (at least after several runs), so I'll use this solution for the time being. Is there some issue I need to be aware of with macro-in-macro calls?

tiagolr commented Jan 16, 2015

Hey @fussybeaver, I added support for stipple to ufront and it's working like a charm so far. ufront/ufront-mvc#4

Thanks again for this lib; a good templating engine is essential. I can't praise it enough!

fussybeaver commented
Looks nice! Let me know if I can help you with any issues.

Simn commented Jan 18, 2015

@fussybeaver: We'll have to move that parse method somewhere else. I forgot that Parser is supposed to be @:generic and that disallows static fields, even macros.

Simn commented Jan 18, 2015

Actually I might fix it in core Haxe: HaxeFoundation/haxe#3766

Simn commented Dec 3, 2015

Back to the original issue, here's a small example to reproduce it:

import byte.ByteData;
import hxparse.*;

class MyLexer extends hxparse.Lexer implements RuleBuilder {
    static public var tok = @:rule [
        "a" => {
            1;
        }
    ];

    static public var tok2 = @:rule [
        "a" => {
            2;
        }
    ];
}

class MyParser extends Parser<LexerTokenSource<Int>, Int> implements ParserBuilder {
    public function new(input:ByteData) {
        var source = new LexerTokenSource(new MyLexer(input), MyLexer.tok);
        super(source);
    }

    public function changeRuleset() {
        stream.ruleset = MyLexer.tok2;
    }
}

class Main {
    @:access(hxparse.Parser.peek)
    static function main() {
        var input = ByteData.ofString("a");
        var parser = new MyParser(input);
        hxparse.Utils.catchErrors(input, function() {
            trace(parser.peek(0)); // 1
            parser.changeRuleset();
            trace(parser.peek(0)); // still 1, but should now be 2
        });
    }
}

To solve this, assignment to stream.ruleset would have to do two things:

  1. Set parser.token = null to disable the cache.
  2. Rewind lexer.pos to the last match position.

We could do 2. by turning LexerTokenSource.ruleset into a property with a setter and have that setter call something on lexer to rewind. However, there's no good place to deal with 1. LexerTokenSource doesn't know what a parser is (so it can't set anything on it) and Parser doesn't know what a Ruleset is (so we cannot have a setRuleset method on it which would deal with that).
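For 2., a sketch of what that setter could look like (rewindToLastMatch is a hypothetical method on Lexer):

class LexerTokenSource<Token> {
    var lexer:Lexer;
    public var ruleset(default, set):Ruleset<Token>;

    function set_ruleset(r:Ruleset<Token>) {
        lexer.rewindToLastMatch(); // hypothetical: rewind to the last match position
        return ruleset = r;
    }
}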

Any ideas?
