
Peek(1) affects next token. #28

Open · tiagolr opened this issue Dec 26, 2014 · 24 comments

tiagolr commented Dec 26, 2014

I'm using different rulesets in a custom parser for the erazor lib.

I frequently use peek(1) with a ruleset to check whether it should process what comes next or leave it to another ruleset.

The problem is that if a function using ruleset A decides, after peek(1), not to process what comes next, and another function with ruleset B processes it instead, the first token B sees is the one already produced by function A's peek(1).

Example:

function A() {
    stream.ruleset = MyLexer.ASomeRules;

    var tok = peek(1); // lexes (and caches) the next token using ASomeRules
    switch (tok) {
        case SOME_RULE: B(); // let B process what comes next
    }
}

function B() {
    stream.ruleset = MyLexer.BOtherRules;

    // Throws: SOME_RULE has no match in this switch, because the token
    // cached by A's peek(1) is still the current token, even though
    // BOtherRules is now active.
    switch stream {
        case OTHER_RULE: null;
    }
}
tiagolr commented Dec 26, 2014

It would be great if peek did not change state at all. I also have problems with consecutive peek(1) calls at other times: calling peek(1) twice behaves the same as calling peek(2).

Maybe a reject() call to revert the effects of peek(), or a peek that does not affect the state and just returns the token at a given position, would be perfect. Do you think that's possible?
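In other words, a minimal illustration of the behavior I'm seeing:

var a = peek(1); // lexes and caches one token ahead
var b = peek(1); // expected: the same token again;
                 // observed: the token after it, as if peek(2) was called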

Simn commented Dec 26, 2014

Yes, this has been a problem ever since rulesets were added. Originally the problem was that the lexer was based on a data representation which did not support rewinding. I think we can fix this now by resetting the position upon changing rulesets, as well as clearing the token cache in the parser.

This might require some architectural thought though. We have to be careful not to make a mess of the Lexer/Parser/LexerTokenSource inter-dependencies.

tiagolr commented Dec 26, 2014

That would be great, thanks. Meanwhile I'll see if I can work around it in this parser.

By the way, if peek just fetched the token without changing the state, things would work great IMO (even at some performance cost). I don't know whether that's feasible or what it involves compared to resetting the position when changing rulesets, but it sounds more robust: peek(1) followed by peek(1) would even return the same token. Just a suggestion.

Simn commented Dec 26, 2014

This would mean disabling the token cache completely, which I don't think is viable performance-wise. The lexer shouldn't have to tokenize the same thing twice if not necessary.

tiagolr commented Dec 26, 2014

Well, peek(0) is sometimes used internally to retrieve the current token, but if peek() were only ever called by the user and retrieved the token through some method other than Lexer.token(), it might be possible to skip the cache and let the user be responsible for the performance lost by calling peek() too often.

I'm not sure how the state transitions are generated or how the cache works; if it's state = state.trans.get(i) in Lexer.token() that alters the state permanently, it could work to skip that for peek(). But I'm just guessing.

tiagolr commented Dec 26, 2014

OK, what I did to avoid unwanted tokens being parsed by peek(1) was to wrap the case in a try/catch and have no rule/match for the specific case I want another function/rule to process.

For now it works. I guess I was lucky that I didn't rely on peek(1) too much, and that in most cases the resulting token could be discarded with junk() after reading it with peek(1). Anyway, it would be nice to have a rewind, or to have the cache reverted after ruleset changes.
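Roughly like this, reusing the hypothetical names from my example above (an unmatched parser switch throws hxparse.NoMatch):

function A() {
    stream.ruleset = MyLexer.ASomeRules;
    try {
        switch stream {
            // deliberately no case for SOME_RULE here
            case SOME_OTHER_A_RULE: handleA(); // hypothetical rule/handler
        }
    } catch (e:hxparse.NoMatch<Dynamic>) {
        // anything A has no case for, including SOME_RULE, ends up here;
        // hand the input over to the function with the other ruleset
        B();
    }
}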

Thanks for this lib. I did some benchmarks and the performance is about 2x faster than the previous parser, which used string comparison, not to mention it's much easier to modify and read. With some luck it will be ready in a couple of days.

Simn commented Dec 26, 2014

I'll see what I can do about the caching issue soon. Out of curiosity, what are you trying to parse?

tiagolr commented Dec 27, 2014

https://github.com/ciscoheat/erazor/blob/master/src/demo/Main.hx

You can see the templating language there; it uses string comparison to process the templates. While trying to extend the templating language I ended up rewriting the parser to make it easier to extend and to add some nice features. Now I'm tweaking it to pass all the tests.

The tests are probably more readable:
https://github.com/ciscoheat/erazor/blob/master/test/erazor/TestParser.hx

fussybeaver commented
I don't know if it's relevant to the discussion, but for stipple I'm using a forward-matching regular expression and then rewinding the lexer position.
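A loose sketch of the idea (not stipple's actual code; the pattern and names are made up). This variant tests the raw input directly, so there is no lexer state to rewind afterwards:

// Decide which ruleset should lex next by testing the unconsumed
// input with an anchored regex, instead of going through peek().
static var blockStart = ~/^\{\{/; // made-up pattern

static function nextIsBlock(src:String, pos:Int):Bool {
    return blockStart.match(src.substr(pos));
}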

Simn commented Jan 7, 2015

The problem is this:

  • You fetch a token with ruleset A.
  • The parser caches it.
  • You don't consume it.
  • Instead you try to fetch a token with ruleset B.
  • The parser still has the previous token cached.

This shouldn't be hard to solve because it's just a matter of resetting the state upon ruleset changes. The problem is finding a good architecture between lexer, parser and token source.

tiagolr commented Jan 7, 2015

Stipple looks awesome; I'll have to spend some time looking at it. Funny, I was inspired by handlebars and was trying to implement some of its features in erazor (that's how I got here).

With some luck it may fit perfectly what I was trying to do, which is to use a good templating engine with ufront. Thanks a lot!

tiagolr commented Jan 11, 2015

Hey @fussybeaver, stipple looks really great. When will it be released as a haxelib? I guess it will have to wait until new hxparse and Haxe versions are released?

Anyway, this makes my work unnecessary, thanks a lot. Looking at the tests it seems close to perfect; I can go on building my site, perhaps using this beta version of stipple.

I'll let you know if I manage to add it to ufront.

fussybeaver commented
As you say, I am waiting for the new Haxe version to be released, and I hope @Simn will release a haxelib version of hxparse too.

Regarding hxparse, I would like to see improvements in the non-macro-based API. I am using macros to improve runtime performance, but without build macros, using hxparse doesn't seem quite as idiomatic.

Simn commented Jan 12, 2015

I'm not sure what that would look like unless you want to call peek and junk manually all the time, which I think you already can anyway. One thing we could do is add a flag which writes the parser output to a .hx file so it can be used as a preprocessor.

fussybeaver commented
You can call peek and junk manually, and it works well, though it feels like the API could be made nicer. Would a preprocessor require compiling twice? Maybe simply having non-build macros would also work.

Simn commented Jan 12, 2015

I don't really trust macro-in-macro anyway, but that should be possible. It would look a bit weird because you would have to pass the switch expression as an argument to a macro call, but that should be fine too.

fussybeaver commented
Yeah, perhaps it's better to just leave it as is. I just wanted to make you aware of that use case.

Simn commented Jan 12, 2015

It's an easy change because all the code is already there, so I have added hxparse.Parser.parse, which expects a switch expression as an argument. Please check if that works for you.
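A sketch of how that would presumably be used (token names are hypothetical; case syntax assumed to match ParserBuilder's):

// Instead of implementing ParserBuilder and writing `switch stream`
// directly, pass the switch expression to the macro:
function parseElement() {
    return hxparse.Parser.parse(switch stream {
        case [TWord(s)]: s;
        case [TOpen, e = parseElement(), TClose]: e;
    });
}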

fussybeaver commented
It looks good: the code is more maintainable, and the compile-time performance impact seems minimal on the test suite (at least after several runs), so I'll use this solution for the time being. Is there some issue I need to be aware of with macro-in-macro calls?

tiagolr commented Jan 16, 2015

Hey @fussybeaver, I added support for stipple to ufront and it's working like a charm so far. ufront/ufront-mvc#4

Thanks again for this lib; a good templating engine is essential. I can't praise it enough!

fussybeaver commented
Looks nice! Let me know if I can help you with any issues.

Simn commented Jan 18, 2015

@fussybeaver: We'll have to move that parse method somewhere else. I forgot that Parser is supposed to be @:generic and that disallows static fields, even macros.

Simn commented Jan 18, 2015

Actually I might fix it in core Haxe: HaxeFoundation/haxe#3766

Simn commented Dec 3, 2015

Back to the original issue, here's a small example to reproduce it:

import byte.ByteData;
import hxparse.*;

class MyLexer extends hxparse.Lexer implements RuleBuilder {
    static public var tok = @:rule [
        "a" => {
            1;
        }
    ];

    static public var tok2 = @:rule [
        "a" => {
            2;
        }
    ];
}

class MyParser extends Parser<LexerTokenSource<Int>, Int> implements ParserBuilder {
    public function new(input:ByteData) {
        var source = new LexerTokenSource(new MyLexer(input), MyLexer.tok);
        super(source);
    }

    public function changeRuleset() {
        stream.ruleset = MyLexer.tok2;
    }
}

class Main {
    @:access(hxparse.Parser.peek)
    static function main() {
        var input = ByteData.ofString("a");
        var parser = new MyParser(input);
        hxparse.Utils.catchErrors(input, function() {
            trace(parser.peek(0)); // 1
            parser.changeRuleset();
            trace(parser.peek(0)); // still 1, but should now be 2
        });
    }
}

To solve this, assignment to stream.ruleset would have to do two things:

  1. Set parser.token = null to disable the cache.
  2. Rewind lexer.pos to the last match position.

We could do 2. by turning LexerTokenSource.ruleset into a property with a setter and have that setter call something on lexer to rewind. However, there's no good place to deal with 1. LexerTokenSource doesn't know what a parser is (so it can't set anything on it) and Parser doesn't know what a Ruleset is (so we cannot have a setRuleset method on it which would deal with that).
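For 2., a sketch of what that setter could look like (rewindToLastMatch is a hypothetical method on Lexer):

class LexerTokenSource<Token> {
    var lexer:Lexer;
    public var ruleset(default, set):Ruleset<Token>;

    function set_ruleset(r:Ruleset<Token>) {
        lexer.rewindToLastMatch(); // hypothetical: rewind to the last match position
        return ruleset = r;
    }
}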

Any ideas?
