The tokenizer uses the dissect library to find match strings. dissect uses a binary search to minimize the number of comparisons it has to make. That only works if the input sequence is sorted (or if the predicate function can determine whether the candidate is greater than or less than the current match). I'm sure a variety of cases break because of this, but the one I found was a quoted string that is not short. If you put a console.log inside the code, you'll see that the algorithm around dissect keeps cutting the match in half, shortening the string it tries to match, and ultimately (incorrectly) determines that there is no match.
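The failure mode can be illustrated without the library itself: whether a prefix matches a rule is not monotonic in its length, so probing the half-length prefix (as a binary search does) gives no information. The `QUOTED` rule below is a hypothetical stand-in for the tokenizer's real quoted-string rule, not code from this repo:

```javascript
// Hypothetical rule: a candidate matches only if it is a complete
// double-quoted string.
const QUOTED = /^"[^"]*"$/;

const input = '"a fairly long quoted string"';

// The full input matches the rule...
console.log(QUOTED.test(input)); // true

// ...but the half-length prefix that a binary search would probe does not
// (it has no closing quote), so the search wrongly halves again and again
// and concludes that no match exists.
const half = input.slice(0, Math.ceil(input.length / 2));
console.log(QUOTED.test(half)); // false
```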
I changed the code to a for loop that walks from the end of the string back to the start, looking for the longest substring with a matching rule. This works, and all existing tests continue to pass.
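The shape of that fix is roughly the following sketch. The `rules` format and the `longestMatch` helper are assumptions for illustration, not the PR's actual code; the point is only the longest-to-shortest linear scan replacing the binary search:

```javascript
// Scan candidate lengths from longest to shortest and stop at the first
// length for which some rule matches. O(n) probes instead of O(log n),
// but correct even though matchability is not monotonic in length.
function longestMatch(input, rules) {
  for (let end = input.length; end > 0; end--) {
    const candidate = input.slice(0, end);
    const rule = rules.find((r) => r.pattern.test(candidate));
    if (rule) return { rule, text: candidate };
  }
  return null; // no rule matches any prefix
}

// Usage with the same hypothetical quoted-string rule:
const rules = [{ name: "string", pattern: /^"[^"]*"$/ }];
console.log(longestMatch('"not a short quoted string"', rules));
```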