ParseJS is a simple library I made (and bug-fixed) in the span of two (2) days. Currently, ParseJS (and ParseTS) remain *mostly* complete.
For an example of how you could use this library, check out this Brain**** interpreter I made on SoloLearn's Web-Development code playground.
The suggested method of installation is to download the contents of the repo as a zip and extract them.
But, if you don't want to have the file on your drive, you can use the CDN (Content Delivery Network) script import:
<script src="https://cdn.jsdelivr.net/gh/CalinZBaenen/ParseJS@main/src/parse_string.js"></script>
The parse_string function of ParseJS takes in a block of text and a list of keywords (tokens), scans the text letter by letter, and returns a list.
If there is a keyword that begins with the letter the function is currently looking at, it checks whether the letters ahead of the current letter (in combination with the current one) spell out a valid keyword. If they do, a symbol representing the token found is inserted into the list; otherwise the current letter is inserted instead.
The parse_string function of ParseJS takes in a string (str) and an array of strings (toks) and returns an array (parsed_array) of string OR symbol (Array<string|symbol>).
parse_string iterates over str, and if there is a string (keyword) in toks that begins with the current character being iterated over, it will check if the following sequence of characters forms a valid keyword.
If a valid keyword is found, a symbol (Symbol.for( tok )) is inserted into parsed_array; otherwise the current character is inserted instead.
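To make the description above concrete, here is a minimal sketch of the same scanning idea. It is not the ParseJS source: the name sketchParseString is made up for illustration, and trying longer keywords first is an assumption based on the behaviour described in this README.

```javascript
// Hypothetical re-implementation for illustration only; NOT the ParseJS source.
function sketchParseString(str, toks) {
  // Assumption: longer keywords are tried first, so "test12" beats "test1".
  const ordered = [...toks].sort((a, b) => b.length - a.length);
  const parsedArray = [];
  let i = 0;
  while (i < str.length) {
    // Find the first (longest) keyword that matches at the current position.
    const match = ordered.find((tok) => tok.length > 0 && str.startsWith(tok, i));
    if (match !== undefined) {
      parsedArray.push(Symbol.for(match)); // keyword found: insert a symbol
      i += match.length;                   // skip past the whole keyword
    } else {
      parsedArray.push(str[i]);            // no keyword: keep the character
      i += 1;
    }
  }
  return parsedArray;
}

console.log(sketchParseString("Amy, Sonic", ["Sonic"]));
// [ 'A', 'm', 'y', ',', ' ', Symbol(Sonic) ]
```

The sketch copies toks before sorting so the caller's array is left untouched.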
So, we have this code. Let's try to explain what's going on, and why we get the output we do.
// We tell `parse_string` that we want it to read "Knuckles, Tails, Amy, Sonic", but only
// search for "Sonic", "Tails", and "Knuckles".
parse_string("Knuckles, Tails, Amy, Sonic", [
"Sonic", // "Sonic" is a keyword because it is included in this list.
"Tails", // Same for "Tails".
"Knuckles" // Ditto.
]);
This produces the output:
[
Symbol.for("Knuckles"), // "Knuckles" was a keyword - `parse_string` found "Knuckles".
',', // The character right after "Knuckles".
' ', // The space after the comma. -- Neither character is a keyword, so both are left alone.
Symbol.for("Tails"),
',',
' ',
'A', // "Amy" isn't a keyword, so her name is left alone.
'm',
'y',
',',
' ',
Symbol.for("Sonic") // Sonic's at the end, but he was still found, so his name is "tokenized".
]
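Since the result mixes symbols and single characters, a consumer usually needs to tell them apart. The sketch below is one way to do that with typeof and Symbol.keyFor, both standard JavaScript. The array is copied by hand from the output above, and describeEntries is a hypothetical helper, not part of ParseJS.

```javascript
// Copied by hand from the output shown above.
const parsed = [
  Symbol.for("Knuckles"), ",", " ",
  Symbol.for("Tails"), ",", " ",
  "A", "m", "y", ",", " ",
  Symbol.for("Sonic"),
];

// Hypothetical helper: label each entry as a token or a plain character.
function describeEntries(entries) {
  return entries.map((entry) =>
    typeof entry === "symbol"
      ? { kind: "token", value: Symbol.keyFor(entry) } // recover the keyword text
      : { kind: "char", value: entry }
  );
}

const described = describeEntries(parsed);
console.log(described.filter((e) => e.kind === "token").map((e) => e.value));
// [ 'Knuckles', 'Tails', 'Sonic' ]

// Symbols made with Symbol.for live in the global registry,
// so the same key always gives back the same symbol.
console.log(parsed[0] === Symbol.for("Knuckles")); // true
```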
Now, let's do some more fiddling around...
parse_string("test12 test1 test2 test", [
"test",
"test1",
"test12",
"test2"
]);
Ok, so, let's walk through this.
t
Well, we have t. That's a good start. Now we look to see what keywords start with t -
Oh... well, that's strange. It looks like we have four "candidates".
Let's remove the obvious loser: test2. (The text begins with test12, so the character after test is 1, not 2.)
Now, we still have three possible candidates: test, test1, and test12.
How do we know which one to pick? Simple. First we sort the candidates by length, longest first: test12, test1, test.
Then, we arrange candidates in dictionary order. For this case, it doesn't change anything.
So. Now what? Well, let's scan ahead. If the next characters are est1, then we could use test1. BUT, if the next characters are est12, we could use test12.
Since test12 is longer than test1 or test, it takes precedence. I.e. parse_string prefers the longer token because it's more confident this prediction is correct.
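The walkthrough above can be sketched as two small steps: order the candidates, then take the first one that really matches at the current position. Both helper names are hypothetical, and breaking length ties alphabetically is my reading of "dictionary order", not something confirmed by the ParseJS source.

```javascript
// Hypothetical helper: longest first, ties broken in dictionary order (assumption).
function orderCandidates(candidates) {
  return [...candidates].sort(
    (a, b) => b.length - a.length || (a < b ? -1 : a > b ? 1 : 0)
  );
}

// Hypothetical helper: pick the best keyword that matches `text` at `index`.
function pickCandidate(text, index, keywords) {
  // Candidates are the keywords that start with the current character.
  const candidates = keywords.filter((kw) => kw.startsWith(text[index]));
  // Scan ahead: the first ordered candidate that fully matches wins.
  return orderCandidates(candidates).find((kw) => text.startsWith(kw, index));
}

const keywords = ["test", "test1", "test12", "test2"];
const text = "test12 test1 test2 test";

console.log(orderCandidates(keywords)); // [ 'test12', 'test1', 'test2', 'test' ]
console.log(pickCandidate(text, 0, keywords));  // test12
console.log(pickCandidate(text, 7, keywords));  // test1
console.log(pickCandidate(text, 13, keywords)); // test2
console.log(pickCandidate(text, 19, keywords)); // test
```

If no candidate matches, find returns undefined, which corresponds to the case where the current character is kept as-is.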
As of the latest patch, 0.051, you can remove the extra letters from the returned list.
It turns THIS output:
into THIS output:
So... How do we get this cleaner output?
Well, when you pass in your text and the keywords you want to find, you can also pass in a boolean (a yes or no value) that indicates if you want to keep the clutter. For backwards compatibility, this option is true (yes) by default.
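This README doesn't spell out the name or position of that boolean, so the sketch below does not call parse_string at all. Instead it shows roughly what a "no clutter" result might look like, assuming "clutter" means the plain characters that didn't match any keyword. dropPlainCharacters is a hypothetical helper, not part of ParseJS.

```javascript
// Hypothetical helper, NOT part of ParseJS: keep only the symbol entries.
// Assumption: "clutter" means the plain characters that matched no keyword.
function dropPlainCharacters(parsedArray) {
  return parsedArray.filter((entry) => typeof entry === "symbol");
}

// A hand-written result in the same shape parse_string returns.
const withClutter = [Symbol.for("Knuckles"), ",", " ", Symbol.for("Tails")];
const tokensOnly = dropPlainCharacters(withClutter);

console.log(tokensOnly.map((sym) => Symbol.keyFor(sym))); // [ 'Knuckles', 'Tails' ]
```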
- Version 0.01: Tokens aren't sorted in order of length, causing inaccuracies. - Fixed in Version 0.02.
- Version 0.01 - Version 0.04: When searching for multiple keyword "candidates", ghost characters would be left behind. See this DEV article for more info. - Fixed in Version 0.05.