Pangoro and Lixy have been replaced by Tegral Niwen (GitHub). This evolved version of Pangoro and Lixy features parser type-safety, more expectations and matchers, an awesome execution debugger, and much more! This repository will no longer be updated.
You can get the latest release/commit on JitPack.
Lixy is a lexer framework and a Kotlin Multiplatform project. It is a library that lets you turn a string into a sequence of tokens using rules you define with a Kotlin DSL.
This lexical analysis is typically the first step when making a compiler of any kind.
A lexer will only get you so far. The next step in compilation is parsing, which Pangoro can help you with if you are using Lixy!
You will notice when looking at the examples that Lixy uses a specific syntax that might not look like real code at first, but it is entirely valid Kotlin. Lixy uses a kind of "domain-specific language": in this case, a language within a language, made specifically for creating lexers.
Be careful, Lixy is still in an experimental stage! The API may (and will) break constantly until a 0.1 version comes out. Lixy is already fairly usable, and most functions and classes are already documented using KDoc, but there are no user guides currently available. The best way to learn is by looking at the tests (src/main/test/kotlin/guru/zoroark/lixy).
This simple example shows you what can be done using a single state.
```kotlin
// We need this so we can use e.g. DOT instead of MyTokenTypes.DOT
import MyTokenTypes.*

enum class MyTokenTypes : LixyTokenType {
    DOT, WORD, WHITESPACE
}

val lexer = lixy {
    state {
        "." isToken DOT
        anyOf(" ", "\n", "\t") isToken WHITESPACE
        matches("[A-Za-z]+") isToken WORD
    }
}

val tokens = lexer.tokenize("Hello Kotlin.\n")
/*
 * tokens = [
 *     ("Hello", 0, 5, WORD),
 *     (" ", 5, 6, WHITESPACE),
 *     ("Kotlin", 6, 11, WORD),
 *     (".", 11, 12, DOT),
 *     ("\n", 12, 13, WHITESPACE)
 * ]
 */
```
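The resulting tokens can then be consumed like any Kotlin list. Here is a small sketch; the property names (`string`, `tokenType`) are assumptions based on the `(text, start, end, type)` tuples shown in the comment above, not taken from Lixy's documented API:

```kotlin
// Hypothetical post-processing of the tokens from the example above.
// Assumes each token exposes the matched text as `string` and its
// type as `tokenType` (names are assumptions, check the KDoc).
val words = tokens
    .filter { it.tokenType == WORD } // keep only WORD tokens
    .map { it.string }               // extract the matched text
```

Because `tokenize` returns an ordinary collection, the whole Kotlin standard library (filtering, grouping, windowing, etc.) is available for shaping the token stream before parsing.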
This is fine, but we can do much more using multiple states: for example, a string detector that distinguishes content inside a string from content outside of it.
```kotlin
import TokenTypes.*
import Labels.*

enum class TokenTypes : LixyTokenType {
    WORD, STRING_CONTENT, QUOTES, WHITESPACE
}

enum class Labels : LixyStateLabel {
    IN_STRING
}

val lexer = lixy {
    default state {
        " " isToken WHITESPACE
        matches("[a-zA-Z]+") isToken WORD
        "\"" isToken QUOTES thenState IN_STRING
    }
    IN_STRING state {
        // Triple quotes make this a raw string, so we don't need to
        // escape everything
        matches("""(\\"|[^"])+""") isToken STRING_CONTENT
        "\"" isToken QUOTES thenState default
    }
}

val tokens = lexer.tokenize("""Hello "Kotlin \"fans\"!" Hi""")
/*
 * tokens = [
 *     (Hello, 0, 5, WORD),
 *     ( , 5, 6, WHITESPACE),
 *     (", 6, 7, QUOTES),
 *     (Kotlin \"fans\"!, 7, 23, STRING_CONTENT),
 *     (", 23, 24, QUOTES),
 *     ( , 24, 25, WHITESPACE),
 *     (Hi, 25, 27, WORD)
 * ]
 */
```
There are a lot of possibilities!
You can get the following artifacts from JitPack:

- Kotlin/JVM: `guru.zoroark.lixy:lixy-jvm:version`
- Kotlin/JS: `guru.zoroark.lixy:lixy-js:version`
- Kotlin MPP: `guru.zoroark.lixy:lixy:version`
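For example, with Gradle's Kotlin DSL, a JitPack dependency is typically declared by adding the JitPack repository and then the artifact coordinates. This is a generic sketch of the usual JitPack setup, not a snippet from this project's own documentation; replace `version` with an actual release tag or commit hash:

```kotlin
// build.gradle.kts
repositories {
    mavenCentral()
    // JitPack serves artifacts built straight from the Git repository
    maven { url = uri("https://jitpack.io") }
}

dependencies {
    // Kotlin/JVM artifact; replace "version" with a release tag or commit
    implementation("guru.zoroark.lixy:lixy-jvm:version")
}
```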