Usage: code API
Vincent Hellendoorn edited this page Jul 10, 2017
The code is best used as a Java library. To get started, either add the Jar to your dependencies (Maven dependency coming soon!) or download the whole project and link it to yours. Have a look at slp.core.example to see how to do the full setup for a natural language (NLRunner) and a Java code (JavaRunner) example; it showcases quite a few of the options you can set.
The usual process involves five steps:
- Set up the LexerRunner with options such as which lexer to use (e.g., preserve punctuation? split on whitespace?) and whether to add delimiters around each line or around whole files.
- Set up the vocabulary, either by building it beforehand with some cut-off for infrequent words, or by leaving it entirely open (which turns out to work better for source code).
- Set up the ModelRunner with options for modeling, such as whether to treat each line as a sentence (vs. the whole file, as is typical for Java) and what order of n-grams to use.
- Set up a Model: e.g., a simple n-gram model, a model with a cache, a mixture of global, local, and cache models, or an automatically nested model; optionally make it dynamic so it learns every token right after modeling it.
- Run your model on whatever data you have. You can call `ModelRunner.model` (or `ModelRunner.predict`) to model any sequence, file, or whole directory (recursively). What if you don't want to model just once? Maybe you want to model every commit in a project's history and then update the model with that commit right after modeling it? Easy: because all the models are count-based, you can wrap the modeling step in a loop and alternate it with calls to `ModelRunner.learn` or `ModelRunner.forget` to keep your model up to date without retraining the whole thing.
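Put together, the five steps might look roughly like the pseudocode sketch below. Only `ModelRunner.model`, `predict`, `learn`, and `forget` are named above; every setup call here is an illustrative assumption, so consult slp.core.example (NLRunner, JavaRunner) for the actual API of your version.

```java
// Pseudocode sketch -- setup method names are assumptions, not verified API.

// 1. Lexer setup: a Java lexer, delimiters around whole files (assumed names)
LexerRunner.setLexer(new JavaLexer());

// 2. Vocabulary: leave it open, which works better for source code
//    (alternatively, build it beforehand with a frequency cut-off)

// 3. ModelRunner options: whole file as one sequence, order-6 n-grams (assumed)
ModelRunner.setOrder(6);

// 4. Model: e.g. an n-gram model wrapped with a cache (assumed constructors)
Model model = new CacheModel(new NGramModel());

// 5. Run it; ModelRunner.model is the method named in the text above
ModelRunner.model(testDirectory);
```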
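The commit-by-commit scenario works precisely because count-based models can unlearn by decrementing counts. As a self-contained illustration (this does not use the slp.core API; the class and method names below are my own), a minimal bigram model with learn/forget could look like:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal count-based bigram model: learning and forgetting are just
// count increments and decrements, so no retraining is ever needed.
public class CountModel {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    public void learn(List<String> tokens)  { update(tokens, +1); }
    public void forget(List<String> tokens) { update(tokens, -1); }

    private void update(List<String> tokens, int delta) {
        for (int i = 0; i < tokens.size(); i++) {
            unigrams.merge(tokens.get(i), delta, Integer::sum);
            if (i > 0) bigrams.merge(tokens.get(i - 1) + " " + tokens.get(i), delta, Integer::sum);
        }
    }

    // Maximum-likelihood P(next | prev); simplified (real models add
    // smoothing, and the denominator here also counts prev as a final token).
    public double probability(String prev, String next) {
        int c = unigrams.getOrDefault(prev, 0);
        if (c == 0) return 0.0;
        return bigrams.getOrDefault(prev + " " + next, 0) / (double) c;
    }

    public static void main(String[] args) {
        CountModel m = new CountModel();
        List<String> commit = Arrays.asList("for", "(", "int", "i");
        m.learn(commit);                                  // model the commit...
        System.out.println(m.probability("(", "int"));    // 1.0
        m.forget(commit);                                 // ...then unlearn it
        System.out.println(m.probability("(", "int"));    // 0.0
    }
}
```

Looping over a project's history then amounts to alternating `learn` and `forget` calls around each modeling step, just as the text describes for the real library.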