refactor(botonic-nlu): first approach nlu refactor #940

Closed
wants to merge 7 commits


vanbasten17
Contributor


Description

A first pass at an improved version of the current botonic-nlu engine, now migrated to TypeScript.
We still need to modify botonic-cli and (probably) botonic-plugin-nlu so that they work together with it.

Context

The previous botonic-nlu engine was hard to maintain because it was originally written in JS. This improved version aims to deliver the following:

  • Be able to train with a custom tokenizer (it only needs to implement a tokenize method)
  • Be able to train with a custom model declared explicitly by the dev
  • Migration to TS. Self-descriptive code, more readable, typed. Better maintainability and flexibility
  • Introduced some tests, which were badly needed because it is easy to break things in this kind of project
  • Tried to use as many TFJS native methods as possible
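
To illustrate the custom tokenizer point, here is a minimal sketch of what such a tokenizer could look like. The `Tokenizer` interface and the `SimpleWordTokenizer` class are hypothetical names used only for illustration; the description above only states that a `tokenize` method is required, so the `(text: string) => string[]` signature is an assumption.

```typescript
// Hypothetical interface: the PR only says a tokenizer must expose `tokenize`.
// The exact signature is an assumption.
interface Tokenizer {
  tokenize(text: string): string[];
}

// Illustrative custom tokenizer: lowercases the input and splits it on any
// run of characters that is not a letter, digit, or apostrophe.
class SimpleWordTokenizer implements Tokenizer {
  tokenize(text: string): string[] {
    return text
      .toLowerCase()
      .split(/[^a-z0-9']+/)
      .filter((token) => token.length > 0); // drop empty leading/trailing pieces
  }
}

const tokens = new SimpleWordTokenizer().tokenize('How can I reach the park?');
console.log(tokens); // [ 'how', 'can', 'i', 'reach', 'the', 'park' ]
```

An object shaped like this could presumably be passed to `withTokenizer` in place of `natural.TreebankWordTokenizer`, though that has not been verified against the code in this PR.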

Approach taken / Explain the design

This NLU provides a new API for operating the NLU engine. It can be used as a stand-alone Node application to experiment with training and predictions. The main idea of this engine is the following:

  1. We instantiate a BotonicNLU object, which provides the main interface. For example: const nlu = new BotonicNLU()
  2. We load the training data into memory with the method nlu.addExample(locale, intent, utterance). The resulting data structure holds, for each locale, a set of intents, and for each intent, a set of utterances.
  3. Then we can initialize a "trainer" for a specific locale, like this: nlu.train('en').
  4. Now we can tune this trainer like this:
const trainer = nlu
  .train('en')
  .withTokenizer(new natural.TreebankWordTokenizer())
  .withParams({ epochs: 30, validationSplit: 0.2 });
  5. Then we only need to run it: async () => await trainer.run(). This automatically triggers the preprocessing step, loads the word embeddings and trains a NN model.
  6. Finally, we can save the results or try some predictions:
trainer.save(); // Saving the model
trainer.predict('How can i reach the park?');
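
The locale, intent and utterance structure described in step 2 can be pictured as nested maps. The sketch below is not the code from this PR: `ExampleStore`, `intents` and `utterances` are hypothetical names, and only the `addExample(locale, intent, utterance)` signature comes from the description above.

```typescript
// Hypothetical shapes: locale -> intent -> list of utterances.
type IntentExamples = Record<string, string[]>;
type LocaleExamples = Record<string, IntentExamples>;

// Illustrative stand-in for the in-memory data held by the NLU object.
class ExampleStore {
  private readonly data: LocaleExamples = {};

  // Same argument order as nlu.addExample in the description above.
  addExample(locale: string, intent: string, utterance: string): this {
    const intents = (this.data[locale] = this.data[locale] || {});
    const utterances = (intents[intent] = intents[intent] || []);
    utterances.push(utterance);
    return this; // returning `this` allows chained calls
  }

  // Intent names registered for a locale (empty if the locale is unknown).
  intents(locale: string): string[] {
    return Object.keys(this.data[locale] || {});
  }

  // Utterances registered for one intent of a locale.
  utterances(locale: string, intent: string): string[] {
    return (this.data[locale] || {})[intent] || [];
  }
}

const store = new ExampleStore()
  .addExample('en', 'Directions', 'How can i reach the park?')
  .addExample('en', 'Directions', 'Where is the station?')
  .addExample('en', 'Greetings', 'Hello there');

console.log(store.intents('en')); // [ 'Directions', 'Greetings' ]
console.log(store.utterances('en', 'Directions').length); // 2
```

Keeping the data grouped by locale first matches the way a trainer is requested per locale with nlu.train('en').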

For the development workflow, I have been using nodemon through the npm script npm run start:dev (check index.ts).

To document / Usage example

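// `initFromDirectory` and `natural` (the npm package) are assumed to be in scope.
const nlu = initFromDirectory('en');

const simpleTrainer = nlu
  .train('en')
  .withTokenizer(new natural.TreebankWordTokenizer())
  .withParams({ epochs: 30, validationSplit: 0.2 });

// Illustrative inputs; the original snippet assumes `toPredict` is defined elsewhere.
const toPredict = ['How can i reach the park?'];

(async () => {
  // Preprocess, load the word embeddings and train, then classify each input.
  await simpleTrainer.run();
  toPredict.forEach((input) => simpleTrainer.predict(input));
})();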
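
For readers unfamiliar with the `validationSplit: 0.2` parameter used above: in TensorFlow.js, `model.fit` holds out that fraction of the training data for validation, taking it from the last samples before shuffling. The sketch below illustrates the idea only. `splitForValidation` is a hypothetical helper, and whether the trainer forwards `validationSplit` to `model.fit` unchanged is an assumption.

```typescript
// Hypothetical helper, not part of botonic-nlu or TensorFlow.js.
// Holds out the trailing `validationSplit` fraction of the samples.
function splitForValidation<T>(
  samples: T[],
  validationSplit: number
): { train: T[]; validation: T[] } {
  if (validationSplit < 0 || validationSplit >= 1) {
    throw new RangeError('validationSplit must be in [0, 1)');
  }
  // Index of the first sample that goes to the validation set.
  const splitAt = Math.floor(samples.length * (1 - validationSplit));
  return {
    train: samples.slice(0, splitAt),
    validation: samples.slice(splitAt),
  };
}

// Ten placeholder utterances, split with the value used in the example above.
const samples: string[] = [];
for (let i = 0; i < 10; i++) {
  samples.push('utterance-' + i);
}

const { train, validation } = splitForValidation(samples, 0.2);
console.log(train.length, validation.length); // 8 2
```

With small datasets this means only a handful of utterances end up in the validation set, so validation metrics can be noisy.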

Pending Steps

  • Keep polishing everything: add more tests, improve the execution flow, etc.
  • Integrate the whole pipeline: adapt botonic-cli to run the file from the user's directory, upload the generated files to Netlify, make it work properly with plugin-nlu and the deployed model, and think about how to retrieve the tokenizer used in botonic-nlu so it can be loaded in botonic-plugin-nlu.
  • Add entities support, slot filling, ...

If you want to run the example:

  1. yarn install
  2. yarn build
  3. node dev-example.js

Testing

  • has unit tests
  • has integration tests
  • doesn't need tests because... [provide a description]

@vanbasten17 vanbasten17 added the enhancement New feature or request label Sep 21, 2020
@vanbasten17 vanbasten17 marked this pull request as draft September 21, 2020 15:29
@vanbasten17 vanbasten17 force-pushed the feat/botonic-nlu-refactor branch from b063ac7 to 965938e Compare September 29, 2020 08:10
@vanbasten17 vanbasten17 force-pushed the feat/botonic-nlu-refactor branch from e9d03ae to 6b59984 Compare October 6, 2020 12:43
@vanbasten17 vanbasten17 mentioned this pull request Oct 7, 2020
@vanbasten17
Contributor Author

Closing, as the PR that was merged is #984

@vanbasten17 vanbasten17 closed this Oct 7, 2020
@vanbasten17 vanbasten17 deleted the feat/botonic-nlu-refactor branch October 7, 2020 11:08