RosaeNLG Project Proposal

Name of project: RosaeNLG

Requested project maturity level: Sandbox

Project description

Template-based Natural Language Generation (NLG) automates the production of relatively repetitive texts from structured input data and textual templates, which are run by an NLG engine. Production usage is widespread in large corporations, especially in the financial industry.

Typical use cases are:

  • describing a product based on its features for SEO purposes

  • producing structured reports, such as risk reports or fund performance reports in the financial industry

  • generating well-formed chatbot answers
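The first use case can be sketched in a few lines. This is a language-neutral illustration of the template-based approach, not RosaeNLG's API: the product records, the template text, and the helpers join_list and describe are all hypothetical.

```python
from string import Template

# Hypothetical structured input: one record per product.
PRODUCTS = [
    {"name": "Aria 5", "screen_inches": 6.1, "colors": ["black", "blue"]},
    {"name": "Nova X", "screen_inches": 5.5, "colors": ["red"]},
]

# Hypothetical textual template; $placeholders are filled from the data.
TEMPLATE = Template("The $name has a $screen-inch screen and comes in $colors.")


def join_list(items):
    """Enumerate items in natural language: 'a', 'a and b', 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe(product):
    """Render one product description from the template and the record."""
    return TEMPLATE.substitute(
        name=product["name"],
        screen=product["screen_inches"],
        colors=join_list(product["colors"]),
    )


if __name__ == "__main__":
    for product in PRODUCTS:
        print(describe(product))
```

A real NLG engine adds what this sketch lacks: synonyms and variation, agreement, conjugation, and punctuation handling, so that the generated texts do not all read alike.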

RosaeNLG is an open source NLG engine. It aims:

  • to offer the same NLG features as commercial NLG products

  • to be developer and IT friendly for template configuration and integration

  • to provide NLG on both the server side and the browser side

Statement on alignment with LF AI’s mission

NLG is a building block for business services aimed directly at end users. It is widely used in the industry.

NLG contributes to the democratization and understandability of AI:

  • Non-expert users don’t understand figures and dashboards and prefer textual explanations

  • Computer-generated texts can be superior (from the reader’s perspective) to human-written texts

  • At the end of an AI pipeline, NLG can automate and convey expertise, explain and summarize situations, and communicate with end users

  • NLG can help deliver trusted AI, typically in combination with AI Explainability 360 and AI Fairness 360

The RosaeNLG project will also:

  • Increase diversity: it would be the first project originating from France

  • Foster usage, contributions and diversity in the NLG domain, supporting languages presently not covered by any NLG system at all

Possible collaboration opportunities with current LF AI hosted projects

RosaeNLG currently runs on Acumos for Orange AI Marketplace.

RosaeNLG can be used at the end of the AI pipeline, to explain a decision to non-experts:

  • AI Explainability 360: provide a clear, readable, summarized explanation for an end user (e.g. a bank customer) asking for explanations

  • AI Fairness 360: generate comprehensive compliance reports on fairness (initial situation, what was done, final situation)

MLflow, especially its Model Registry, could potentially be used to manage templates.

License name

Source control (GitHub, etc.)

Technical tooling:

GH organization

The project has its own GH organization: RosaeNLG organization on GitHub

GH DCO app

GH DCO app is active.

Issue tracker (GitHub, JIRA, etc)

Collaboration tools (mailing lists, wiki, IRC, Slack, Gitter, etc.)

External dependencies including licenses (name and version) of those dependencies

RosaeNLG is a fork of the Pug template engine (MIT).

It is composed of 70 submodules. Most of these modules are an original part of RosaeNLG, with the same Apache 2.0 license, and are not listed below.

Depending on the output language, RosaeNLG loads linguistic resources and uses linguistic libraries to handle agreement and to conjugate verbs.

Resources derived from linguistic resources (mainly WordNet, lefff, german-pos-dict, morph-it) remain under their original licence.
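As a rough illustration of why such per-language resources are needed, the sketch below picks an article from small word lists. It is a simplified assumption of how this kind of lookup can work, not RosaeNLG's implementation: the word sets are tiny hypothetical stand-ins for the real resources, and the function names are invented for the example.

```python
# Hypothetical, tiny stand-ins for real linguistic resources.
EN_AN_EXCEPTIONS = {"hour", "honest", "heir"}        # silent h: take "an"
EN_A_EXCEPTIONS = {"university", "european", "one"}  # consonant sound: take "a"
FR_ASPIRATED_H = {"hibou", "haricot", "héros"}       # h 'aspiré': no elision


def english_article(word):
    """Choose 'a' or 'an': exception lists first, then the first letter."""
    w = word.lower()
    if w in EN_AN_EXCEPTIONS:
        return "an"
    if w in EN_A_EXCEPTIONS:
        return "a"
    return "an" if w[0] in "aeiou" else "a"


def french_definite(word, gender):
    """Return the word with le/la/l'; gender is 'M' or 'F' (from a lexicon)."""
    w = word.lower()
    mute_h = w[0] == "h" and w not in FR_ASPIRATED_H
    if w[0] in "aeiouàâéèêîôù" or mute_h:
        return "l'" + word
    return ("le " if gender == "M" else "la ") + word


if __name__ == "__main__":
    print(english_article("hour"), "hour")   # exception list applies
    print(french_definite("hibou", "M"))     # aspirated h: no elision
    print(french_definite("homme", "M"))     # mute h: elision
```

The rules themselves are short; the value lies in the word lists, which is why the resources in the tables below matter and why their licences are tracked separately.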

Table 1. Used by all languages
Resource | Usage | Licence
random-js | random numbers | MIT
date-fns | date and time formatting | MIT
numeral.js | number formatting | MIT
n2words | cardinal numbers in letters: 5 → five etc. (except for German) | MIT
snowball-stemmer.jsx | stemming | MIT
stopwords-de, stopwords-en, stopwords-es, stopwords-fr, stopwords-it | lists of stop words | MIT
wink-tokenizer | tokenizer | MIT

Table 2. English specific
Resource | Usage | Licence
WordNet | English gerunds (-ing) and list of words or adjectives that must be preceded by "an" | WordNet licence
better-title-case | title case (for titles) in English | MIT
number-to-words | ordinal numbers in English | MIT

Table 3. French specific
Resource | Usage | Licence
Aspirated h | French words whose h is 'aspiré' (vs. 'muet') | CC BY-SA 3.0
LEFFF - lexique des formes fléchies du français | gender and plural of French words | LGPLLR
pluralize-fr | pluralize nouns | MIT
titlecase-french | title case (for titles) in French | MIT

Table 4. German specific
Resource | Usage | Licence
German part-of-speech dictionary (german-pos-dict) | agreement of German adjectives, words and verbs | CC BY-SA 4.0

Table 5. Italian specific
Resource | Usage | Licence
Morph-it! | agreement of Italian adjectives, words and verbs | CC BY-SA 2.0

Table 6. Spanish specific
Resource | Usage | Licence
ordinal-spanish | ordinal numbers for Spanish | Apache 2.0
gender-es | gender of Spanish words | MIT
pluralize-es | plural of Spanish words | MIT
conjugator | Spanish verb conjugation | Apache 2.0

Initial committers (name, email, organization) and how long have they been working on project

Has the project defined the roles of contributor, committer, maintainer, etc.

Yes, see:

Total number of contributors to the project including their affiliations

Does the project have a release methodology

For the JavaScript version (main), see Publish a new version:

For the Java version, see Publish a new version:

Does the project have a code of conduct

Did the project achieve any of the CII best practices badges

Yes for both repos:

Do you have any specific infrastructure requests needed as part of hosting the project in the LF AI?

  • GitHub Actions

  • documentation is hosted on AWS (S3 + CloudFront)

Project website

Social media accounts

Existing sponsorship

Support:

  • Addventa (company specialized in NLG, based in Paris) provides commercial support on RosaeNLG (support with SLA and Professional Services)

  • RosaeNLG is available for commercial usage on Orange AI Marketplace

  • Ongoing discussions with RedLab Paris to have junior PhDs as contributors

Early adopters: