chore(nlp): Czech tokenizer, stemmer and stopwords added #1113

elozano98 · 2020-11-19T10:12:59Z

Depends on #1110. Please review it first. ⚠️

Description

This PR adds a Czech tokenizer, stemmer, and stopwords to contentful nlp.

Adding them will make it possible to process Czech text.

The tokenizer and stemmer come from the nlpjs library; the stopwords were collected from a GitHub repository.
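
For reviewers unfamiliar with how the three pieces fit together, here is a minimal sketch of a tokenize, filter stopwords, stem pipeline. It is not the nlpjs implementation and not the code in this PR: `tokenize`, `toyStem`, `preprocess`, the six-word `CZECH_STOPWORDS` map, and the suffix list are all hypothetical stand-ins chosen only to make the flow concrete.

```typescript
// Hypothetical sketch: none of these names come from nlpjs or from this PR.
type Stemmer = (token: string) => string;

// Tiny illustrative subset; a real Czech stopword list is far longer.
const CZECH_STOPWORDS: Record<string, true> = {
  a: true,
  je: true,
  to: true,
  na: true,
  v: true,
  se: true,
};

// Toy tokenizer: lowercases, then splits on whitespace and common punctuation.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s.,;:!?()"']+/)
    .filter((token) => token.length > 0);
}

// Toy stemmer: strips one of a few endings if enough of the word remains.
// A real Czech stemmer handles many more cases.
const toyStem: Stemmer = (token) => {
  for (const suffix of ["ami", "ech", "ů", "y"]) {
    const longEnough = token.length > suffix.length + 2;
    if (longEnough && token.slice(-suffix.length) === suffix) {
      return token.slice(0, token.length - suffix.length);
    }
  }
  return token;
};

// Pipeline: tokenize, drop stopwords, then stem what is left.
function preprocess(text: string, stem: Stemmer = toyStem): string[] {
  return tokenize(text)
    .filter((token) => CZECH_STOPWORDS[token] !== true)
    .map(stem);
}
```

Stopwords are dropped before stemming so the lookup runs against surface forms, which is how a plain stopword list is normally written.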

The pull request...

elozano98 requested review from dpinol and vanbasten17 November 19, 2020 10:13

vanbasten17 approved these changes Nov 19, 2020

elozano98 force-pushed the contentful/el branch from 4f37d1c to 3bbaa12 November 19, 2020 11:07

elozano98 added 2 commits November 19, 2020 12:34

chore(nlp): Greek tokenizer, stemmer and stopwords added

acc0f9c

chore(nlp): Czech tokenizer, stemmer and stopwords added

9ce8293

elozano98 force-pushed the contentful/cs branch from 8138d47 to 9ce8293 November 19, 2020 11:40

fix(nlp): duplicated el normalizer test removed.

beacab0

elozano98 mentioned this pull request Nov 19, 2020

chore(nlp): Ukrainian tokenizer, stemmer and stopwords added #1114

Merged

Base automatically changed from contentful/el to master November 19, 2020 14:35

dpinol approved these changes Nov 19, 2020

elozano98 merged commit e057e23 into master Nov 19, 2020

elozano98 deleted the contentful/cs branch November 19, 2020 14:39