Natural language processing web service for the Polish language
The web service was designed to enrich texts with linguistic information using NLP. For this purpose we built a pipeline with the following steps:
- Tokenization - to split text into sentences and words
- Morphological analysis - to list all possible grammatical interpretations of a given word
- POS Tagging - to disambiguate grammar categories of a given word
Our tool uses:
- grammar categories developed in the NKJP (National Corpus of Polish) project. All grammar categories can be found in the following book
- a lexicon for morphological analysis developed at Applica, based on PoliMorf
- Url: https://nlp.applica.pl/ams-ws-nlp/rest/nlp/simple
- Input [application/json]:
{"message":{"body":"Tekst do przetworzenia."},"token":"applica_token"}
- Output [text/plain]:
tekst do przetworzyć .
- Curl:
curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"message":{"body":"Tekst do przetworzenia."},"token":"applica_token"}' "https://nlp.applica.pl/ams-ws-nlp/rest/nlp/simple"
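The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_request` and `call_service` are illustrative and not part of the service, the response is assumed to be UTF-8 encoded, and `applica_token` is a placeholder for a real token.

```python
import json
import urllib.request

SIMPLE_URL = "https://nlp.applica.pl/ams-ws-nlp/rest/nlp/simple"


def build_request(url, text, token):
    """Build a POST request carrying the JSON payload shown above."""
    payload = {"message": {"body": text}, "token": token}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_service(url, text, token, timeout=30):
    """Send the request and return the response body as text (assumes UTF-8)."""
    request = build_request(url, text, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (needs network access and a valid token):
# print(call_service(SIMPLE_URL, "Tekst do przetworzenia.", "applica_token"))
```

Keeping request construction separate from sending makes it possible to inspect the payload without contacting the service.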
- Url: https://nlp.applica.pl/ams-ws-nlp/rest/nlp/extended
- Input [application/json]:
{"message":{"body":"Tekst do przetworzenia."},"token":"applica_token"}
- Output [text/plain]:
{"sentIdx":[1,1,1,1],"base":["tekst","do","przetworzyć","."],"cTag":["subst","prep","ger","interp"],"nps":[true,false,false,true],"orth":["Tekst","do","przetworzenia","."]}
- Output details: the JSON is a dictionary of parallel lists with one entry per word:
  - orth - list of words in their original form
  - base - list of base forms (lemmas) for each word
  - cTag - list of grammar categories for each word
  - nps - list of flags for each word indicating whether the word was not preceded by a space in the original text (in the example above, true for "Tekst" and ".")
  - sentIdx - list of sentence indexes for each word
- Curl:
curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"message":{"body":"Tekst do przetworzenia."},"token":"applica_token"}' "https://nlp.applica.pl/ams-ws-nlp/rest/nlp/extended"
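Because the extended output stores each attribute in a separate parallel list, it is often convenient to regroup it into one record per word. The following Python sketch does this for the example output above. The helper names `to_tokens` and `detokenize` are illustrative, and reading `nps == true` as "no space before this word" is an assumption inferred from the example, not a documented guarantee.

```python
import json

# Example response copied from the extended output shown above.
RESPONSE = (
    '{"sentIdx":[1,1,1,1],"base":["tekst","do","przetworzyć","."],'
    '"cTag":["subst","prep","ger","interp"],"nps":[true,false,false,true],'
    '"orth":["Tekst","do","przetworzenia","."]}'
)

KEYS = ("orth", "base", "cTag", "nps", "sentIdx")


def to_tokens(response_text):
    """Turn the parallel lists into one dictionary per word."""
    data = json.loads(response_text)
    columns = [data[key] for key in KEYS]
    return [dict(zip(KEYS, values)) for values in zip(*columns)]


def detokenize(tokens):
    """Rebuild the text, assuming nps == True means no space before the word."""
    parts = []
    for token in tokens:
        if parts and not token["nps"]:
            parts.append(" ")
        parts.append(token["orth"])
    return "".join(parts)


tokens = to_tokens(RESPONSE)
for token in tokens:
    print(token["sentIdx"], token["orth"], token["base"], token["cTag"])
print(detokenize(tokens))
```

Under that assumption, `detokenize` reproduces the original input sentence from the example.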
Write an email to Applica if you:
- need help with troubleshooting
- want to obtain an applica_token