milden6/nlp-tasks


NLP Tasks


1. First task

A parser that converts PDF, DOCX, TXT, and HTML documents to XML
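
As a rough illustration of the TXT case only, the conversion could look like the sketch below. The function name `text_to_xml` and the element names `document` and `paragraph` are assumptions made for this example, not the schema used in the repository. PDF, DOCX, and HTML would need format-specific extraction before this step.

```python
import xml.etree.ElementTree as ET


def text_to_xml(text: str, source_name: str = "input.txt") -> str:
    """Wrap plain-text paragraphs in a simple XML document (hypothetical schema)."""
    root = ET.Element("document", attrib={"source": source_name})
    # Treat blocks separated by a blank line as paragraphs.
    for block in text.split("\n\n"):
        paragraph = " ".join(block.split())  # collapse internal whitespace
        if paragraph:
            ET.SubElement(root, "paragraph").text = paragraph
    # ElementTree escapes special characters such as & and < on serialization.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(text_to_xml("First paragraph.\n\nSecond paragraph."))
```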

2. Second task

Development of a finite state machine for a regular expression, implementation of two string comparison algorithms, and analysis of their behavior using a dendrogram from hierarchical clustering of words taken from an article
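
The description does not name the two string comparison algorithms. As one common example of such an algorithm (an assumption, not necessarily one used here), Levenshtein edit distance can be sketched as follows. Pairwise distances like these are the kind of input a hierarchical clustering step would consume.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```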

3. Third task

Building a stop-word dictionary and thematic dictionaries using TF-IDF and a contrast method
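
A minimal TF-IDF sketch, assuming pre-tokenized documents and the plain `tf * log(N / df)` weighting (the repository may use a different variant). Terms that occur in every document score zero, which is one simple signal for stop-word candidates.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    toy_docs = [["the", "cat", "sat"], ["the", "dog", "ran"]]  # made-up data
    for doc_scores in tf_idf(toy_docs):
        print(doc_scores)
```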

4. Fourth task

Co-occurrence matrix, PPMI and LSA matrices, cosine similarity, and scalar (dot product) similarity
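
The PPMI and cosine steps can be sketched in plain Python as below. This assumes a dense co-occurrence matrix given as a list of rows and base-2 logarithms. The LSA step (a truncated SVD of the matrix) is omitted, and the function names are illustrative only.

```python
import math


def ppmi(counts):
    """Positive pointwise mutual information for a co-occurrence count matrix."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out = []
        for j, count in enumerate(row):
            if count == 0:
                out.append(0.0)  # PMI is undefined for zero counts; PPMI uses 0
            else:
                pmi = math.log2(count * total / (row_sums[i] * col_sums[j]))
                out.append(max(pmi, 0.0))
        result.append(out)
    return result


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    matrix = ppmi([[0, 2], [2, 0]])  # made-up counts
    print(matrix, cosine(matrix[0], matrix[1]))
```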

5. Fifth task

Naive Bayes spam filtering
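
A compact sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The class name, the tokenized input format, and the choice to ignore unseen words are assumptions for illustration. The repository's own implementation and preprocessing may differ.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over tokenized documents (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.classes:
            total = sum(self.word_counts[label].values())
            # Work in log space to avoid underflow on long messages.
            score = math.log(self.doc_counts[label] / n_docs)
            for word in tokens:
                if word in self.vocab:  # words never seen in training are skipped
                    smoothed = self.word_counts[label][word] + 1
                    score += math.log(smoothed / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Made-up training data, for demonstration only.
    model = NaiveBayes().fit(
        [["win", "money", "now"], ["cheap", "money"],
         ["meeting", "at", "noon"], ["lunch", "at", "noon"]],
        ["spam", "spam", "ham", "ham"],
    )
    print(model.predict(["money", "now"]))
```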

Test results:

Precision (spam): 0.9371069182389937, Recall (spam): 0.9802631578947368, F-score (spam): 0.9581993569131834
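
The reported F-score agrees with the harmonic mean of the reported precision and recall (the F1 case). A small check, with `f_score` as a helper name chosen for this example:

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values copied from the test results above.
precision = 0.9371069182389937
recall = 0.9802631578947368

if __name__ == "__main__":
    print(f_score(precision, recall))
```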

6. Sixth task

Word vectorization, plus a list of the top n most likely words for each topic and a list of the top n most likely topics for each document, based on a PLSA model (own implementation). Word2Vec is used for word similarity
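
Once a topic model has produced its distributions, extracting the top n entries is a small step. The sketch below assumes the distributions are available as dictionaries. The probabilities are made up for illustration and are not output from the repository's PLSA model, and fitting the model itself (the EM iterations) is not shown.

```python
import heapq


def top_n(probabilities, n):
    """Return the n keys with the highest probability, most likely first."""
    return heapq.nlargest(n, probabilities, key=probabilities.get)


# Hypothetical P(word | topic) values.
topic_word = {
    "topic_0": {"goal": 0.5, "match": 0.3, "vote": 0.2},
    "topic_1": {"vote": 0.6, "party": 0.3, "goal": 0.1},
}
# Hypothetical P(topic | document) values.
doc_topic = {"doc_0": {"topic_0": 0.2, "topic_1": 0.8}}

top_words = {topic: top_n(dist, 2) for topic, dist in topic_word.items()}
top_topics = {doc: top_n(dist, 1) for doc, dist in doc_topic.items()}

if __name__ == "__main__":
    print(top_words)
    print(top_topics)
```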
