GitHub - milden6/nlp-tasks: NLP tasks

NLP Tasks

1. First task

A parser that converts PDF, DOCX, TXT, and HTML documents to XML

2. Second task

Development of a finite state machine for a regular expression, implementation of two string comparison algorithms, and analysis of their behavior using a hierarchical-clustering dendrogram of words from an article
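The README does not name the two string comparison algorithms. As an illustration only, the sketch below assumes Levenshtein (edit) distance is one plausible choice; the function name `levenshtein` is hypothetical and not taken from the repository. Pairwise distances of this kind are what a hierarchical clustering step would consume to build a dendrogram.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Pairwise distance matrix over a small, made-up word list.
words = ["parser", "parsed", "cluster"]
matrix = [[levenshtein(u, v) for v in words] for u in words]
```

A matrix like `matrix` could then be passed to a clustering routine; which linkage method the repository uses is not stated in the README.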

3. Third task

Building a stop-word dictionary and thematic dictionaries using TF-IDF and a contrast method
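A minimal TF-IDF sketch, assuming documents are already tokenized and using the plain `tf * log(N / df)` weighting; the repository may use a different variant, and the function name `tf_idf` is hypothetical. Terms that appear in every document get a score of zero, which is one way stop-word candidates can be spotted.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Made-up toy corpus for illustration.
docs = [["the", "cat", "sat"], ["the", "dog", "ran"]]
weights = tf_idf(docs)
# Terms with zero weight everywhere are stop-word candidates.
stop_candidates = {t for w in weights for t, s in w.items() if s == 0.0}
```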

4. Fourth task

Co-occurrence matrix, PPMI and LSA matrices, cosine similarity and scalar (dot-product) similarity
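The sketch below shows one way the co-occurrence counts, PPMI weighting, and cosine similarity could fit together, using sparse dictionaries instead of dense matrices. The window size, function names, and data layout are assumptions, and the LSA step (a truncated SVD of the weighted matrix) is left out.

```python
import math
from collections import Counter


def cooccurrence(sentences, window=2):
    """Count (word, context) pairs within a symmetric window."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    counts[(word, tokens[j])] += 1
    return counts


def ppmi(counts):
    """Positive PMI: max(0, log2(p(w, c) / (p(w) * p(c)))), zeros omitted."""
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (word, ctx), n in counts.items():
        row[word] += n
        col[ctx] += n
    weighted = {}
    for (word, ctx), n in counts.items():
        pmi = math.log2(n * total / (row[word] * col[ctx]))
        if pmi > 0:
            weighted[(word, ctx)] = pmi
    return weighted


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

The scalar similarity mentioned above corresponds to the unnormalized `dot` value inside `cosine`.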

5. Fifth task

Naive Bayes spam filtering

Test results:

Precision (spam): 0.9371069182389937
Recall (spam): 0.9802631578947368
F-score (spam): 0.9581993569131834
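For orientation, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, followed by the standard F-score formula. This is not the repository's implementation: the class name, the smoothing choice, and the toy training data are all assumptions. The reported F-score is consistent with the harmonic mean of the reported precision and recall.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over tokenized documents (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, tokens):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in sorted(self.classes):
            total = sum(self.word_counts[c].values())
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokens:
                if w in self.vocab:
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (total + len(self.vocab))
                    )
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def f_score(precision, recall):
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Made-up toy training set, not the repository's data.
model = NaiveBayes().fit(
    [["win", "money", "now"], ["free", "money"],
     ["meeting", "tomorrow"], ["lunch", "tomorrow", "now"]],
    ["spam", "spam", "ham", "ham"],
)
reported_f = f_score(0.9371069182389937, 0.9802631578947368)
```

Log probabilities are summed rather than multiplying raw probabilities, which avoids numeric underflow on long messages.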

6. Sixth task

Word vectorization, plus a list of the top n most likely words for each topic and a list of the top n most likely topics for each document, based on a PLSA model (own implementation). Word2Vec is used for word similarity

