NLP-IN-PRACTICE

Use these NLP, text mining, and machine learning code samples and tools to solve real-world text data problems.

Notebooks / Source

Links in the first column take you to the subfolder/repository with the source code.

Task	Related Article	Source Type	Description
Large Scale Phrase Extraction	phrase2vec article	python script	Extract phrases from large amounts of data using PySpark. Annotate text using these phrases or use the phrases for other downstream tasks.
Word Cloud for Jupyter Notebook and Python Web Apps	word_cloud article	python script + notebook	Visualize top keywords using word counts or TF-IDF.
Gensim Word2Vec (with dataset)	word2vec article	notebook	How to use Word2Vec correctly to get the desired results.
Reading files and word count with Spark	spark article	python script	How to read files of different formats using PySpark with a word count example
Extracting Keywords with TF-IDF and SKLearn (with dataset)	tfidf article	notebook	How to extract interesting keywords from text using TF-IDF and Python's scikit-learn.
Text Preprocessing	text preprocessing article	notebook	A few code snippets on how to perform text preprocessing. Includes stemming, noise removal, lemmatization and stop word removal.
TFIDFTransformer vs. TFIDFVectorizer	tfidftransformer and tfidfvectorizer usage article	notebook	How to use TFIDFTransformer and TFIDFVectorizer correctly, how the two differ, and when to use which.
Accessing Pre-trained Word Embeddings with Gensim	Pre-trained word embeddings article	notebook	How to access pre-trained GloVe and Word2Vec Embeddings using Gensim and an example of how these embeddings can be leveraged for text similarity
Text Classification in Python (with news dataset)	Text classification with Logistic Regression article	notebook	Get started with text classification. Learn how to build and evaluate a text classifier for news classification using Logistic Regression.
CountVectorizer Usage Examples	How to Correctly Use CountVectorizer? An In-Depth Look article	notebook	Learn how to maximize the use of CountVectorizer such that you are not just computing counts of words, but also preprocessing your text data appropriately as well as extracting additional features from your text dataset.
HashingVectorizer Examples	HashingVectorizer Vs. CountVectorizer article	notebook	Learn the differences between HashingVectorizer and CountVectorizer and when to use which.
CBOW vs. SkipGram	Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI article	notebook	A quick comparison of the three embedding architectures.
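
To give a feel for what the TF-IDF keyword extraction entry above covers, here is a minimal, standard-library-only sketch. It is not the code from the notebook, which uses scikit-learn. The toy corpus, the stop word set, and the function names are all hypothetical. The IDF term follows the smoothed form `ln((1 + N) / (1 + df)) + 1`, and vector normalization is left out for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stop word list; not the notebook's dataset.
DOCS = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang a song",
]
STOP_WORDS = {"the", "a", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term frequency times smoothed IDF; no normalization applied.
        scores = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_n]])
    return results


if __name__ == "__main__":
    for doc, keywords in zip(DOCS, tfidf_keywords(DOCS, top_n=2)):
        print(f"{doc!r}: {keywords}")
```

Terms that occur in fewer documents score higher, so "cat", which appears in two of the three toy documents, ranks below terms unique to a single document. For real data, the notebook's scikit-learn approach is the better starting point.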

Notes

For more articles, please see this list.
If you would like to receive articles via email, subscribe to my mailing list.

Contact

This repository is maintained by Kavita Ganesan. Connect with me on LinkedIn or Twitter.
