h-pal/Predicting-the-Programming-Language-of-StackOverflow-Questions

Predicting-the-Programming-Language-of-StackOverflow-Questions-using-Natural-Language-Processing

I used an LSTM-based deep learning model to predict which of the 16 most popular programming languages of 2019 a StackOverflow question is about. The prediction uses both the title and the body of the question. The model is a three-layer LSTM network, since LSTMs work well on sequential data and are widely used in NLP.
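As a rough illustration, a three-layer LSTM classifier of this kind could be defined in Keras as sketched below. The vocabulary size, embedding width, units per layer, optimizer, and loss are all assumptions made for the sketch; they are not the settings used in this repository.

```python
import numpy as np
from keras.layers import LSTM, Dense, Embedding
from keras.models import Sequential

VOCAB_SIZE = 20000  # assumed vocabulary size (not stated in this README)
EMBED_DIM = 128     # assumed embedding width
LSTM_UNITS = 64     # assumed number of units in each LSTM layer
NUM_CLASSES = 16    # one output per target language


def build_model():
    """Stack three LSTM layers on top of a token embedding."""
    model = Sequential([
        Embedding(VOCAB_SIZE, EMBED_DIM),
        # The first two LSTM layers return the full sequence so that the
        # next LSTM layer receives one vector per time step.
        LSTM(LSTM_UNITS, return_sequences=True),
        LSTM(LSTM_UNITS, return_sequences=True),
        # The last LSTM layer returns only its final state.
        LSTM(LSTM_UNITS),
        Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        loss="sparse_categorical_crossentropy",  # integer class labels
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model


model = build_model()
# Two dummy questions, each encoded as 30 integer token ids.
dummy_batch = np.zeros((2, 30), dtype="int32")
probs = model.predict(dummy_batch, verbose=0)
print(probs.shape)
```

Each row of `probs` is a probability distribution over the 16 languages; the predicted language is the index with the highest probability.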

Data

I used the StackSample dataset, a collection of more than 1 million StackOverflow questions together with their answers and tags.
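One way to turn the questions and tags into labeled training text is sketched below. It assumes the questions table has `Id`, `Title`, and `Body` columns and the tags table has `Id` and `Tag` columns; the small in-memory frames stand in for the real CSV files. The language set is a hypothetical subset, since the 16 languages are not listed here, and dropping questions that carry more than one language tag is an assumption of this sketch rather than a documented step.

```python
import re

import pandas as pd

# Hypothetical subset of target languages, for illustration only.
TARGET_LANGUAGES = ["python", "java", "c#"]

HTML_TAG_RE = re.compile(r"<[^>]+>")


def clean_text(html):
    """Strip HTML tags, collapse whitespace, and lowercase."""
    text = HTML_TAG_RE.sub(" ", html)
    return re.sub(r"\s+", " ", text).strip().lower()


def label_questions(questions, tags, languages):
    """Keep questions tagged with exactly one of the target languages."""
    lang_tags = tags[tags["Tag"].isin(languages)].drop_duplicates()
    # A question tagged with several target languages has no single label.
    lang_tags = lang_tags.groupby("Id").filter(lambda group: len(group) == 1)
    merged = questions.merge(lang_tags, on="Id", how="inner")
    # The model input combines the title and the body of the question.
    merged["text"] = (merged["Title"] + " " + merged["Body"]).map(clean_text)
    return merged[["Id", "text", "Tag"]].rename(columns={"Tag": "language"})


# Tiny in-memory stand-ins for the questions and tags tables.
questions = pd.DataFrame({
    "Id": [1, 2, 3],
    "Title": ["Sort a list", "NullPointerException", "Call Java from Python"],
    "Body": [
        "<p>How do I sort a <code>list</code>?</p>",
        "<p>Why is it thrown?</p>",
        "<p>Is it possible?</p>",
    ],
})
tags = pd.DataFrame({
    "Id": [1, 1, 2, 3, 3],
    "Tag": ["python", "sorting", "java", "python", "java"],
})

labeled = label_questions(questions, tags, TARGET_LANGUAGES)
print(labeled)
```

Question 3 is dropped because it is tagged with both `python` and `java`, so only questions 1 and 2 remain.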

Requirements

Python 3.7

Keras 2.3

NumPy 1.18

Pandas 1.0

Matplotlib 3.1

NLTK 3.4.5

re 2.2.1 (part of the Python standard library)

Result

On a held-out test set, the model achieved an accuracy of 82.34%.

Confusion Matrix
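For readers who want to reproduce this kind of evaluation, the following minimal sketch builds a confusion matrix from integer class labels with NumPy and derives the accuracy from it. The labels are made-up toy values with three classes, not results from this model.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions; rows are true classes, columns are predicted."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly when an index pair repeats.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    return np.trace(matrix) / matrix.sum()


# Toy labels for three classes, for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(accuracy(cm))
```

Off-diagonal cells show which pairs of classes get confused with each other, which a single accuracy number hides.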

References

Predicting the Programming Language of Questions and Snippets of StackOverflow Using Natural Language Processing, Kamel Alrashedy
