- Take N classes of Wikipedia articles (laziest way possible).
- For each class: 1000 articles.
- Create an autoencoder to compress the articles.
- Perform classification with a typical classifier.
- Discussion:
    - Compare to a classification on plain text.
    - Compare to PCA.
- Crawling Wikipedia:
    - MediaWiki API, queried with the `requests` Python package.
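A minimal sketch of the crawl step, assuming the MediaWiki `categorymembers` list and `extracts` prop are used to collect titles and plain text per class. The category name `"Physics"` used in examples is illustrative; the plan does not name its N classes.

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def category_params(category, limit=500):
    """Query parameters listing article titles in one category."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": 0,      # namespace 0 = articles only
        "cmlimit": limit,      # 500 is the per-request maximum
        "format": "json",
    }

def extract_params(title):
    """Query parameters fetching one article as plain text."""
    return {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "titles": title,
        "format": "json",
    }

def fetch_category_titles(category, limit=500):
    r = requests.get(API_URL, params=category_params(category, limit), timeout=10)
    r.raise_for_status()
    return [m["title"] for m in r.json()["query"]["categorymembers"]]

def fetch_plaintext(title):
    r = requests.get(API_URL, params=extract_params(title), timeout=10)
    r.raise_for_status()
    pages = r.json()["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")
```

Collecting 1000 articles per class would require paginating with the `cmcontinue` token the API returns when a category has more members than one response holds.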
- Word / Article Representation:
    - Stopword removal using `nltk`.
    - Co-occurrence probability with a default context size of 6 for each token.
    - Article representation as a simple sum of the co-occurrence probabilities of its tokens.
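The representation above can be sketched as follows. "Co-occurrence probability" is read here as each token's row of window counts normalized to sum to 1; the real pipeline would take the stopword list from `nltk.corpus.stopwords`, replaced below by a tiny stand-in set to keep the sketch self-contained.

```python
import numpy as np

# Real pipeline: from nltk.corpus import stopwords; STOP = set(stopwords.words("english"))
STOP = {"the", "a", "an", "of", "and", "is"}  # tiny stand-in list

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOP]

def cooccurrence_probs(tokens, vocab, window=6):
    """Row-normalized co-occurrence counts within a symmetric window."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for i, tok in enumerate(tokens):
        if tok not in index:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in index:
                counts[index[tok], index[tokens[j]]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0   # avoid division by zero for unseen rows
    return counts / row_sums        # each nonzero row sums to 1

def article_vector(tokens, probs, vocab):
    """Article = simple sum of the co-occurrence rows of its tokens."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for tok in tokens:
        if tok in index:
            vec += probs[index[tok]]
    return vec
```

With this reading, every article becomes a fixed-length vector of vocabulary size, which is exactly the dense input the autoencoder and PCA steps below expect.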
- Compression:
    - AutoEncoder: a simple 4-layer NN (implemented with `tensorflow`).
    - PCA: `sklearn` implementation of PCA.
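One way the 4-layer autoencoder could look in `tensorflow`. The plan only says "a simple 4-layer NN", so the 2-encode/2-decode split, the hidden width of 128, and the code dimension are assumptions.

```python
import tensorflow as tf

def build_autoencoder(input_dim, code_dim=32):
    """Symmetric 4-layer autoencoder: 2 encoding + 2 decoding Dense layers."""
    inputs = tf.keras.Input(shape=(input_dim,))
    h = tf.keras.layers.Dense(128, activation="relu")(inputs)       # layer 1
    code = tf.keras.layers.Dense(code_dim, activation="relu")(h)    # layer 2: bottleneck
    h = tf.keras.layers.Dense(128, activation="relu")(code)         # layer 3
    outputs = tf.keras.layers.Dense(input_dim, activation="linear")(h)  # layer 4
    autoencoder = tf.keras.Model(inputs, outputs)
    encoder = tf.keras.Model(inputs, code)  # reuse the front half to emit codes
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# The PCA baseline would compress to the same dimensionality for a fair comparison:
#   from sklearn.decomposition import PCA
#   codes_pca = PCA(n_components=32).fit_transform(X)
```

Training would call `autoencoder.fit(X, X, ...)` (input reconstructs itself), then `encoder.predict(X)` yields the compressed article vectors fed to the classifiers.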
- Classification:
    - LogisticRegression with an `sklearn` pipeline.
    - RandomForest with an `sklearn` pipeline.
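The two classifiers above, each wrapped in an `sklearn` pipeline. The scaling step, hyperparameters, and the random toy data standing in for the compressed article vectors are all assumptions for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def make_classifiers():
    """The two classifiers from the plan, each in a pipeline."""
    return {
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "rf": make_pipeline(RandomForestClassifier(n_estimators=200, random_state=0)),
    }

# Toy stand-in for the compressed article vectors and their class labels.
rng = np.random.RandomState(0)
X = rng.randn(60, 8)
y = (X[:, 0] > 0).astype(int)

scores = {name: cross_val_score(clf, X, y, cv=3).mean()
          for name, clf in make_classifiers().items()}
```

Running the same loop on plain-text features (e.g. the raw representation vectors), autoencoder codes, and PCA codes gives the three accuracy numbers the Discussion section wants to compare.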
- Visualization:
    - Visualization using `matplotlib`.
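A small sketch of the visualization step, assuming the goal is to scatter the first two dimensions of the compressed codes colored by class (the plan does not say which plot is intended).

```python
import matplotlib
matplotlib.use("Agg")  # render to file, no display required
import matplotlib.pyplot as plt
import numpy as np

def plot_codes_2d(codes, labels, path="codes.png"):
    """Scatter the first two dimensions of the compressed codes, colored by class."""
    fig, ax = plt.subplots()
    sc = ax.scatter(codes[:, 0], codes[:, 1], c=labels, cmap="tab10", s=12)
    ax.set_xlabel("code dim 0")
    ax.set_ylabel("code dim 1")
    fig.colorbar(sc, ax=ax, label="class")
    fig.savefig(path, dpi=120)
    plt.close(fig)
```

Plotting autoencoder codes and PCA codes side by side would make the compression comparison in the Discussion visual as well as numeric.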