GitHub Classifier by Maxim Schuwalow, Tobias Ludwig & Fabian Richter

Requirements

Training should only be done natively (not in Docker) and with GPU acceleration. GPUs used for training should have at least 3GB of VRAM. As we already ship a trained model, training should not be necessary. To start the server for inference, you should have at least 8GB of available RAM, as the server alone requires 5GB to load the word embeddings and our model into memory. Our Docker image takes up around 10GB of disk space.

We also need pretrained word embeddings; we used word2vec by Google. Download them from here and save the compressed file in the data directory.

Furthermore, our crawler needs a personal access token with the repo scope (full rights, not only public_repo, because of GraphQL). Please insert it at the marked position in classification/__init__.py. GitHub does not allow tokens to be published, so you have to supply your own.
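To illustrate what the token is used for, here is a minimal sketch of how a personal access token is attached to a request against GitHub's GraphQL endpoint. The function name `build_graphql_request` and the query itself are assumptions made for illustration; they are not taken from our crawler, which reads the token from the marked position in the classification package.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"  # GitHub's GraphQL endpoint


def build_graphql_request(token, owner, name):
    """Build (but do not send) an authenticated GraphQL request.

    Hypothetical helper: the query below is only an example and is not
    the query our crawler actually sends.
    """
    query = """
    query($owner: String!, $name: String!) {
      repository(owner: $owner, name: $name) { description }
    }
    """
    payload = json.dumps(
        {"query": query, "variables": {"owner": owner, "name": name}}
    )
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={
            # The personal access token is sent as a bearer token.
            "Authorization": "bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; without a valid token GitHub rejects the call.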

Usage

Classification

We wrote a wrapper script that uses our classification server to classify text files in the given format. You need Python 3 and requests to use it. Before running it, start the classification server as described in the following sections.

pip install requests
./eval.py INPUT OUTPUT

Our classification of appendix B is in the file classified.

Installation via docker

We built a Docker image that starts up everything included. You can start it by executing run_demo.sh:

./run_demo.sh

Manual installation

Alternatively, you can install our Python server as a module and start the server that wraps our classifier. You need Python 3.5 for that.

pip install -e .
github-classify

Additionally, you need to extract all the files in data.

cd data
unzip '*.zip'
gunzip *.gz
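If unzip or gunzip are not available on your system, the same extraction can be done from Python. This is only a sketch using the standard library; `extract_archives` is a hypothetical helper that is not part of our repository, and unlike gunzip it leaves the original .gz files in place.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def extract_archives(data_dir):
    """Extract every .zip and .gz file found directly in data_dir.

    Returns the names of the extracted files.
    """
    data_dir = Path(data_dir)
    extracted = []
    for archive in sorted(data_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
            extracted.extend(zf.namelist())
    for archive in sorted(data_dir.glob("*.gz")):
        target = archive.with_suffix("")  # drop the trailing .gz
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files fit in memory
        extracted.append(target.name)
    return extracted
```

Calling `extract_archives("data")` from the repository root corresponds to the shell commands above.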

You get raw classifications and repository metadata in JSON at

http://localhost:8081/rate/{username}/{reponame}
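As a rough sketch, the endpoint can be queried from Python as follows. The helper names (`rate_url`, `split_repo_url`, `fetch_rating`) are assumptions made for this example and are not taken from eval.py; the structure of the returned JSON is not described here, so the sketch simply returns the parsed response.

```python
import json
import urllib.request
from urllib.parse import quote, urlparse

SERVER = "http://localhost:8081"  # server address as given in this README


def rate_url(username, reponame, server=SERVER):
    """Build the /rate URL for one repository."""
    return "{}/rate/{}/{}".format(
        server, quote(username, safe=""), quote(reponame, safe="")
    )


def split_repo_url(repo_url):
    """Split https://github.com/<user>/<repo> into (user, repo)."""
    parts = [p for p in urlparse(repo_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("not a repository URL: " + repo_url)
    return parts[0], parts[1]


def fetch_rating(username, reponame):
    """Query the classification server; requires the server to be running."""
    with urllib.request.urlopen(rate_url(username, reponame)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `fetch_rating(*split_repo_url(url))` returns the raw classification and metadata for the repository behind `url`.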

Then, there is our web frontend. You need node & npm for that.

cd webapp
npm install -g angular-cli
npm install
ng serve

Now, there is a webserver running at localhost:8080.

Documentation

Our documentation is included in the web frontend.

