Python Classifiers

Implementations of the K-Nearest Neighbour and Naive Bayes classifiers in Python, along with some sample data. Some references are made to Weka, which can be downloaded here: http://www.cs.waikato.ac.nz/ml/index.html
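
For readers unfamiliar with the two classifiers named above, here is a minimal sketch of how each one makes a prediction. It is not the code in this repository: the function names, the Euclidean distance metric, the Gaussian likelihood for Naive Bayes and the "yes"/"no" class labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training rows."""
    # Euclidean distance is an assumption; other metrics would also work.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def gaussian_nb_fit(train_rows, train_labels):
    """Estimate a prior and per-attribute (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(train_rows, train_labels):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, max(var, 1e-9)))  # floor avoids division by zero
        model[label] = (len(rows) / len(train_rows), stats)
    return model


def gaussian_nb_predict(model, query):
    """Return the class with the highest log posterior for the query row."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for x, (mean, var) in zip(query, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total

    return max(model, key=lambda label: log_posterior(*model[label]))
```

Both functions expect numeric attribute lists of equal length. The nearest-neighbour sketch is sensitive to attribute scale, which is one reason to normalise the data before classifying.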

Folders & Steps

1. Raw Data - Contains the raw Pima Indians diabetes .data file, the .name file and the assignment spec.
2. Processing - Contains everything needed to generate the processed .csv file:
- Run the preprocessor.py file to add a header, change the class names and remove invalid results (see Assumptions and Invalid Data).
- Open the resulting pima_processed.csv file in Weka, go to Filter > Choose > Attribute > Normalise > Apply, then save the file as pima.csv to get a normalised CSV file.
3. Classifiers - Contains Python scripts to run the classifiers:
- classifier.py
  - Run classifier.py -h for more information about argument usage.
  - A log file, classifier.log, is created; it records information about the run, such as the number of correctly and incorrectly classified instances.
  - --folds, -f: number of folds to split the data into (default 10)
  - --neighbours, -k: number of nearest neighbours (default 3)
  - --algorithm, -a: the algorithm to run (KNN/NB)
4. Feature Selection - Contains a version of the data that has been run through Weka's CFS feature selection:
- Open the pima.csv file generated in step 2 in Weka and go to Select Attributes > Start > Right click the result > Save Reduced data...
- Some header information that Weka generates and puts in the file had to be removed manually.
5. Results - Contains a results spreadsheet and the final report.
- results_matrix.xlsx - Compares our classifiers to Weka's, both on the data with no feature selection and on the data with feature selection (this is also included in report.pdf).
- report.pdf - Contains findings.
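
The classifier.py options listed in step 3 could be wired up roughly as follows. This is a sketch under stated assumptions, not the actual script: the flag names and the defaults of 10 and 3 come from the list above, while the function names, the round-robin fold split and making --algorithm required are illustrative choices.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags documented for classifier.py."""
    parser = argparse.ArgumentParser(description="Run a KNN or NB classifier.")
    parser.add_argument("--folds", "-f", type=int, default=10,
                        help="number of folds to split the data into")
    parser.add_argument("--neighbours", "-k", type=int, default=3,
                        help="number of nearest neighbours (used by KNN)")
    # Assumption: no default is documented, so the flag is required here.
    parser.add_argument("--algorithm", "-a", choices=["KNN", "NB"],
                        required=True, help="the algorithm to run")
    return parser


def split_into_folds(rows, n_folds):
    """Deal rows round-robin into n_folds lists (no stratification)."""
    return [rows[i::n_folds] for i in range(n_folds)]


def cross_validation_splits(rows, n_folds):
    """Yield (train, test) pairs, holding out each fold once as the test set."""
    folds = split_into_folds(rows, n_folds)
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test
```

For example, build_parser().parse_args(["-a", "KNN", "-k", "5"]) gives an object whose folds attribute is still the default of 10. The real script may split folds differently, for instance by keeping class proportions equal across folds.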

Assumptions and Invalid Data

There are a number of fields in the data where attributes are missing and have been coded as 0. We have decided to remove the rows containing a 0 value in the following fields:
- Glucose Concentration
- Blood Pressure
- Triceps Skin Fold Thickness
- Body Mass Index (BMI)
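
The row removal described above, followed by a min-max rescaling similar in spirit to the Weka normalisation step, can be sketched as below. The column names, the sample rows and the "yes"/"no" class labels are invented for illustration; the real header is whatever preprocessor.py writes, and Weka's filter may differ in detail from this simple rescaling.

```python
import csv
import io

# Hypothetical header names for the four fields where 0 means "missing".
ZERO_MEANS_MISSING = ["glucose", "blood_pressure", "triceps_skin", "bmi"]


def drop_rows_with_missing(rows, fields=ZERO_MEANS_MISSING):
    """Keep only rows where none of the given fields is coded as 0."""
    return [row for row in rows if all(float(row[f]) != 0 for f in fields)]


def min_max_normalise(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Invented sample values, not taken from the real data set.
sample = io.StringIO(
    "glucose,blood_pressure,triceps_skin,bmi,class\n"
    "120,70,30,32.0,yes\n"
    "90,0,25,27.0,no\n"
    "150,65,0,24.0,yes\n"
    "100,60,20,28.0,no\n"
)
rows = list(csv.DictReader(sample))
clean = drop_rows_with_missing(rows)  # rows 2 and 3 contain a 0 and are dropped
bmi_scaled = min_max_normalise([float(r["bmi"]) for r in clean])
```

Other columns, such as a count that can legitimately be 0, are deliberately left out of the check so that valid rows are not discarded.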
