Read Me!

Classifying Cancerous Genetic Mutation Variations

About

For this project, I used a variety of classification models to categorize the malignancy of genetic mutation variations based on data derived from the text of expert-annotated scientific literature. The final ensemble classification model is based on exploration using Multinomial Naive Bayes, Logistic Regression, Linear SVMs, and Random Forests.
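
How the ensemble combines its member models is not described here. One common option is soft voting, where each model's per-class probabilities are averaged and the highest-scoring class wins. The sketch below illustrates that idea only; the function name `soft_vote`, the equal default weights, and the probability values are all hypothetical and are not taken from the project code.

```python
from collections import defaultdict


def soft_vote(model_probs, weights=None):
    """Average per-class probabilities from several models for one sample.

    model_probs: list of dicts mapping class label -> probability.
    weights: optional per-model weights (assumed equal if omitted).
    Returns the winning label and the averaged probabilities.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    combined = defaultdict(float)
    for probs, weight in zip(model_probs, weights):
        for label, p in probs.items():
            combined[label] += weight * p / total
    best = max(combined, key=combined.get)
    return best, dict(combined)


# Made-up probabilities from three models for a single mutation.
naive_bayes = {1: 0.6, 4: 0.3, 7: 0.1}
log_reg = {1: 0.2, 4: 0.5, 7: 0.3}
forest = {1: 0.3, 4: 0.4, 7: 0.3}

label, averaged = soft_vote([naive_bayes, log_reg, forest])
print(label, averaged)
```

Note that a linear SVM does not produce probabilities by default, so an ensemble that includes one would need either calibrated scores or hard (majority) voting instead.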

Data

Data come from a Kaggle competition sponsored by Memorial Sloan Kettering Cancer Center. In the contest's nomenclature, the training data include expert-defined class identification (i.e., Class 1-9), while the test data (with no class identification) are used for scoring the competition. Because the test data are unlabeled, I split the provided training data into separate sets for training and testing my models.
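
The split itself is not specified above. As a minimal sketch, assuming a stratified hold-out so that each of the nine classes keeps roughly the same share in both parts, it could look like the following. The function name `stratified_split`, the 20% test fraction, the seed, and the toy rows are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="Class", test_fraction=0.2, seed=0):
    """Split rows into train/test lists, sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        # Hold out at least one row per class, unless the class has only one.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows standing in for the labeled training data: 10 rows per class.
rows = [{"ID": i, "Class": 1 + i % 9} for i in range(90)]
train_rows, test_rows = stratified_split(rows)
print(len(train_rows), len(test_rows))
```

Stratifying matters most when some classes are rare, since a purely random split can leave a class missing from the held-out portion.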

Training Variants, a comma-separated file containing information about the genetic mutations
- ID, id number of the row used to link the mutation to the clinical evidence
- Gene, the gene where this genetic mutation is located
- Variation, the amino acid change for this mutation
- Class, the class this genetic mutation has been assigned to (1-9; no descriptions are provided for class assignments)
Training Text, a double-pipe (||) delimited file containing clinical evidence (text) used to classify genetic mutations
- ID, id number of the row used to link the clinical evidence to the genetic mutation
- Text, the clinical evidence (scientific literature) used to classify the genetic mutation
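
The two files described above can be read and linked on ID with the standard library alone. The sketch below uses small in-memory stand-ins rather than the real files. The sample rows, gene names, and helper names (`load_variants`, `load_text`, `join_on_id`) are made up, and the presence of a single header line in the text file is an assumption.

```python
import csv
import io

# In-memory stand-ins for the two files; the contents are invented.
variants_file = io.StringIO(
    "ID,Gene,Variation,Class\n"
    "0,GENE_A,A123B,1\n"
    "1,GENE_B,C45D,4\n"
)
text_file = io.StringIO(
    "ID,Text\n"
    "0||First abstract, with a comma.\n"
    "1||Second abstract.\n"
)


def load_variants(handle):
    """Read the comma-separated variants file into a dict keyed by ID."""
    return {int(row["ID"]): row for row in csv.DictReader(handle)}


def load_text(handle):
    """Read the ||-delimited text file into a dict of ID -> text."""
    next(handle)  # skip the header line (assumed to be present)
    texts = {}
    for line in handle:
        if not line.strip():
            continue
        # Split on the first "||" only, so the text itself may contain commas.
        row_id, _, text = line.rstrip("\n").partition("||")
        texts[int(row_id)] = text
    return texts


def join_on_id(variants, texts):
    """Attach each mutation's clinical evidence text using the shared ID."""
    return [
        {
            "ID": row_id,
            "Gene": row["Gene"],
            "Variation": row["Variation"],
            "Class": int(row["Class"]),
            "Text": texts.get(row_id, ""),
        }
        for row_id, row in sorted(variants.items())
    ]


records = join_on_id(load_variants(variants_file), load_text(text_file))
print(records[0])
```

The text file is split by hand because Python's `csv` module accepts only a single-character delimiter, so `||` cannot be passed to it directly.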

actionsteve/cancer-classification