Starting point code for MDR implementation #1

Merged
merged 4 commits into from
Jun 15, 2016


TuanNguyen27
Contributor

Assumption: labels are binary.

Implemented:

fit(self, features, classes): builds a dictionary that maps each distinct
feature vector to a tuple counting how many times each label value
appears with that feature vector. Key: tuple of feature values; value:
tuple of label counts.
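A minimal sketch of the counting step described above, assuming binary labels 0 and 1 and features supplied as a list of rows. The helper name `build_feature_map` is hypothetical and this is not the PR's actual `fit` code; it uses `defaultdict` only because the diff below imports it.

```python
from collections import defaultdict


def build_feature_map(features, classes):
    """Map each distinct feature vector to its label counts.

    Hypothetical helper illustrating fit(): keys are tuples of feature
    values, values are (count of label 0, count of label 1).
    Binary labels 0/1 are assumed.
    """
    feature_map = defaultdict(lambda: [0, 0])
    for row, label in zip(features, classes):
        # Tuples are hashable, so a feature vector can serve as a dict key
        feature_map[tuple(row)][label] += 1
    # Freeze the counts into tuples, matching the description above
    return {key: tuple(counts) for key, counts in feature_map.items()}


features = [[0, 1], [0, 1], [1, 1], [0, 1]]
classes = [1, 0, 1, 1]
print(build_feature_map(features, classes))
# {(0, 1): (1, 2), (1, 1): (0, 1)}
```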

transform(self, features): once the dictionary is built, collapses each
feature vector into a single constructed label: the label whose
frequency ratio for that feature vector is greater than its default
ratio (its ratio across the whole training set).
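One way to sketch this step, assuming `feature_map` holds (count of label 0, count of label 1) per feature tuple and that the "default ratio" is the overall label-1 to label-0 ratio in the training data. The function name `transform_features` is hypothetical. `tie_break` and `default_label` are borrowed from the PR's parameter docstring, but applying `default_label` to feature vectors never seen during training is an assumption made here for illustration.

```python
def transform_features(features, feature_map, tie_break=0, default_label=0):
    """Collapse each feature vector into a single constructed label.

    Hypothetical sketch of transform(): feature_map maps a tuple of feature
    values to (count of label 0, count of label 1).
    """
    # Overall label counts across the training data (the "default ratio")
    total_0 = sum(counts[0] for counts in feature_map.values())
    total_1 = sum(counts[1] for counts in feature_map.values())

    constructed = []
    for row in features:
        key = tuple(row)
        if key not in feature_map:
            # Assumption: unseen feature vectors get default_label
            constructed.append(default_label)
            continue
        count_0, count_1 = feature_map[key]
        # Compare count_1 / count_0 with total_1 / total_0 by
        # cross-multiplying, which avoids dividing by zero
        lhs = count_1 * total_0
        rhs = total_1 * count_0
        if lhs > rhs:
            constructed.append(1)
        elif lhs < rhs:
            constructed.append(0)
        else:
            constructed.append(tie_break)
    return constructed


feature_map = {(0, 1): (1, 2), (1, 1): (2, 1), (1, 0): (1, 1)}
print(transform_features([[0, 1], [1, 1], [1, 0], [0, 0]], feature_map))
# [1, 0, 0, 0]
```

Comparing within-cell ratios to the overall ratio, rather than using a fixed threshold of 1, keeps the rule sensible when the two labels are unbalanced in the training data.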

score(self, features, classes): compares the constructed feature with
the corresponding class labels and counts how many times they match.
Returns the accuracy: the match count divided by the number of samples
(the length of the constructed feature / classes vector).
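The accuracy computation amounts to the following sketch; `score_constructed` is a hypothetical name. The explicit `float()` is a precaution in case the code runs under Python 2 (which the PR's `from __future__ import print_function` suggests it supported), where `/` between two integers truncates.

```python
def score_constructed(constructed, classes):
    """Fraction of samples whose constructed label matches the true class.

    Hypothetical sketch of score(); both sequences must have equal length.
    """
    if len(constructed) != len(classes):
        raise ValueError("constructed and classes must have the same length")
    matches = sum(1 for pred, true in zip(constructed, classes) if pred == true)
    # float() keeps the division exact under Python 2 as well
    return float(matches) / len(classes)


print(score_constructed([1, 0, 0, 0], [1, 0, 1, 0]))  # 0.75
```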

The implementation is tested in main() by training MDR on the training
set and computing accuracy_score on the test set.
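An end-to-end sketch of that train/test check, combining the three methods in one hypothetical class, `MDRSketch`. Several assumptions apply: the synthetic data (the label is the parity of the sum of two features, so it depends on both jointly), the 75/25 split, and the pure-Python accuracy computation that stands in for scikit-learn's `accuracy_score` so the example has no dependencies. None of this is the PR's actual `main()`.

```python
import random
from collections import defaultdict


class MDRSketch(object):
    """Hypothetical, compact MDR-style classifier for binary labels."""

    def __init__(self, tie_break=0, default_label=0):
        self.tie_break = tie_break
        self.default_label = default_label
        self.feature_map = None

    def fit(self, features, classes):
        # Count label occurrences per distinct feature vector
        counts = defaultdict(lambda: [0, 0])
        for row, label in zip(features, classes):
            counts[tuple(row)][label] += 1
        self.feature_map = {k: tuple(v) for k, v in counts.items()}
        return self

    def transform(self, features):
        total_0 = sum(c[0] for c in self.feature_map.values())
        total_1 = sum(c[1] for c in self.feature_map.values())
        out = []
        for row in features:
            counts = self.feature_map.get(tuple(row))
            if counts is None:
                out.append(self.default_label)  # unseen vector (assumption)
                continue
            # Cell ratio vs. overall ratio, cross-multiplied
            lhs, rhs = counts[1] * total_0, total_1 * counts[0]
            out.append(1 if lhs > rhs else (0 if lhs < rhs else self.tie_break))
        return out

    def score(self, features, classes):
        predicted = self.transform(features)
        matches = sum(p == c for p, c in zip(predicted, classes))
        return float(matches) / len(classes)


def main():
    rng = random.Random(42)
    # Synthetic data: label is the parity of the sum of both features
    features = [[rng.randint(0, 2), rng.randint(0, 2)] for _ in range(400)]
    classes = [int((a + b) % 2 == 1) for a, b in features]
    split = int(0.75 * len(features))
    model = MDRSketch().fit(features[:split], classes[:split])
    accuracy = model.score(features[split:], classes[split:])
    print("Test accuracy:", accuracy)
    return accuracy


if __name__ == "__main__":
    main()
```

Because every feature combination here maps to exactly one label, the held-out accuracy should be perfect; real data with noise would score lower.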

description
tie_break: type int (default: 0)
description: specify the default label in case there's a tie in a given set of feature values
default_label: type int (default: 0)
Contributor

Remove the words "type"

Contributor Author

Oops, gotcha!

Changed fdict to feature_map
Removed ‘type’ in line 34 & 36
Fixed all bugs according to Randal Olson’s comments.
@@ -18,29 +18,33 @@
"""

import pandas as pd

import numpy as np
from collections import defaultdict
from __future__ import print_function
Contributor

from __future__ import print_function must be the first import in the file.
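The reviewer's point can be checked directly: Python rejects a `__future__` import that follows ordinary imports at compile time, and only a module docstring, comments, blank lines, and other `__future__` imports may precede it. The sketch below compiles two small source strings modeled on the diff; `compile()` does not execute imports, so numpy need not be installed. The helper name `compiles` is hypothetical.

```python
# Source where the __future__ import follows ordinary imports (as in the diff)
bad_order = '''"""Module docstring."""
import numpy as np
from __future__ import print_function
'''

# Corrected order: only the docstring, comments, and blank lines precede it
good_order = '''"""Module docstring."""
from __future__ import print_function
import numpy as np
'''


def compiles(source):
    """Return True if source compiles; compile() does not run the imports."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(bad_order))   # False
print(compiles(good_order))  # True
```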

second fix according to comments
rhiever merged commit 094efd3 into EpistasisLab:master on Jun 15, 2016
2 participants