SSML Models Implementation #38

Closed
wants to merge 35 commits
Commits (35):

e10e632 adding hyperopt functions (Apr 22, 2022)
bd0ab96 add supervised logistic regression model function (Apr 22, 2022)
1afbcd6 adding cotraining model function (Apr 22, 2022)
e3a5e62 adding code for Label Prop model function (Apr 22, 2022)
12c46de adding shadow fully connected NN model function (Apr 22, 2022)
3cc5e95 adding shadow eaat cnn function model (Apr 22, 2022)
15fede0 abstracting MINOS to Spectra (Apr 22, 2022)
a9410da removing duplicate device in eaat-cnn (Apr 22, 2022)
d3e5068 revamping design of ssl models, starting with logreg (Jul 29, 2022)
3126ebe adding save function to logreg class and renaming hyperopt.py (Aug 4, 2022)
edcc56e commenting logistic regression class and methods (Aug 12, 2022)
bf630f4 scripts/utils.py pep8 changes (Aug 12, 2022)
fd824dd implementing LabelProp with hyperopt functionality (Aug 12, 2022)
0c3ae2a implementing co-training with hyperopt functionality (Aug 12, 2022)
42f19f4 implementing Shadow fully-connected NN with hyperopt (Aug 12, 2022)
a629bb3 implementing Shadow EAAT CNN with hyperopt (Aug 12, 2022)
ebe247a adding functions for pca analysis (Aug 12, 2022)
7ae4671 rearranging model files (Aug 15, 2022)
6997a6d adding unit test for LogReg (Aug 15, 2022)
73ce1f1 updating dependencies (Aug 15, 2022)
98e33e8 correcting pytorch package name (Aug 15, 2022)
12982ca adding unit test for CoTraining (Aug 15, 2022)
1365e30 adding unit test for LabelProp (Aug 15, 2022)
c97136d adding unit test for ShadowNN (Aug 15, 2022)
554eb05 including utils scripts in unit tests coverage (Aug 15, 2022)
20f768e error: training NNs takes too long for a unit test, let alone hyperopt (Aug 15, 2022)
5d17d8c error: these cnns are so bad that they can't even make predictions (Aug 15, 2022)
80d1e9b correcting cnn parameter calculation to include max_pool1d (Aug 16, 2022)
95ee61b adding tests for more coverage (Aug 16, 2022)
49ed669 adding a test for util plots (Aug 16, 2022)
3cb9b44 adding seed test to co-training (Aug 16, 2022)
c131dcf removing old commented line (Aug 22, 2022)
4c53820 changing fresh_start methods of models to use class train method instead (Sep 29, 2022)
f0bccf1 adding an EarlyStopper class for managing that functionality (Oct 7, 2022)
a094a25 adding cross validation implementation (Oct 10, 2022)
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -41,7 +41,7 @@ jobs:
       - name: Test with pytest
         run: |
           python3 -m pytest
-          python3 -m coverage run --source=./RadClass/ -m pytest
+          python3 -m coverage run --source=./RadClass/,./models/,./scripts/ -m pytest
           python3 -m coverage report
           python3 -m coverage html
           COVERALLS_REPO_TOKEN=${{ secrets.COVERALLS_REPO_TOKEN }} python3 -m coveralls --service=github
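For contributors running the suite outside CI, the updated coverage invocation can be mirrored locally. A sketch assuming `pytest` and `coverage` are installed and run from the repository root; the coveralls upload step is CI-only and omitted here:

```shell
# run the test suite, then measure coverage over all three source trees
python3 -m pytest
python3 -m coverage run --source=./RadClass/,./models/,./scripts/ -m pytest
python3 -m coverage report   # terminal summary
python3 -m coverage html     # browsable report written to htmlcov/
```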
6 changes: 6 additions & 0 deletions README.md
@@ -25,7 +25,13 @@ Versions 3.6-3.9 are currently supported by tests. The following Python packages
* h5py
* numpy
* progressbar2
* matplotlib
* seaborn
* scipy
* sklearn
* hyperopt
* torch
* shadow-ssml

Modules can be imported from the repository directory (e.g. `from RadClass.H0 import H0`) or `RadClass` can be installed using pip:

162 changes: 162 additions & 0 deletions models/LogReg.py
@@ -0,0 +1,162 @@
# For hyperopt (parameter optimization)
from hyperopt import STATUS_OK
# sklearn models
from sklearn import linear_model
# diagnostics
from sklearn.metrics import balanced_accuracy_score
from scripts.utils import run_hyperopt
import joblib


class LogReg:
    '''
    Methods for deploying sklearn's logistic regression
    implementation with hyperparameter optimization.
    Data agnostic (i.e. user-supplied data inputs).
    TODO: Currently only supports binary classification.
        Add multinomial functions and unit tests.
        Add functionality for regression(?)
    Inputs:
    params: dictionary of logistic regression input parameters.
        keys max_iter, tol, and C supported.
    random_state: int/float for reproducible initialization.
    '''

    # only binary so far
    def __init__(self, params=None, random_state=0):
        # defaults to a fixed value for reproducibility
        self.random_state = random_state
        # dictionary of parameters for logistic regression model
        self.params = params
        if self.params is None:
            self.model = linear_model.LogisticRegression(
                random_state=self.random_state
            )
        else:
            self.model = linear_model.LogisticRegression(
                random_state=self.random_state,
                max_iter=params['max_iter'],
                tol=params['tol'],
                C=params['C']
            )

    def fresh_start(self, params, data_dict):
        '''
        Required method for hyperopt optimization.
        Trains and tests a fresh logistic regression model
        with given input parameters.
        This method does not overwrite self.model (self.optimize() does).
        Inputs:
        params: dictionary of logistic regression input parameters.
            keys max_iter, tol, and C supported.
        data_dict: compact data representation with the four requisite
            data structures used for training and testing a model.
            keys trainx, trainy, testx, and testy required.
        '''

        # unpack data
        trainx = data_dict['trainx']
        trainy = data_dict['trainy']
        testx = data_dict['testx']
        testy = data_dict['testy']

        # supervised logistic regression
        clf = LogReg(params=params, random_state=self.random_state)
        # train and test model
        clf.train(trainx, trainy)
        # balanced accuracy accounts for class-imbalanced data
        clf_pred, acc = clf.predict(testx, testy)

        # loss function minimizes misclassification
        return {'loss': 1-acc,
                'status': STATUS_OK,
                'model': clf.model,
                'params': params,
                'accuracy': acc}

    def optimize(self, space, data_dict, max_evals=50, verbose=True):
        '''
        Wrapper method for using hyperopt (see utils.run_hyperopt
        for more details). After hyperparameter optimization, results
        are stored, the best model -overwrites- self.model, and the
        best params -overwrite- self.params.
        Inputs:
        space: a hyperopt-compliant dictionary with defined optimization
            spaces. For example:
                # quniform returns float, some parameters require int;
                # use this to force int
                space = {'max_iter': scope.int(hp.quniform('max_iter',
                                                           10,
                                                           10000,
                                                           10)),
                         'tol': hp.loguniform('tol', 1e-5, 1e-1),
                         'C': hp.uniform('C', 0.001, 1000.0)
                        }
            See hyperopt docs for more information.
        data_dict: compact data representation with the four requisite
            data structures used for training and testing a model.
            keys trainx, trainy, testx, and testy required.
        max_evals: the number of epochs for hyperparameter optimization.
            Each iteration is one set of hyperparameters trained
            and tested on a fresh model. Convergence for simpler
            models like logistic regression typically happens well
            before 50 epochs, but can take longer as more complex models,
            more hyperparameters, and larger hyperparameter spaces
            are tested.
        verbose: boolean. If True, print results of hyperopt.
            If False, print only the progress bar for optimization.
        '''

        best, worst = run_hyperopt(space=space,
                                   model=self.fresh_start,
                                   data_dict=data_dict,
                                   max_evals=max_evals,
                                   verbose=verbose)

        # save the results of hyperparameter optimization
        self.best = best
        self.model = best['model']
        self.params = best['params']
        self.worst = worst

    def train(self, trainx, trainy):
        '''
        Wrapper method for sklearn's logistic regression training method.
        Inputs:
        trainx: nxm feature vector/matrix for training model.
        trainy: nxk class label vector/matrix for training model.
        '''

        # supervised logistic regression
        self.model.fit(trainx, trainy)

    def predict(self, testx, testy=None):
        '''
        Wrapper method for sklearn's logistic regression predict method.
        Inputs:
        testx: nxm feature vector/matrix for testing model.
        testy: nxk class label vector/matrix for testing model.
            optional: if included, the predicted classes -and-
            the resulting classification accuracy will be returned.
        '''

        pred = self.model.predict(testx)

        acc = None
        if testy is not None:
            # balanced_accuracy_score accounts for class imbalance
            acc = balanced_accuracy_score(testy, pred)

        return pred, acc

    def save(self, filename):
        '''
        Save class instance to file using joblib.
        Inputs:
        filename: string filename to save object to file under.
            The file is saved with the .joblib extension,
            which is appended to filename if not included.
        '''

        if filename[-7:] != '.joblib':
            filename += '.joblib'
        joblib.dump(self, filename)
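The class above is a thin wrapper around sklearn, so the train/predict/balanced-accuracy flow it encapsulates can be sketched directly against sklearn. The data, shapes, and parameter values below are illustrative assumptions, not from the PR:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# synthetic, linearly separable binary data (illustrative only)
rng = np.random.default_rng(0)
trainx = rng.normal(size=(100, 5))
trainy = (trainx[:, 0] > 0).astype(int)
testx = rng.normal(size=(40, 5))
testy = (testx[:, 0] > 0).astype(int)

# mirrors LogReg.__init__: fixed random_state plus explicit max_iter/tol/C
model = LogisticRegression(random_state=0, max_iter=1000, tol=1e-4, C=1.0)
model.fit(trainx, trainy)                    # what LogReg.train wraps
pred = model.predict(testx)                  # what LogReg.predict wraps
acc = balanced_accuracy_score(testy, pred)   # class-imbalance-aware accuracy
```

During hyperopt optimization, `fresh_start` repeats exactly this loop for each sampled parameter set and reports `1 - acc` as the loss to minimize.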