Add interactive t-sne and pca #395

Merged: 10 commits, Nov 21, 2022
5 changes: 2 additions & 3 deletions README.md
@@ -21,9 +21,8 @@ Browse the repo:
About the Project
=================

-Aliro is actively developed by the [Institute for Biomedical Informatics](http://upibi.org) at the University of Pennsylvania.
-Contributors include Heather Williams, Weixuan Fu, William La Cava, Josh Cohen,
-Steve Vitale, Sharon Tartarone, Randal Olson, Patryk Orzechowski, and Jason Moore.
+Aliro is actively developed by the Center for Artificial Intelligence Research (CAIR) in the [Department of Computational Biomedicine](https://www.cedars-sinai.edu/research/departments-institutes/computational-biomedicine.html) at [Cedars-Sinai Medical Center](https://www.cedars-sinai.org/) in Los Angeles.
+Contributors include Hyunjun Choi, Miguel Hernandez, Nick Matsumoto, Jay Moran, Paul Wang, and Jason Moore (PI).

Cite
====
72 changes: 71 additions & 1 deletion ai/sklearn/config/classifiers.py
@@ -1,5 +1,6 @@
classifier_config_dict = {

# Original six classifiers
'sklearn.tree.DecisionTreeClassifier': {
'params': {
'criterion': ["gini", "entropy"],
@@ -75,5 +76,74 @@
'bootstrap': [True, False],
'min_weight_fraction_leaf': [0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
}
}
},





# new classifiers
# 'sklearn.ensemble.AdaBoostClassifier': {
# 'params': {
# 'n_estimators': [100, 500],
# 'learning_rate': [0.01, 0.1, 1],
# 'algorithm': ["SAMME", "SAMME.R"]
# }
# },


# 'sklearn.cluster.KMeans': {
# 'params': {
# 'n_clusters': [2, 3, 4, 5, 6, 7, 8, 9, 10],
# 'init': ["k-means++", "random"],
# 'n_init': [10, 20, 30],
# 'max_iter': [100, 200, 300, 400, 500],
# 'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
# }
# },

# 'sklearn.naive_bayes.GaussianNB': {
# 'params': {
# 'var_smoothing': [1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
# }
# },

# 'sklearn.naive_bayes.MultinomialNB': {
# 'params': {
# 'alpha': [0.0, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
# 'fit_prior': [True, False]
# }
# },

# 'sklearn.naive_bayes.BernoulliNB': {
# 'params': {
# 'alpha': [0.0, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
# 'fit_prior': [True, False]
# }
# },

# 'sklearn.neural_network.MLPClassifier': {
# 'params': {
# 'hidden_layer_sizes': [(100,), (100, 100), (100, 100, 100)],
# 'activation': ["identity", "logistic", "tanh", "relu"],
# 'solver': ["lbfgs", "sgd", "adam"],
# 'alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
# 'learning_rate': ["constant", "invscaling", "adaptive"],
# 'learning_rate_init': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
# 'power_t': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
# 'max_iter': [100, 500, 1000, 2000, 5000, 10000],
# 'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
# 'momentum': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
# 'nesterovs_momentum': [True, False],
# 'early_stopping': [True, False],
# 'beta_1': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
# 'beta_2': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
# 'epsilon': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
# 'validation_fraction': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
# 'n_iter_no_change': [5, 10, 20, 50, 100]
# }
# }



}
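
Each entry in `classifier_config_dict` maps a fully qualified scikit-learn class name to a `'params'` dict of candidate values. The diff does not show how Aliro consumes this dict, so the following is only a minimal sketch of how such a grid could be enumerated, assuming every combination of listed values is a valid candidate. The `example_config` stand-in, the `max_depth` values, and the helper names `expand_grid` and `grid_size` are hypothetical.

```python
from itertools import product

# Trimmed-down stand-in for classifier_config_dict. 'criterion' mirrors the
# diff; 'max_depth' and its values are hypothetical, added for illustration.
example_config = {
    'sklearn.tree.DecisionTreeClassifier': {
        'params': {
            'criterion': ["gini", "entropy"],
            'max_depth': [3, 6],
        }
    },
}


def expand_grid(params):
    """Yield one dict per combination of the listed parameter values."""
    names = sorted(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


def grid_size(params):
    """Return the number of combinations without enumerating them."""
    size = 1
    for values in params.values():
        size *= len(values)
    return size


dt_params = example_config['sklearn.tree.DecisionTreeClassifier']['params']
combos = list(expand_grid(dt_params))
print(grid_size(dt_params), combos[0])
```

A size check like `grid_size` may be useful before uncommenting a large entry such as the `MLPClassifier` block, since the number of combinations is the product of the lengths of all listed value lists.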
2 changes: 1 addition & 1 deletion config/common.env
@@ -18,7 +18,7 @@ MACHINE_HOST=machine
MACHINE_CONFIG=/appsrc/config/machine_config.json
MACHINE_SHAP_SAMPLES_KERNEL_EXPLAINER=50
MACHINE_SHAP_SAMPLES_OTHER_EXPLAINER=100
-EXP_TIMEOUT=10
+EXP_TIMEOUT=100
DT_MAX_DEPTH=6

STARTUP_DATASET_PATH=/appsrc/data/datasets/user
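
This hunk raises `EXP_TIMEOUT` from 10 to 100; the diff does not state the unit or where the value is read. As a rough sketch only, a `KEY=VALUE` file like `config/common.env` could be parsed as below. The `parse_env` helper and the fallback value are hypothetical and not part of the project.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition('=')
        settings[key.strip()] = value.strip()
    return settings


# Lines copied from the hunk above (post-change values).
env_text = """
MACHINE_SHAP_SAMPLES_KERNEL_EXPLAINER=50
MACHINE_SHAP_SAMPLES_OTHER_EXPLAINER=100
EXP_TIMEOUT=100
DT_MAX_DEPTH=6
"""

settings = parse_env(env_text)
# Hypothetical fallback: use the previous value if the key is missing.
exp_timeout = int(settings.get('EXP_TIMEOUT', '10'))
print(exp_timeout)
```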
14 changes: 13 additions & 1 deletion config/machine_config.json
@@ -1,3 +1,6 @@



{
"algorithms": ["DecisionTreeClassifier",
"GradientBoostingClassifier",
@@ -10,5 +13,14 @@
"SVR",
"KNeighborsRegressor",
"KernelRidge",
-"RandomForestRegressor"]
+"RandomForestRegressor",
+"AdaBoostClassifier"
+,"KMeans"
+,"GaussianNB"
+,"MultinomialNB"
+,"BernoulliNB"
+,"MLPClassifier"
+]
}
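
Note that this hunk adds six names to `"algorithms"` while the matching entries in `ai/sklearn/config/classifiers.py` are commented out in the same PR. Whether that matters depends on how the machine resolves algorithm names, which the diff does not show. The sketch below is one possible consistency check, assuming names in `machine_config.json` correspond to the last component of the config dict keys; `missing_algorithms` is a hypothetical helper, and the inputs are shortened copies of the real files.

```python
import json

# Shortened copy of config/machine_config.json after this change.
machine_config_text = '''
{
  "algorithms": ["DecisionTreeClassifier", "AdaBoostClassifier", "KMeans"]
}
'''

# Keys that are active (not commented out) in the config dict; shortened here.
active_config_keys = ['sklearn.tree.DecisionTreeClassifier']


def missing_algorithms(machine_config, config_keys):
    """Return algorithm names that have no active entry in the config keys."""
    configured = {key.rsplit('.', 1)[-1] for key in config_keys}
    return [name for name in machine_config['algorithms']
            if name not in configured]


missing = missing_algorithms(json.loads(machine_config_text), active_config_keys)
print(missing)
```

In a real check the regressor names in the list (for example `SVR`) would need to be compared against their own config, which is not part of this diff.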


9 changes: 9 additions & 0 deletions data/datasets/pmlb_small/README.md
@@ -0,0 +1,9 @@
# Benchmark data sets

This directory contains over 150 data sets for benchmarking supervised machine learning algorithms.

Each subdirectory corresponds to a separate data set, and will have a README file providing some basic information about the data set.

# High-level summary of data sets

[in progress]
80 changes: 80 additions & 0 deletions data/datasets/pmlb_small/allbp/README.md
@@ -0,0 +1,80 @@
# allbp

## Summary Stats

#instances: 3772

#features: 29

#binary_features: 21

#integer_features: 8

#float_features: 0

Endpoint type: integer

#Classes: 3

Imbalance metric: 0.8755228428707819

## Feature Types

age:discrete

sex:discrete

on thyroxine:binary

query on thyroxine:binary

on antithyroid medication:binary

sick:binary

pregnant:binary

thyroid surgery:binary

I131 treatment:binary

query hypothyroid:binary

query hyperthyroid:binary

lithium:binary

goitre:binary

tumor:binary

hypopituitary:binary

psych:binary

TSH measured:binary

TSH:discrete

T3 measured:binary

T3:discrete

TT4 measured:binary

TT4:discrete

T4U measured:binary

T4U:discrete

FTI measured:binary

FTI:discrete

TBG measured:binary

TBG:binary

referral source:discrete
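
The Summary Stats can be cross-checked against the Feature Types listing. The sketch below tallies the `name:type` lines above; it assumes `discrete` in the listing corresponds to the integer features counted in the summary, and `tally_feature_types` is a hypothetical helper rather than part of PMLB.

```python
from collections import Counter

# Feature Types listing copied from this README.
feature_types_text = """
age:discrete
sex:discrete
on thyroxine:binary
query on thyroxine:binary
on antithyroid medication:binary
sick:binary
pregnant:binary
thyroid surgery:binary
I131 treatment:binary
query hypothyroid:binary
query hyperthyroid:binary
lithium:binary
goitre:binary
tumor:binary
hypopituitary:binary
psych:binary
TSH measured:binary
TSH:discrete
T3 measured:binary
T3:discrete
TT4 measured:binary
TT4:discrete
T4U measured:binary
T4U:discrete
FTI measured:binary
FTI:discrete
TBG measured:binary
TBG:binary
referral source:discrete
"""


def tally_feature_types(text):
    """Count how many features are listed under each type."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so feature names may contain spaces.
        _name, _, kind = line.rpartition(':')
        counts[kind] += 1
    return counts


counts = tally_feature_types(feature_types_text)
print(dict(counts))
```

The tallies (21 binary, 8 discrete, 29 in total) agree with `#binary_features`, `#integer_features`, and `#features` in the summary.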
