Predicting hypertension using classification algorithms and basic visualization of results using d3

stephanieleevillanueva/classification_and_d3

classification_and_d3

Predicting hypertension using classification algorithms and basic visualization of results using d3

#### Folder structure:

d3
contains .html files and .csv data files used to create the d3 bar graphs.
* accuracy.html: html file with css and javascript. Uses ``d3.js`` (with tooltip) library to generate the graph
* pivot_accuracy.csv: dataset used by accuracy.html
* recall.html: html file with css and javascript. Uses ``d3.js`` (with tooltip) library to generate the graph
* pivot_recall.csv: dataset used by recall.html
* feature_importance.html: html file with css and javascript. Uses ``d3.js`` library and also uses transition to generate the graph
sql
contains .sql scripts to generate final datasets used in classification models.
* sql_tables.sql: creates table schemas
* final_tables.sql: selects only columns needed from ``sql_tables.sql`` and creates new filtered ``.sql`` tables
* script_raw.sql: joins tables from ``final_tables.sql`` and generates a single ``.sql`` table with data in its original form to be used for analysis.
* script_converted.sql: takes ``script_raw.sql`` generated table and converts selected columns into binary form.
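The four-step flow described for the sql folder (create schemas, filter columns, join, convert to binary) can be sketched as below. This is only an illustration, not the repository's actual scripts: it uses Python's built-in ``sqlite3`` in place of the PostgreSQL database the project connects to, and every table name, column name and data row is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the project's PostgreSQL database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Step 1 (cf. sql_tables.sql): create table schemas. All names are hypothetical.
cur.executescript("""
CREATE TABLE demographics (person_id INTEGER PRIMARY KEY, age INTEGER, gender TEXT, survey_wave TEXT);
CREATE TABLE blood_pressure (person_id INTEGER, told_high_bp TEXT, systolic INTEGER);
""")
cur.executemany(
    "INSERT INTO demographics VALUES (?, ?, ?, ?)",
    [(1, 54, "male", "a"), (2, 37, "female", "a"), (3, 61, "female", "a")],
)
cur.executemany(
    "INSERT INTO blood_pressure VALUES (?, ?, ?)",
    [(1, "yes", 148), (2, "no", 112), (3, "yes", 139)],
)

# Step 2 (cf. final_tables.sql): keep only the needed columns in new filtered tables.
cur.executescript("""
CREATE TABLE demographics_final AS SELECT person_id, age, gender FROM demographics;
CREATE TABLE blood_pressure_final AS SELECT person_id, told_high_bp FROM blood_pressure;
""")

# Step 3 (cf. script_raw.sql): join the filtered tables into one table, values unchanged.
cur.execute("""
CREATE TABLE hypertension_raw AS
SELECT d.person_id, d.age, d.gender, b.told_high_bp
FROM demographics_final AS d
JOIN blood_pressure_final AS b ON d.person_id = b.person_id
""")

# Step 4 (cf. script_converted.sql): convert selected columns into 0/1 form.
cur.execute("""
CREATE TABLE hypertension_converted AS
SELECT person_id,
       age,
       CASE WHEN gender = 'female' THEN 1 ELSE 0 END AS is_female,
       CASE WHEN told_high_bp = 'yes' THEN 1 ELSE 0 END AS hypertension
FROM hypertension_raw
""")

rows = cur.execute("SELECT * FROM hypertension_converted ORDER BY person_id").fetchall()
print(rows)
conn.close()
```

Keeping the raw and converted tables separate, as the scripts above do, leaves the original values available for analysis while the binary version feeds the classification models.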
data
contains .dat files relating to hypertension, pulled from the www.cdc.gov website for 2011-2012. This dataset is used to classify people at risk for hypertension.
py_and_ipynb_files
contains .py helper files for database connection using python and .ipynb notebooks for classification modeling and results analysis
* pass_.py: stores the user's password for the database
* postgresql_conn.py: defines ``sqlalchemy`` engine connection parameters
* cdc_data.ipynb: populates database tables with data from the ``.dat`` files
* cdc_analysis.ipynb: splits the dataset into training and test data to build classification models and measure model metrics such as ``accuracy``, ``precision``, ``f1`` and ``roc curve``. Also uses sklearn's recursive feature elimination (``rfe``) to reduce the number of features used in building the models.
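The modeling steps listed for cdc_analysis.ipynb (train/test split, recursive feature elimination, metric calculation) might look roughly like the sketch below. It is not the notebook's code: the data is synthetic (generated with ``make_classification``), and the choice of logistic regression, the 70/30 split and the number of features kept are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the converted hypertension dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=5, random_state=0
)

# Split into training and test data; the 70/30 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive feature elimination: repeatedly fit the estimator and drop the
# weakest feature until only n_features_to_select remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Fit a classifier on the reduced feature set and score it on held-out data.
model = LogisticRegression(max_iter=1000).fit(X_train_sel, y_train)
y_pred = model.predict(X_test_sel)
y_score = model.predict_proba(X_test_sel)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Fitting the selector on the training split only, then applying it to the test split, keeps the test data out of the feature selection step.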
