Homework4

Usage

Jupyter lab, Library: request, pandas, json, numpy, seaborn

Description

Here we are going to see if there is a correlation between mortality of COVID-19 and number of hospital beds available in the U.S.. We are going to use API to access the data and join the two sets of data to plot.

Instruction

Here we are going to do some exercise on getting Covid-19 data by using API, and we will make a plot with seaborn function within Jupyter notebook at the end.

import requests
import pandas as pd
import json
import numpy as np

req=requests.get("https://api.covid19api.com/live/country/united-states").json()
covid=pd.DataFrame(req)

covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24480 entries, 0 to 24479
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   ID           24480 non-null  object
 1   Country      24480 non-null  object
 2   CountryCode  24480 non-null  object
 3   Province     24480 non-null  object
 4   City         24480 non-null  object
 5   CityCode     24480 non-null  object
 6   Lat          24480 non-null  object
 7   Lon          24480 non-null  object
 8   Confirmed    24480 non-null  int64 
 9   Deaths       24480 non-null  int64 
 10  Recovered    24480 non-null  int64 
 11  Active       24480 non-null  int64 
 12  Date         24480 non-null  object
dtypes: int64(4), object(9)
memory usage: 2.4+ MB

covid.describe()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Confirmed	Deaths	Recovered	Active
count	2.448000e+04	24480.000000	24480.0	2.448000e+04
mean	1.169896e+06	15378.652002	0.0	1.154518e+06
std	1.522009e+06	18818.213672	0.0	1.504000e+06
min	0.000000e+00	0.000000	0.0	0.000000e+00
25%	2.508262e+05	2670.000000	0.0	2.485832e+05
50%	7.225890e+05	9371.000000	0.0	7.122190e+05
75%	1.468140e+06	19855.000000	0.0	1.450549e+06
max	1.116657e+07	95311.000000	0.0	1.107126e+07

import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")
sns.relplot(data=covid,x="Lat", y="Lon", hue="Deaths", size="Confirmed", sizes=(40, 400))

<seaborn.axisgrid.FacetGrid at 0x7ff53588d670>

Here we have an issue with values on x- and y-axix so we are going to check the value first.

covid['Lat'][0]

'18.35'

Here we are going to make a change data type of "Lat" and "Lon".

covid['Lat'] = covid['Lat'].astype('float16')
covid['Lon'] = covid['Lon'].astype('float16')

covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24480 entries, 0 to 24479
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   ID           24480 non-null  object 
 1   Country      24480 non-null  object 
 2   CountryCode  24480 non-null  object 
 3   Province     24480 non-null  object 
 4   City         24480 non-null  object 
 5   CityCode     24480 non-null  object 
 6   Lat          24480 non-null  float16
 7   Lon          24480 non-null  float16
 8   Confirmed    24480 non-null  int64  
 9   Deaths       24480 non-null  int64  
 10  Recovered    24480 non-null  int64  
 11  Active       24480 non-null  int64  
 12  Date         24480 non-null  object 
dtypes: float16(2), int64(4), object(7)
memory usage: 2.1+ MB

Now we can see that the data type of "Lat" and "Lon" has changed. Now we can make the plot correctly.

import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")
fig=sns.relplot(data=covid,x="Lon", y="Lat", hue="Deaths", size="Confirmed", sizes=(40, 400))

fig=sns.relplot(data=covid,x="Lon", y="Lat", hue="Deaths", size="Confirmed", sizes=(40, 400)).set(ylim=(25,50),xlim=(-150,-50))

Dataset COVID

Now we are going to get another data from different source. We are going to merge these two data by using pandas, and we are going to make a plot with the merged values in the merged data with seaborn

req=requests.get("https://www.communitybenefitinsight.org/api/get_hospitals.php").json()
hospitals=pd.DataFrame(req)
hospitals.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	hospital_id	hospital_org_id	ein	name	name_cr	street_address	city	state	zip_code	fips_state_and_county_code	hospital_bed_count	chrch_affl_f	urban_location_f	children_hospital_f	memb_counc_teach_hosps_f	medicare_provider_number	county	hospital_bed_size	updated_dt
0	1	1	630307951	Mizell Memorial Hospital	Mizell Memorial Hospital	702 Main Street	Opp	AL	36462	01039	99	N	N	N	N	010007	Covington County	<100 beds	June 15, 2022
1	2	2	630578923	St Vincents East	St Vincents East	50 Medical Park Drive East	Birmingham	AL	35235	01073	362	N	Y	N	Y	010011	Jefferson County	>299 beds	June 15, 2022
2	3	3	630312913	Shelby Baptist Medical Center	Shelby Baptist Medical Center	1000 First Street North	Alabaster	AL	35007	01117	252	N	Y	N	N	010016	Shelby County	100-299 beds	June 15, 2022
3	4	4	630459034	Callahan Eye Foundation Hosp	Callahan Eye Foundation Hosp	1720 University Boulevard	Birmingham	AL	35233	01073	106	N	Y	N	Y	010018	Jefferson County	100-299 beds	June 15, 2022
4	5	5	581973570	Cherokee Medical Center	Cherokee Medical Center	400 Northwood Drive	Centre	AL	35960	01019	60	N	N	N	N	010022	Cherokee County	<100 beds	June 15, 2022

The count is not unique because there are same name all over the states in the U.S.. Therefore, we are going to create a new column called "state_county".

hospitals['state_county']=hospitals['state']+'_'+hospitals['county']
hospitals.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	hospital_id	hospital_org_id	ein	name	name_cr	street_address	city	state	zip_code	fips_state_and_county_code	hospital_bed_count	chrch_affl_f	urban_location_f	children_hospital_f	memb_counc_teach_hosps_f	medicare_provider_number	county	hospital_bed_size	updated_dt	state_county
0	1	1	630307951	Mizell Memorial Hospital	Mizell Memorial Hospital	702 Main Street	Opp	AL	36462	01039	99	N	N	N	N	010007	Covington County	<100 beds	June 15, 2022	AL_Covington County
1	2	2	630578923	St Vincents East	St Vincents East	50 Medical Park Drive East	Birmingham	AL	35235	01073	362	N	Y	N	Y	010011	Jefferson County	>299 beds	June 15, 2022	AL_Jefferson County
2	3	3	630312913	Shelby Baptist Medical Center	Shelby Baptist Medical Center	1000 First Street North	Alabaster	AL	35007	01117	252	N	Y	N	N	010016	Shelby County	100-299 beds	June 15, 2022	AL_Shelby County
3	4	4	630459034	Callahan Eye Foundation Hosp	Callahan Eye Foundation Hosp	1720 University Boulevard	Birmingham	AL	35233	01073	106	N	Y	N	Y	010018	Jefferson County	100-299 beds	June 15, 2022	AL_Jefferson County
4	5	5	581973570	Cherokee Medical Center	Cherokee Medical Center	400 Northwood Drive	Centre	AL	35960	01019	60	N	N	N	N	010022	Cherokee County	<100 beds	June 15, 2022	AL_Cherokee County

HospitalInfoPerCounty=hospitals.groupby(['state_county'], as_index=False)['hospital_bed_count'].agg('sum')
HospitalInfoPerCounty.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	state_county	hospital_bed_count
0	AK_Anchorage Municipality	401
1	AK_Fairbanks North Star Borough	162
2	AK_Kenai Peninsula Borough	49622
3	AK_Ketchikan Gateway Borough	25
4	AK_Kodiak Island Borough	25

the number we have in AK_Kenai Peninsula Borough is too high comparing to the others. We need to check the data type of it.

hospitals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3423 entries, 0 to 3422
Data columns (total 20 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   hospital_id                 3423 non-null   object
 1   hospital_org_id             3423 non-null   object
 2   ein                         3423 non-null   object
 3   name                        3423 non-null   object
 4   name_cr                     3423 non-null   object
 5   street_address              3423 non-null   object
 6   city                        3423 non-null   object
 7   state                       3423 non-null   object
 8   zip_code                    3423 non-null   object
 9   fips_state_and_county_code  3423 non-null   object
 10  hospital_bed_count          3423 non-null   object
 11  chrch_affl_f                3423 non-null   object
 12  urban_location_f            3423 non-null   object
 13  children_hospital_f         3423 non-null   object
 14  memb_counc_teach_hosps_f    3423 non-null   object
 15  medicare_provider_number    3423 non-null   object
 16  county                      3422 non-null   object
 17  hospital_bed_size           3423 non-null   object
 18  updated_dt                  3423 non-null   object
 19  state_county                3422 non-null   object
dtypes: object(20)
memory usage: 535.0+ KB

Here we are going to make to change its data type to numbers.

hospitals['hospital_bed_count'] = hospitals['hospital_bed_count'].astype('int64')

hospitals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3423 entries, 0 to 3422
Data columns (total 20 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   hospital_id                 3423 non-null   object
 1   hospital_org_id             3423 non-null   object
 2   ein                         3423 non-null   object
 3   name                        3423 non-null   object
 4   name_cr                     3423 non-null   object
 5   street_address              3423 non-null   object
 6   city                        3423 non-null   object
 7   state                       3423 non-null   object
 8   zip_code                    3423 non-null   object
 9   fips_state_and_county_code  3423 non-null   object
 10  hospital_bed_count          3423 non-null   int64 
 11  chrch_affl_f                3423 non-null   object
 12  urban_location_f            3423 non-null   object
 13  children_hospital_f         3423 non-null   object
 14  memb_counc_teach_hosps_f    3423 non-null   object
 15  medicare_provider_number    3423 non-null   object
 16  county                      3422 non-null   object
 17  hospital_bed_size           3423 non-null   object
 18  updated_dt                  3423 non-null   object
 19  state_county                3422 non-null   object
dtypes: int64(1), object(19)
memory usage: 535.0+ KB

We are going to create a data frame that is unique by "state_county", and we are going to sum across entries on "hospital_bed_count".

HospitalInfoPerCounty=hospitals.groupby(['state_county'], as_index=False)['hospital_bed_count'].agg('sum')
HospitalInfoPerCounty.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	state_county	hospital_bed_count
0	AK_Anchorage Municipality	401
1	AK_Fairbanks North Star Borough	162
2	AK_Kenai Peninsula Borough	77
3	AK_Ketchikan Gateway Borough	25
4	AK_Kodiak Island Borough	25

HospitalInfoPerCounty['state_county']=HospitalInfoPerCounty['state_county'].str.replace(' County','')
HospitalInfoPerCounty.head(5)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	state_county	hospital_bed_count
0	AK_Anchorage Municipality	401
1	AK_Fairbanks North Star Borough	162
2	AK_Kenai Peninsula Borough	77
3	AK_Ketchikan Gateway Borough	25
4	AK_Kodiak Island Borough	25

COVID DATAFRAME

Here we are going to work with Covid data in order to compare with "HospitalInfoPerCounty"

import requests
import pandas as pd
import json
import numpy as np
req=requests.get("https://webhooks.mongodb-stitch.com/api/client/v2.0/app/covid-19-qppza/service/REST-API/incoming_webhook/us_only?min_date=2021-01-15T00:00:00.000Z&max_date=2021-01-15T00:00:00.000Z").json()
covid=pd.DataFrame(req)

covid.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	_id	uid	country_iso2	country_iso3	country_code	fips	county	state	country	combined_name	population	loc	date	confirmed	deaths	confirmed_daily	deaths_daily
0	6335197fb2fc4a05475fedec	84001001	US	USA	840	1001.0	Autauga	Alabama	US	Autauga, Alabama, US	55869.0	{'type': 'Point', 'coordinates': [-86.6441, 32...	2021-01-15T00:00:00.000Z	5103	55	28	0
1	6335197fb2fc4a05475ff1c0	84001003	US	USA	840	1003.0	Baldwin	Alabama	US	Baldwin, Alabama, US	223234.0	{'type': 'Point', 'coordinates': [-87.7221, 30...	2021-01-15T00:00:00.000Z	16002	179	161	2
2	6335197fb2fc4a05475ff594	84001005	US	USA	840	1005.0	Barbour	Alabama	US	Barbour, Alabama, US	24686.0	{'type': 'Point', 'coordinates': [-85.3871, 31...	2021-01-15T00:00:00.000Z	1712	36	16	0
3	6335197fb2fc4a05475ff968	84001007	US	USA	840	1007.0	Bibb	Alabama	US	Bibb, Alabama, US	22394.0	{'type': 'Point', 'coordinates': [-87.1251, 32...	2021-01-15T00:00:00.000Z	2130	47	17	0
4	6335197fb2fc4a05475ffd3c	84001009	US	USA	840	1009.0	Blount	Alabama	US	Blount, Alabama, US	57826.0	{'type': 'Point', 'coordinates': [-86.5679, 33...	2021-01-15T00:00:00.000Z	5264	83	45	3

We are going to convert full states name under "state" column to two letter code. In order to do that, we are going to make a library and replace states name by using the library. Any state name matches with its key in the library replace to values in the library.

us_state_to_abbrev = {
"Alabama": "AL","Alaska": "AK","Arizona": "AZ","Arkansas": "AR", "California": "CA",
"Colorado": "CO","Connecticut": "CT","Delaware": "DE","Florida": "FL","Georgia": "GA",
"Hawaii": "HI","Idaho": "ID","Illinois": "IL","Indiana": "IN","Iowa": "IA","Kansas": "KS",
"Kentucky": "KY","Louisiana": "LA","Maine": "ME","Maryland": "MD","Massachusetts": "MA",
"Michigan": "MI","Minnesota": "MN","Mississippi": "MS","Missouri": "MO","Montana": "MT",
"Nebraska": "NE","Nevada": "NV","New Hampshire": "NH","New Jersey": "NJ","New Mexico": "NM",
"New York": "NY","North Carolina": "NC","North Dakota": "ND","Ohio": "OH",
"Oklahoma": "OK","Oregon": "OR","Pennsylvania": "PA","Rhode Island": "RI",
"South Carolina": "SC","South Dakota": "SD","Tennessee": "TN","Texas": "TX",
"Utah": "UT","Vermont": "VT","Virginia": "VA","Washington": "WA",
"West Virginia": "WV","Wisconsin": "WI","Wyoming": "WY",
"District of Columbia": "DC","American Samoa": "AS","Guam": "GU", "Northern Mariana Islands": "MP",
"Puerto Rico": "PR","United States Minor Outlying Islands": "UM","U.S. Virgin Islands": "VI",
}
covid['state'].replace(us_state_to_abbrev, inplace=True)
covid.head(5)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	_id	uid	country_iso2	country_iso3	country_code	fips	county	state	country	combined_name	population	loc	date	confirmed	deaths	confirmed_daily	deaths_daily
0	6335197fb2fc4a05475fedec	84001001	US	USA	840	1001.0	Autauga	AL	US	Autauga, Alabama, US	55869.0	{'type': 'Point', 'coordinates': [-86.6441, 32...	2021-01-15T00:00:00.000Z	5103	55	28	0
1	6335197fb2fc4a05475ff1c0	84001003	US	USA	840	1003.0	Baldwin	AL	US	Baldwin, Alabama, US	223234.0	{'type': 'Point', 'coordinates': [-87.7221, 30...	2021-01-15T00:00:00.000Z	16002	179	161	2
2	6335197fb2fc4a05475ff594	84001005	US	USA	840	1005.0	Barbour	AL	US	Barbour, Alabama, US	24686.0	{'type': 'Point', 'coordinates': [-85.3871, 31...	2021-01-15T00:00:00.000Z	1712	36	16	0
3	6335197fb2fc4a05475ff968	84001007	US	USA	840	1007.0	Bibb	AL	US	Bibb, Alabama, US	22394.0	{'type': 'Point', 'coordinates': [-87.1251, 32...	2021-01-15T00:00:00.000Z	2130	47	17	0
4	6335197fb2fc4a05475ffd3c	84001009	US	USA	840	1009.0	Blount	AL	US	Blount, Alabama, US	57826.0	{'type': 'Point', 'coordinates': [-86.5679, 33...	2021-01-15T00:00:00.000Z	5264	83	45	3

Here we are going to make a unique value like last time we did with hospital data.

covid['state_county']=covid['state']+'_'+covid['county']
covid.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	_id	uid	country_iso2	country_iso3	country_code	fips	county	state	country	combined_name	population	loc	date	confirmed	deaths	confirmed_daily	deaths_daily	state_county
0	6335197fb2fc4a05475fedec	84001001	US	USA	840	1001.0	Autauga	AL	US	Autauga, Alabama, US	55869.0	{'type': 'Point', 'coordinates': [-86.6441, 32...	2021-01-15T00:00:00.000Z	5103	55	28	0	AL_Autauga
1	6335197fb2fc4a05475ff1c0	84001003	US	USA	840	1003.0	Baldwin	AL	US	Baldwin, Alabama, US	223234.0	{'type': 'Point', 'coordinates': [-87.7221, 30...	2021-01-15T00:00:00.000Z	16002	179	161	2	AL_Baldwin
2	6335197fb2fc4a05475ff594	84001005	US	USA	840	1005.0	Barbour	AL	US	Barbour, Alabama, US	24686.0	{'type': 'Point', 'coordinates': [-85.3871, 31...	2021-01-15T00:00:00.000Z	1712	36	16	0	AL_Barbour
3	6335197fb2fc4a05475ff968	84001007	US	USA	840	1007.0	Bibb	AL	US	Bibb, Alabama, US	22394.0	{'type': 'Point', 'coordinates': [-87.1251, 32...	2021-01-15T00:00:00.000Z	2130	47	17	0	AL_Bibb
4	6335197fb2fc4a05475ffd3c	84001009	US	USA	840	1009.0	Blount	AL	US	Blount, Alabama, US	57826.0	{'type': 'Point', 'coordinates': [-86.5679, 33...	2021-01-15T00:00:00.000Z	5264	83	45	3	AL_Blount

We will create another data frame with columns that we desire to be within it and we are going to sum across entries on those columns.

CovidPerCounty=covid.groupby(['state_county'], as_index=False).agg({'deaths':'sum','confirmed':'sum','population':'sum','confirmed_daily':'mean','deaths_daily':'mean'})
CovidPerCounty.head(15)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	state_county	deaths	confirmed	population	confirmed_daily	deaths_daily
0	AK_Aleutians East	0	40	3337.0	1.0	0.0
1	AK_Aleutians West	0	242	5634.0	3.0	0.0
2	AK_Anchorage	129	24530	288000.0	92.0	0.0
3	AK_Bethel	12	2911	18386.0	26.0	0.0
4	AK_Bristol Bay	0	0	836.0	0.0	0.0
5	AK_Bristol Bay plus Lake and Peninsula	0	163	2428.0	1.0	0.0
6	AK_Chugach	3	241	6751.0	0.0	0.0
7	AK_Copper River	0	175	2699.0	1.0	0.0
8	AK_Denali	0	63	2097.0	0.0	0.0
9	AK_Dillingham	1	150	4916.0	1.0	0.0
10	AK_Fairbanks North Star	23	5576	96849.0	46.0	1.0
11	AK_Haines	0	23	2530.0	0.0	0.0
12	AK_Hoonah-Angoon	0	0	2148.0	0.0	0.0
13	AK_Juneau	5	1098	31974.0	4.0	0.0
14	AK_Kenai Peninsula	19	3821	58708.0	11.0	0.0

We are going to create a new column "mortality" in CovidPerCounty data frame. We will plot this value onto our final plot to compare with people per bed count value for each of counties.

CovidPerCounty['mortality']=CovidPerCounty['deaths']/CovidPerCounty['confirmed']
CovidPerCounty.head(5)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	state_county	deaths	confirmed	population	confirmed_daily	mortality
0	AK_Aleutians East	0	40	3337.0	1.0	0.000000
1	AK_Aleutians West	0	242	5634.0	3.0	0.000000
2	AK_Anchorage	129	24530	288000.0	92.0	0.005259
3	AK_Bethel	12	2911	18386.0	26.0	0.004122
4	AK_Bristol Bay	0	0	836.0	0.0	NaN

Here we are going to merge two data frame and take out the values which can make interference with statistics.

Hospital_Covid_PerCounty=pd.merge(HospitalInfoPerCounty, CovidPerCounty, on='state_county')
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['population']>50000]
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['population']<300000]
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['confirmed']>10]
Hospital_Covid_PerCounty.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	state_county	hospital_bed_count	deaths	confirmed	population	confirmed_daily	deaths_daily	mortality
0	AL_Baldwin	240	179	16002	223234.0	161.0	2.0	0.011186
1	AL_Blount	65	83	5264	57826.0	45.0	3.0	0.015767
2	AL_Calhoun	552	200	10982	113605.0	119.0	3.0	0.018212
8	AL_Cullman	115	104	7960	83768.0	38.0	10.0	0.013065
9	AL_Elmore	69	101	7723	81209.0	53.0	0.0	0.013078

In merged data, we are going to create a new column "people_per_bed" in order to compare with mortality.

Hospital_Covid_PerCounty['people_per_bed']=Hospital_Covid_PerCounty['population']/Hospital_Covid_PerCounty['hospital_bed_count']

Hospital_Covid_PerCounty.tail(5)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	state_county	hospital_bed_count	deaths	confirmed	population	confirmed_daily	deaths_daily	mortality	people_per_bed
1652	WV_Monongalia	720	52	6311	105612.0	67.0	1.0	0.008240	146.683333
1657	WV_Putnam	70	63	3444	56450.0	37.0	2.0	0.018293	806.428571
1658	WV_Raleigh	173	37	3505	73361.0	79.0	1.0	0.010556	424.052023
1665	WV_Wood	343	76	5993	83518.0	54.0	6.0	0.012681	243.492711
1669	WY_Natrona	217	103	7039	79858.0	18.0	0.0	0.014633	368.009217

We are going to make the final plot by using seaborn as we did last time

import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")
plot=sns.relplot(data=Hospital_Covid_PerCounty,x="people_per_bed", y="mortality", hue="confirmed", size="population",
                 sizes=(40, 400)).set(ylim=(0,0.06),xlim=(0,6000))

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
Homework4.ipynb		Homework4.ipynb
README.md		README.md
output_28_0.png		output_28_0.png
output_3_1.png		output_3_1.png
output_7_0.png		output_7_0.png
output_8_0.png		output_8_0.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Homework4

Usage

Description

Instruction

Here we are going to do some exercise on getting Covid-19 data by using API, and we will make a plot with seaborn function within Jupyter notebook at the end.

Here we have an issue with values on x- and y-axix so we are going to check the value first.

Here we are going to make a change data type of "Lat" and "Lon".

Now we can see that the data type of "Lat" and "Lon" has changed. Now we can make the plot correctly.

Dataset COVID

Now we are going to get another data from different source. We are going to merge these two data by using pandas, and we are going to make a plot with the merged values in the merged data with seaborn

The count is not unique because there are same name all over the states in the U.S.. Therefore, we are going to create a new column called "state_county".

the number we have in AK_Kenai Peninsula Borough is too high comparing to the others. We need to check the data type of it.

Here we are going to make to change its data type to numbers.

We are going to create a data frame that is unique by "state_county", and we are going to sum across entries on "hospital_bed_count".

COVID DATAFRAME

Here we are going to work with Covid data in order to compare with "HospitalInfoPerCounty"

We are going to convert full states name under "state" column to two letter code. In order to do that, we are going to make a library and replace states name by using the library. Any state name matches with its key in the library replace to values in the library.

Here we are going to make a unique value like last time we did with hospital data.

We will create another data frame with columns that we desire to be within it and we are going to sum across entries on those columns.

We are going to create a new column "mortality" in CovidPerCounty data frame. We will plot this value onto our final plot to compare with people per bed count value for each of counties.

Here we are going to merge two data frame and take out the values which can make interference with statistics.

In merged data, we are going to create a new column "people_per_bed" in order to compare with mortality.

We are going to make the final plot by using seaborn as we did last time

We do not see any correlation between mortality and people per bed count

About

Releases

Packages

Languages

eunksung/trgn_assignment4

Folders and files

Latest commit

History

Repository files navigation

Homework4

Usage

Description

Instruction

Here we are going to do some exercise on getting Covid-19 data by using API, and we will make a plot with seaborn function within Jupyter notebook at the end.

Here we have an issue with values on x- and y-axix so we are going to check the value first.

Here we are going to make a change data type of "Lat" and "Lon".

Now we can see that the data type of "Lat" and "Lon" has changed. Now we can make the plot correctly.

Dataset COVID

Now we are going to get another data from different source. We are going to merge these two data by using pandas, and we are going to make a plot with the merged values in the merged data with seaborn

The count is not unique because there are same name all over the states in the U.S.. Therefore, we are going to create a new column called "state_county".

the number we have in AK_Kenai Peninsula Borough is too high comparing to the others. We need to check the data type of it.

Here we are going to make to change its data type to numbers.

We are going to create a data frame that is unique by "state_county", and we are going to sum across entries on "hospital_bed_count".

COVID DATAFRAME

Here we are going to work with Covid data in order to compare with "HospitalInfoPerCounty"

We are going to convert full states name under "state" column to two letter code. In order to do that, we are going to make a library and replace states name by using the library. Any state name matches with its key in the library replace to values in the library.

Here we are going to make a unique value like last time we did with hospital data.

We will create another data frame with columns that we desire to be within it and we are going to sum across entries on those columns.

We are going to create a new column "mortality" in CovidPerCounty data frame. We will plot this value onto our final plot to compare with people per bed count value for each of counties.

Here we are going to merge two data frame and take out the values which can make interference with statistics.

In merged data, we are going to create a new column "people_per_bed" in order to compare with mortality.

We are going to make the final plot by using seaborn as we did last time

We do not see any correlation between mortality and people per bed count

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages