HW1:
1. GitHub setup, merging and forking
2. Environment variables
3 (extra credit): try to plot
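For item 2, here is a minimal sketch of reading a secret (such as the MTA API key used in HW2) from an environment variable so it is not hard-coded in a notebook. The variable name MTAKEY is a hypothetical choice. Set it in the shell first, e.g. export MTAKEY=yourkey.

```python
import os


def get_api_key(var_name="MTAKEY"):
    """Return the API key stored in an environment variable.

    MTAKEY is a hypothetical variable name; use whatever name you exported.
    """
    key = os.environ.get(var_name)
    if key is None:
        # Fail loudly so the notebook does not silently send requests without a key
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key
```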
HW2:
1. Show bus locations: get data from the MTA API, read the JSON response, and peel the information layer by layer
2. Get bus info from the JSON: keys and structure may be inconsistent! Use try: ... except KeyError: next_stop = []
3. pd.read_csv, plot
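The layer-by-layer parsing with a KeyError fallback can be sketched as follows. The JSON below is a hypothetical, heavily simplified reply. Its key names only loosely follow the MTA Bus Time (SIRI) layout and should be checked against a real response. Catching IndexError as well (in case a list is present but empty) is an assumption beyond the notes.

```python
import json

# Hypothetical, simplified reply shaped loosely like an MTA Bus Time (SIRI)
# vehicle-monitoring response; real responses carry many more fields.
SAMPLE_REPLY = """
{"Siri": {"ServiceDelivery": {"VehicleMonitoringDelivery": [{"VehicleActivity": [
  {"MonitoredVehicleJourney": {
     "VehicleLocation": {"Latitude": 40.75, "Longitude": -73.99},
     "OnwardCalls": {"OnwardCall": [{"StopPointName": "Example Stop"}]}}},
  {"MonitoredVehicleJourney": {
     "VehicleLocation": {"Latitude": 40.76, "Longitude": -73.98},
     "OnwardCalls": {}}}
]}]}}}
"""


def bus_info(raw_json):
    """Peel the reply layer by layer and return (lat, lon, next_stop) per bus."""
    reply = json.loads(raw_json)
    delivery = reply["Siri"]["ServiceDelivery"]["VehicleMonitoringDelivery"][0]
    rows = []
    for vehicle in delivery["VehicleActivity"]:
        journey = vehicle["MonitoredVehicleJourney"]
        location = journey["VehicleLocation"]
        try:
            next_stop = journey["OnwardCalls"]["OnwardCall"][0]["StopPointName"]
        except (KeyError, IndexError):
            # Missing key (or empty list): fall back to an empty value, as in the notes
            next_stop = []
        rows.append((location["Latitude"], location["Longitude"], next_stop))
    return rows
```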
HW3:
1. Generate 5 kinds of distributions; as the sample size increases, see the distribution of the sample mean get close to Gaussian (central limit theorem)
2. Citi Bike: weekday and weekend distributions with an error bar plot; normalize the female and male distributions, then plot them with error bars
Additional: Z-test equation: $Z = \frac{\mu_{pop} - \mu_{sample}}{\sigma / \sqrt{N}}$, where σ is the population standard deviation and N the sample size
3. Use a Z-test to examine the null hypothesis that the new bus time will not be less
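Items 1 and 3 can be sketched in plain Python (standard library only). The first part illustrates the central limit theorem: the spread of sample means shrinks roughly as 1/√N, and their histogram looks increasingly Gaussian. The second part computes Z with the sign convention of the Z-test equation in these notes, so a large positive Z means the new (sample) time is lower than the population mean. The one-tailed p-value is then P(Z ≥ z). The five distributions and all numbers are illustrative assumptions, not necessarily the ones used in the homework.

```python
import math
import random
import statistics
from statistics import NormalDist

# Five example distributions (an illustrative choice)
DISTRIBUTIONS = {
    "uniform": lambda rng: rng.uniform(0, 1),
    "exponential": lambda rng: rng.expovariate(1.0),
    "gaussian": lambda rng: rng.gauss(0, 1),
    "lognormal": lambda rng: rng.lognormvariate(0, 1),
    "bernoulli": lambda rng: float(rng.random() < 0.3),
}


def sample_means(draw, n, reps=2000, seed=0):
    """Draw `reps` samples of size n and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]


def z_statistic(mu_pop, mu_sample, sigma, n):
    """Z = (mu_pop - mu_sample) / (sigma / sqrt(N)), the sign convention of the notes."""
    return (mu_pop - mu_sample) / (sigma / math.sqrt(n))


def one_tailed_p(z):
    """P(Z >= z) under the standard normal; large positive Z means a lower sample mean."""
    return 1 - NormalDist().cdf(z)


if __name__ == "__main__":
    for name, draw in DISTRIBUTIONS.items():
        spreads = [statistics.stdev(sample_means(draw, n)) for n in (5, 20, 80)]
        print(name, [round(s, 4) for s in spreads])  # shrinks roughly as 1/sqrt(n)
```

For example, a population mean of 10 minutes, a sample mean of 9.5, σ = 2 and N = 100 give Z = 2.5, whose one-tailed p-value is below 0.01, so the null hypothesis would be rejected at the usual 5% level.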
HW4:
1. Review a peer's Citi Bike notebook (.ipynb) from HW3
2. Find different statistical tests in scientific papers; how to make a table in a script
!3. Is the NYC employment program effective or not? Z-test, chi-square test
4. Pearson's test and Spearman's test assess the correlation of two samples of a dataset (male and female); the K-S test checks whether the two samples come from the same distribution
Extra credit: age distributions during the day and at night: do they follow the same distribution?
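The three statistics in item 4 can be sketched in plain Python. Pearson's r measures linear correlation, and Spearman's rho is Pearson's r computed on the ranks (monotonic correlation). Both need paired samples of equal length. The two-sample K-S statistic instead asks whether two samples (e.g. male and female ages, or day and night ages in the extra credit) come from the same distribution, and the samples may differ in size. scipy.stats.pearsonr, spearmanr and ks_2samp compute the same statistics plus p-values. This sketch omits the p-values.

```python
import math
from bisect import bisect_right
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b)) for v in a + b
    )
```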
HW5:
1. Citi Bike goodness-of-fit tests: K-S test, A-D (Anderson-Darling) test, K-L divergence (returns an entropy value indicating which model fits better)
2. Income gender bias: regression fit with smf.ols
3. Practice forming null hypotheses
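Items 1 and 2 admit a short standard-library sketch. scipy.stats.entropy(pk, qk) returns the Kullback-Leibler divergence when a second distribution is passed, which is likely what "return entropy" above refers to. kl_divergence below computes the same quantity. The model with the smaller value is closer to the observed histogram. fit_line is the closed-form least-squares fit that smf.ols("y ~ x", data=df).fit() would also give for a single regressor. The column names y and x are placeholders.

```python
import math
from statistics import fmean


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; both inputs are normalized to sum to 1.

    Lower values mean model q is closer to the observed distribution p.
    """
    sp, sq = sum(p), sum(q)
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf  # the model gives zero probability to an observed bin
        total += (pi / sp) * math.log((pi / sp) / (qi / sq))
    return total


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx
```

With a 0/1 gender dummy as x, the slope equals the difference in group means. For made-up toy incomes, fit_line([0, 0, 1, 1], [50, 54, 44, 46]) returns (-7.0, 52.0): the group coded 1 averages 7 units less than the group coded 0.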
HW6:
1. Combine 2 data files; use a chi-square test to compare 2 models; decide which variable is x and which is y
Extra credit: map with a colorbar
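One common form of the chi-square model comparison in item 1 weights each residual by the measurement error of y. This sketch assumes that form; the homework's exact definition may differ. Because the residuals are measured along y, this form assumes the uncertainty sits on the y variable, which bears on the choice of x and y. For counts, errors of sqrt(count) (Poisson) are a frequent choice. A reduced chi-square near 1 indicates a fit consistent with the errors. Values far above 1 indicate a poor model.

```python
def chi_square(observed, model, errors):
    """Chi-square: sum of squared, error-weighted residuals between data and a model."""
    return sum(((o - m) / e) ** 2 for o, m, e in zip(observed, model, errors))


def reduced_chi_square(observed, model, errors, n_params):
    """Chi-square per degree of freedom (N data points minus fitted parameters)."""
    return chi_square(observed, model, errors) / (len(observed) - n_params)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    err = [1.0, 1.0, 1.0]
    linear = [1.0, 2.0, 3.0]     # toy model 1 (2 parameters): y = x for x = 1, 2, 3
    constant = [2.0, 2.0, 2.0]   # toy model 2 (1 parameter): y = mean(y)
    print(reduced_chi_square(y, linear, err, 2), reduced_chi_square(y, constant, err, 1))
```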
HW7:
1. Plot 1: sunflowers image turned into black and white
2. Plot 2: water complaints by ZIP code
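Both plots reduce to simple transforms. This sketch assumes the image is a nested list of (R, G, B) tuples. It converts to grayscale with the ITU-R BT.601 luma weights (the ones Pillow documents for Image.convert("L")) and then applies a fixed threshold for black and white. The threshold of 128 is an arbitrary choice. The ZIP-code tally is what would feed a choropleth. The column name "Incident Zip" is an assumption about the 311 export. In pandas, df["Incident Zip"].value_counts() gives the same counts.

```python
from collections import Counter


def to_gray(rgb):
    """Luminance of an (R, G, B) pixel with the ITU-R BT.601 weights, rounded to 0-255."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_black_white(image, threshold=128):
    """Map each pixel of a nested-list RGB image to pure black (0) or white (255)."""
    return [[255 if to_gray(px) >= threshold else 0 for px in row] for row in image]


def complaints_by_zip(records, zip_field="Incident Zip"):
    """Count complaints per ZIP code, skipping records without one.

    The default field name is an assumption about the 311 data's column header.
    """
    return Counter(r[zip_field] for r in records if r.get(zip_field))
```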
HW8:
1. Identify the demographic characteristics of 311 complainers and their internet infrastructure accessibility
2. Try to find possible data errors, such as duplicate values and missing values
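The checks in item 2 can be sketched with the standard library, treating records as dicts. Field names such as id and zip are hypothetical. With pandas, df.duplicated() and df.isnull().sum() cover the same ground and also catch NaN, which this sketch does not.

```python
from collections import Counter


def find_duplicates(records, key):
    """Return the values of `key` that appear in more than one record, with their counts."""
    counts = Counter(r.get(key) for r in records)
    return {value: n for value, n in counts.items() if n > 1}


def count_missing(records, fields):
    """Count, per field, records where the field is absent, None, or an empty string."""
    return {f: sum(1 for r in records if r.get(f) in (None, "")) for f in fields}
```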
HW9:
1. Time series analysis: discover events in MTA data
2. Find outliers: flag points outside mean ± 3σ
3. Fourier transform to identify periodic patterns
4. PCA for clustering
5. Rolling mean and weighted average to smooth the time series
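Items 2, 3 and 5 can be sketched with the standard library. The function names are hypothetical. The DFT is the O(N²) textbook sum, which is fine for illustration. numpy.fft.rfft is the practical choice, and pandas' Series.rolling(window).mean() gives the simple moving average. The mean is subtracted before the transform so the zero-frequency term does not dominate. The dominant period is then N divided by the index of the strongest frequency. For daily MTA counts, a peak at 7 would reveal the weekly cycle. Note that the outlier itself inflates σ, so in short series the 3σ rule can miss it.

```python
import cmath
import math
import statistics


def flag_outliers(series, k=3.0):
    """Indices of points outside mean +- k*sigma (k = 3 as in the notes)."""
    mean, sigma = statistics.fmean(series), statistics.stdev(series)
    return [i for i, v in enumerate(series) if abs(v - mean) > k * sigma]


def rolling_mean(series, window):
    """Simple moving average; the output is shorter by window - 1 points."""
    return [statistics.fmean(series[i:i + window]) for i in range(len(series) - window + 1)]


def weighted_moving_average(series, weights):
    """Moving average with the given weights (the last weight applies to the newest point)."""
    w = len(weights)
    total = sum(weights)
    return [
        sum(x * wt for x, wt in zip(series[i:i + w], weights)) / total
        for i in range(len(series) - w + 1)
    ]


def dominant_period(series):
    """Period (in samples) of the strongest non-zero frequency of a plain DFT."""
    n = len(series)
    mean = statistics.fmean(series)
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coef = sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        if abs(coef) > best_power:
            best_k, best_power = k, abs(coef)
    return n / best_k
```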
HW10:
1. geopandas, Moran's I: compare the spatial autocorrelation of winter and summer Citi Bike data; find hot spots and cold spots
2. Boxplot
3. Decompose into trend, periodic (seasonal) component, and residual
4. Group data into ten groups (deciles)
5. LISA (Local Indicators of Spatial Association)
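Here is a plain-Python sketch of global Moran's I, its local (LISA) version, and decile grouping, using a small hand-made weights matrix. In the homework, geopandas plus PySAL (e.g. the esda package's Moran and Moran_Local, which add permutation-based p-values) would do this on real geometries. In this sketch, a positive local value at an above-average location marks a hot-spot candidate. A positive local value at a below-average location marks a cold-spot candidate. Without the significance test these are only candidates.

```python
from bisect import bisect_right
from statistics import fmean, quantiles


def _deviations(values):
    """Deviations of each value from the mean."""
    mean = fmean(values)
    return [v - mean for v in values]


def morans_i(values, weights):
    """Global Moran's I for a spatial weights matrix (list of lists, w[i][j] >= 0)."""
    n = len(values)
    z = _deviations(values)
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(zi * zi for zi in z)


def local_morans(values, weights):
    """Local Moran's I_i (the LISA statistic) per location, without significance tests."""
    n = len(values)
    z = _deviations(values)
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] * sum(weights[i][j] * z[j] for j in range(n)) / m2 for i in range(n)]


def decile_groups(values):
    """Assign each value a group 0-9 using the nine decile cut points."""
    cuts = quantiles(values, n=10)
    return [bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    # Four cells in a row; neighbours share an edge (binary rook weights)
    w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    print(morans_i([1, 1, 0, 0], w))  # positive: similar values cluster
    print(morans_i([1, 0, 1, 0], w))  # negative: neighbours differ
```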
HW11:
1. Assignment 1: determine whether a polygon contains a point; find out which census tract contains CUSP
2. Convert the coordinate system (EPSG code) of a shapefile
3. Intersect polygons and points from shapefiles
4. Assignment 2: standardize the time series by row!! and cluster them (NYC business counts)
5. Plot the different K-means clusters in different colors
6. Agglomerative clustering, dendrogram
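Three of these steps are sketched below with the standard library (all function names are hypothetical). In the homework, shapely's Polygon.contains(Point) or a geopandas spatial join does the point-in-polygon test once both layers share a CRS (GeoDataFrame.to_crs(epsg=...)). The ray-casting function shows the underlying idea for one polygon given as (x, y) vertices in a single projected coordinate system. Row-wise standardization matters because, without it, K-means groups the series by overall level rather than by shape. kmeans is plain Lloyd's algorithm from caller-supplied starting centroids. sklearn.cluster.KMeans is the usual tool.

```python
import statistics


def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray from (x, y) toward +x; odd means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def standardize_rows(rows):
    """Z-score each row (time series) separately so clustering compares shapes, not levels."""
    out = []
    for row in rows:
        mean, sd = statistics.fmean(row), statistics.pstdev(row)
        out.append([(v - mean) / sd if sd else 0.0 for v in row])
    return out


def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm from given initial centroids; returns (labels, centroids)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k])) for p in points]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Move the centroid to its members' mean; keep an empty cluster's centroid in place
            new_centroids.append(tuple(statistics.fmean(c) for c in zip(*members)) if members else old)
        centroids = new_centroids
    return labels, centroids
```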
HW12:
1. Get data with SQL: Asthma Dismissal data in NYC
2. Semivariogram, to see whether spatial correlation exists
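Here is a minimal sketch of the empirical semivariogram in item 2. It assumes point coordinates in a projected (metric) CRS and a fixed lag width, both of which are assumptions. If gamma rises with distance before levelling off, nearby locations are more alike than distant ones, meaning spatial correlation exists. A flat semivariogram (pure nugget) suggests none.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, lag_width):
    """Empirical semivariogram: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) over pairs at lag h.

    Pair distances are binned to the nearest multiple of lag_width; pairs closer than
    half a lag are dropped. Returns {lag: gamma}, sorted by lag.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            k = round(math.dist(coords[i], coords[j]) / lag_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {k * lag_width: sums[k] / (2 * counts[k]) for k in sorted(sums) if k > 0}
```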