covidAnalysis

Subset only the rows that have “India” in the “location” column. (This subset DataFrame must be used for modelling.)
Handle missing values:
   a. If a continuous numerical column has null values, replace them with the mean of that column.
   b. If an ordinal numerical column has null values, replace them with the mode of that column.
   c. If a categorical column has null values, replace them with the mode of that column.
   d. If more than 50% of the values in a column are null, drop that entire column.
Univariate analysis:
   a. Draw histograms of 10 feature columns.
   b. Find the mean, median and mode of each column.
Bivariate analysis:
   a. Draw scatter plots of the target column versus 10 features.
   b. Draw line plots of the target column versus 10 features.
Convert the “date” column to ordinal:
   a. Code:
      import datetime as dt
      import pandas as pd
      df["date"] = pd.to_datetime(df["date"])
      df["date"] = df["date"].map(dt.datetime.toordinal)
Drop the categorical columns that are not useful, and convert the useful ones to numerical values with LabelEncoder.
Select the “total_cases” column as the target variable.
Select the other columns as the features. (NOTE: the “date” column must be included in the features.)
Perform train-test split
Modelling:
   a. Linear Regression
   b. Random Forest Regressor
Get the accuracy of each model (for regressors, scikit-learn's score method reports the R² value).
Predict the total cases for a new date.
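
The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the toy data is made up, the columns "new_tests", "continent" and "mostly_missing" are hypothetical, and treating float columns as "continuous" is a simplifying heuristic. Only "location", "date" and "total_cases" come from the steps themselves. The plotting steps are left out.

```python
import datetime as dt

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Made-up toy data standing in for the real CSV (40 days, two locations).
dates = pd.date_range("2021-01-01", periods=40, freq="D")
raw = pd.DataFrame({
    "location": ["India"] * 40 + ["Nepal"] * 40,
    "date": list(dates.strftime("%Y-%m-%d")) * 2,
    "total_cases": np.tile(np.arange(1000.0, 5000.0, 100.0), 2),
    "new_tests": np.tile((np.arange(40) % 7) * 10.0, 2),  # hypothetical feature
    "continent": ["Asia"] * 80,                           # hypothetical feature
    "mostly_missing": [np.nan] * 70 + [1.0] * 10,         # hypothetical feature
})
raw.loc[[3, 7], "new_tests"] = np.nan   # inject a few nulls to impute
raw.loc[5, "continent"] = np.nan

# Keep only the rows for India.
df = raw[raw["location"] == "India"].copy()

# Drop columns with more than 50% nulls, then impute what remains.
df = df.loc[:, df.isna().mean() <= 0.5].copy()
for col in df.columns:
    if df[col].isna().any():
        if pd.api.types.is_float_dtype(df[col]):
            fill = df[col].mean()            # assumed continuous -> mean
        else:
            fill = df[col].mode().iloc[0]    # ordinal or categorical -> mode
        df[col] = df[col].fillna(fill)

# Convert the date column to ordinal integers.
df["date"] = pd.to_datetime(df["date"]).map(dt.datetime.toordinal)

# "location" is constant after subsetting, so drop it; encode the rest.
df = df.drop(columns=["location"])
df["continent"] = LabelEncoder().fit_transform(df["continent"])

# Target, features (including "date") and train-test split.
y = df["total_cases"]
X = df.drop(columns=["total_cases"])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit both models and record their R^2 score on the held-out rows.
models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Predict for a new date; the other features reuse the last known row
# as a placeholder, since the model needs a value for every feature.
new_row = X.iloc[[-1]].copy()
new_row["date"] = dt.date(2021, 3, 1).toordinal()
prediction = models["linear"].predict(new_row)[0]
print(scores, prediction)
```

With a real dataset, the choice of which columns count as continuous, ordinal or categorical would need to be made per column rather than by dtype alone, and the placeholder feature values used for the new-date prediction would need to be replaced by sensible estimates.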

Repository files:
ML-MAJOR-JUNE(PROJECT).ipynb
README.md
covidonly.csv
