- Subset only those rows that have “India” in the “location” column (this subsetted dataframe must be used for modelling).
- Handle missing values:
  - If a continuous numerical column has null values, replace them with the mean of that column.
  - If an ordinal numerical column has null values, replace them with the mode of that column.
  - If a categorical column has null values, replace them with the mode of that column.
  - If more than 50% of the values in a column are null, drop that entire column.
- Univariate analysis:
  - Draw histograms of 10 feature columns.
  - Find the mean, median and mode of each column.
- Bivariate analysis:
  - Draw scatter plots of the target column against 10 features.
  - Draw line plots of the target column against 10 features.
- Convert the “date” column to ordinal integers:
  ```python
  import datetime as dt

  import pandas as pd

  df["date"] = pd.to_datetime(df["date"])
  df["date"] = df["date"].map(dt.datetime.toordinal)
  ```
- Drop categorical columns that carry no useful information, and convert the remaining useful categorical columns to numbers with `LabelEncoder`.
- Select the “total_cases” column as the target variable.
- Select the remaining columns as the features (the “date” column must be included among the features).
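- Perform a train-test split.
- Modelling:
  - Linear Regression
  - Random Forest Regressor
- Evaluate each model's accuracy on the test set (for regressors, scikit-learn's `score` method reports the R² coefficient of determination).
- Predict “total_cases” for a new date.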
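
The subsetting and missing-value rules in the list above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the repository's own code: the toy dataframe and the column names `new_cases`, `stringency`, `tests_units` and `icu_patients` are hypothetical, and deciding which real columns are continuous, ordinal or categorical is left to the caller.

```python
import numpy as np
import pandas as pd


def subset_india(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows whose "location" is "India"."""
    # .copy() so later changes do not touch the original frame
    return df[df["location"] == "India"].copy()


def handle_missing(df, continuous, ordinal, categorical, threshold=0.5):
    """Apply the null-handling rules; the column groups are supplied by the caller."""
    df = df.copy()
    # Drop columns with more than `threshold` nulls BEFORE imputing,
    # otherwise imputation would hide how sparse the column was.
    null_fraction = df.isna().mean()
    df = df.drop(columns=null_fraction[null_fraction > threshold].index)
    for col in continuous:
        if col in df.columns:
            df[col] = df[col].fillna(df[col].mean())
    for col in [*ordinal, *categorical]:
        if col in df.columns:
            # mode() may return several tied values; use the first one
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df


# Hypothetical toy data standing in for the real COVID dataset
raw = pd.DataFrame({
    "location": ["India", "India", "India", "India", "Brazil"],
    "new_cases": [10.0, np.nan, 30.0, 20.0, 5.0],                # continuous
    "stringency": [1, 2, np.nan, 2, 3],                          # ordinal
    "tests_units": ["tests", None, "tests", "people", "tests"],  # categorical
    "icu_patients": [np.nan, np.nan, np.nan, 4.0, 1.0],          # mostly null
})

india = subset_india(raw)
clean = handle_missing(
    india,
    continuous=["new_cases"],
    ordinal=["stringency"],
    categorical=["tests_units"],
)
print(clean)
```

The sketch checks the 50% rule before imputing: if the mean or mode were filled in first, no column would ever exceed the threshold and nothing would be dropped.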
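
For the univariate step, the mean, median and mode of each column can be collected into one table. A minimal pandas sketch follows; it assumes only numeric columns are summarised (mean and median are undefined for text), and the toy data is made up.

```python
import pandas as pd


def describe_central_tendency(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median and mode of every numeric column, one row per column."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        # DataFrame.mode() lists every tied mode in sorted order; row 0 is the smallest
        "mode": numeric.mode().iloc[0],
    })


# Hypothetical toy data
df = pd.DataFrame({
    "location": ["India"] * 4,
    "new_cases": [1, 2, 2, 5],
    "total_cases": [1, 3, 5, 10],
})
stats = describe_central_tendency(df)
print(stats)
```

For a categorical column such as “location”, only the mode is meaningful; `df["location"].mode()` gives it directly.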
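
The histogram, scatter-plot and line-plot steps can be sketched with matplotlib. The 2 x 5 grid assumes exactly 10 features, the random `feature_0` to `feature_9` columns are placeholders for real feature names, and the `Agg` backend is chosen only so the figures render without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_histograms(df, features, bins=20):
    """One histogram per feature on a 2 x 5 grid (sized for 10 features)."""
    fig, axes = plt.subplots(2, 5, figsize=(20, 8))
    for ax, col in zip(axes.ravel(), features):
        ax.hist(df[col].dropna(), bins=bins)
        ax.set_title(col)
    fig.tight_layout()
    return fig


def plot_target_vs_features(df, target, features, kind="scatter"):
    """Target on the y-axis against each feature, as scatter or line plots."""
    fig, axes = plt.subplots(2, 5, figsize=(20, 8))
    for ax, col in zip(axes.ravel(), features):
        if kind == "scatter":
            ax.scatter(df[col], df[target], s=5)
        else:
            # Sort by the feature so the line runs left to right
            ordered = df.sort_values(col)
            ax.plot(ordered[col], ordered[target])
        ax.set_xlabel(col)
        ax.set_ylabel(target)
    fig.tight_layout()
    return fig


# Hypothetical data: 10 random features plus a "total_cases" target
rng = np.random.default_rng(0)
features = [f"feature_{i}" for i in range(10)]
df = pd.DataFrame(rng.random((50, 11)), columns=features + ["total_cases"])

hist_fig = plot_histograms(df, features)
scatter_fig = plot_target_vs_features(df, "total_cases", features, kind="scatter")
line_fig = plot_target_vs_features(df, "total_cases", features, kind="line")
```

Call `hist_fig.savefig("histograms.png")` or `plt.show()` to view the figures. Sorting by the feature before `ax.plot` keeps the line plot from zig-zagging back and forth.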
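
Label encoding with scikit-learn's `LabelEncoder` might look like the sketch below. Which columns are "useless" is a judgement call; here `iso_code` and `location` are dropped on the assumption that they are constant once only India remains, and `tests_units` is a hypothetical categorical column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_categoricals(df, drop, encode):
    """Drop uninformative columns and label-encode the remaining categorical ones."""
    df = df.drop(columns=drop)
    encoders = {}
    for col in encode:
        encoder = LabelEncoder()
        # astype(str) guards against mixed types; classes are sorted alphabetically
        df[col] = encoder.fit_transform(df[col].astype(str))
        encoders[col] = encoder  # keep it to encode new rows the same way
    return df, encoders


# Hypothetical India-only data: "iso_code" and "location" are constant after
# subsetting, so they carry no information for the model.
df = pd.DataFrame({
    "iso_code": ["IND"] * 3,
    "location": ["India"] * 3,
    "tests_units": ["tests performed", "people tested", "tests performed"],
    "new_cases": [10, 20, 30],
})
encoded, encoders = encode_categoricals(
    df, drop=["iso_code", "location"], encode=["tests_units"]
)
print(encoded)
```

`LabelEncoder` assigns codes in alphabetical order of the class labels, so the numbers carry no ordinal meaning; keeping the fitted encoders lets a new row be encoded consistently.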
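
Selecting the target and features and splitting the data can be sketched as below. The 80/20 split and `random_state=42` are assumptions, not values given in the task; the check on `date` enforces the requirement that the date column stay among the features.

```python
import datetime as dt

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df, target="total_cases", test_size=0.2, random_state=42):
    """Use `target` as y and every other column (including "date") as X."""
    X = df.drop(columns=[target])
    y = df[target]
    if "date" not in X.columns:
        raise ValueError('the "date" column must be one of the features')
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Hypothetical cleaned data with "date" already converted to ordinals
start = dt.date(2021, 1, 1).toordinal()
df = pd.DataFrame({
    "date": np.arange(start, start + 10),
    "new_cases": np.arange(10) * 3,
    "total_cases": np.cumsum(np.arange(10) * 3),
})
X_train, X_test, y_train, y_test = split_features_target(df)
print(X_train.shape, X_test.shape)
```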
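
Both models can be fitted and compared in one loop. "Accuracy" in the classification sense does not apply to a continuous target like `total_cases`, so the sketch reports R² from each regressor's `score` method instead. The data is synthetic (a cumulative sum of random daily counts), and the hyperparameters are illustrative choices, not tuned values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: ordinal dates and a cumulative case count built from
# random daily increments, standing in for the cleaned India subset
rng = np.random.default_rng(0)
start = 737_791  # an arbitrary ordinal day number
new_cases = rng.integers(0, 100, size=200)
df = pd.DataFrame({
    "date": np.arange(start, start + 200),
    "new_cases": new_cases,
    "total_cases": np.cumsum(new_cases),
})

X = df.drop(columns=["total_cases"])
y = df["total_cases"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, score() returns R^2 on the given data (1.0 is a perfect fit)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: R^2 = {scores[name]:.3f}")
```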
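
Predicting for a new date means converting that date to an ordinal exactly as the training “date” column was converted. The other features are unknown for a future date, so this sketch assumes they can be copied from a template row (here the last training row); the helper `predict_for_date` and the exactly linear toy data are hypothetical.

```python
import datetime as dt

import pandas as pd
from sklearn.linear_model import LinearRegression


def predict_for_date(model, date_str, template_row):
    """Predict total_cases for `date_str`, reusing the other feature values
    from `template_row` (an assumption; real future values are unknown)."""
    row = template_row.copy()
    # Same conversion as the training "date" column
    row["date"] = dt.datetime.toordinal(pd.to_datetime(date_str))
    # Turn the Series into a one-row frame so the feature names match training
    return float(model.predict(row.to_frame().T)[0])


# Hypothetical, exactly linear toy data so the answer is easy to check
start = dt.date(2021, 1, 1).toordinal()
days = list(range(10))
new_cases = [d % 3 for d in days]
X = pd.DataFrame({"date": [start + d for d in days], "new_cases": new_cases})
y = pd.Series([100 * d + 5 * n for d, n in zip(days, new_cases)])

model = LinearRegression().fit(X, y)
prediction = predict_for_date(model, "2021-01-21", X.iloc[-1])
print(round(prediction, 2))  # prints 2000.0
```

Copying the last row's features is a crude assumption; for a real forecast those inputs would have to be estimated separately.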

Repository: Dis-ease-20/covidAnalysis