Multiple-Linear-Regression-Analysis

A repo demonstrating how to perform multiple linear regression analysis in R. It uses a sample heart disease dataset to analyse the relationship between heart disease, biking, and smoking.

A. Loading the dataset into RStudio via Import Dataset > From Text (base)

Get the summary of the heart.data dataset to check that it has been read correctly.

summary(heart.data)
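As a scripted alternative to the RStudio import dialog, the file can also be read with read.csv(). This is a minimal sketch that assumes heart.data.csv sits in the current working directory; the csv.path name is introduced here for illustration only.

```r
# Assumption: heart.data.csv is in the current working directory.
csv.path <- "heart.data.csv"

if (file.exists(csv.path)) {
  # Read the CSV into a data frame named heart.data
  heart.data <- read.csv(csv.path)
  # Inspect column types, then the numeric summaries
  str(heart.data)
  print(summary(heart.data))
} else {
  message("File not found: ", csv.path, " (check getwd())")
}
```

If the file is not found, getwd() and setwd() show and change the directory R is looking in.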

B. Ensuring that our data meets the main assumptions

Independence of Observations - Using the cor() function to check that our independent variables are not too highly correlated with each other.

cor(heart.data$biking, heart.data$smoking)
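With exactly two predictors, the correlation r can also be turned into a variance inflation factor, 1 / (1 - r^2), which shows how much collinearity would inflate the coefficient variances. The sketch below uses simulated stand-in data (sim.data is hypothetical, not the repo's dataset) so that it runs on its own.

```r
# Simulated stand-in for heart.data (hypothetical values, for illustration only)
set.seed(1)
sim.data <- data.frame(
  biking  = runif(200, min = 1, max = 75),
  smoking = runif(200, min = 0.5, max = 30)
)

# Pearson correlation between the two predictors
r <- cor(sim.data$biking, sim.data$smoking)

# Variance inflation factor for a two-predictor model
vif <- 1 / (1 - r^2)

print(c(correlation = r, vif = vif))
```

A VIF close to 1 suggests the predictors carry largely separate information; replacing sim.data with heart.data applies the same check to the real data.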

Normality - Using the hist() function to check whether our dependent variable is roughly normally distributed.

hist(heart.data$heart.disease)

Linearity - Checking two scatterplots: one for biking and heart disease, and one for smoking and heart disease.

plot(heart.disease ~ biking, data=heart.data)

plot(heart.disease ~ smoking, data=heart.data)

C. Performing Linear Regression Analysis

Checking if there's a linear relationship between biking to work, smoking, and heart disease in our imaginary survey of 500 towns.

heart.disease.lm<-lm(heart.disease ~ biking + smoking, data = heart.data)

summary(heart.disease.lm)
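The coefficient table and confidence intervals can also be pulled out of the fitted model as objects rather than read off the printed summary. This is a sketch on simulated stand-in data (sim.data and sim.lm are hypothetical names, and the simulated coefficients are chosen only to resemble the ones reported later in this README).

```r
# Simulated stand-in for heart.data (hypothetical, for illustration only)
set.seed(42)
n <- 500
sim.data <- data.frame(
  biking  = runif(n, min = 1, max = 75),
  smoking = runif(n, min = 0.5, max = 30)
)
sim.data$heart.disease <- 15 - 0.2 * sim.data$biking +
  0.178 * sim.data$smoking + rnorm(n, sd = 0.65)

# Fit the same model form as in this section
sim.lm <- lm(heart.disease ~ biking + smoking, data = sim.data)

# Matrix of estimates, standard errors, t values and p values
coef.table <- coef(summary(sim.lm))

# 95% confidence intervals for each coefficient
ci <- confint(sim.lm, level = 0.95)

print(coef.table)
print(ci)
```

Running coef(summary(heart.disease.lm)) and confint(heart.disease.lm) gives the same objects for the real model.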

D. Checking for Homoscedasticity

Before proceeding with data visualization, we need to make sure that our model meets the homoscedasticity assumption of the linear model.

par(mfrow=c(2,2))
plot(heart.disease.lm)
par(mfrow=c(1,1))
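The diagnostic plots are judged by eye. A numeric complement, which is not part of the original analysis, is a studentized Breusch-Pagan-style check (Koenker's variant): regress the squared residuals on the predictors and compare n * R^2 against a chi-squared distribution. The sketch below does this by hand in base R on simulated stand-in data; sim.data, sim.lm and aux.lm are hypothetical names.

```r
# Simulated stand-in for heart.data (hypothetical, for illustration only)
set.seed(42)
n <- 500
sim.data <- data.frame(
  biking  = runif(n, min = 1, max = 75),
  smoking = runif(n, min = 0.5, max = 30)
)
sim.data$heart.disease <- 15 - 0.2 * sim.data$biking +
  0.178 * sim.data$smoking + rnorm(n, sd = 0.65)

sim.lm <- lm(heart.disease ~ biking + smoking, data = sim.data)

# Auxiliary regression: squared residuals on the same predictors
sim.data$sq.resid <- resid(sim.lm)^2
aux.lm <- lm(sq.resid ~ biking + smoking, data = sim.data)

# Test statistic n * R^2, chi-squared with df = number of predictors (2)
bp.stat <- n * summary(aux.lm)$r.squared
bp.p <- pchisq(bp.stat, df = 2, lower.tail = FALSE)

print(c(statistic = bp.stat, p.value = bp.p))
```

A small p-value would hint at non-constant residual variance; a large one is consistent with homoscedasticity.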

E. Visualizing the results with a graph

Plotting the relationship between biking and heart disease at different levels of smoking. Smoking will be treated as a factor with three levels, just for the purposes of displaying the relationships in our data.

Creating a new dataframe with the information needed to plot the model - This will not create anything new in your console, but you should see a new data frame appear in the Environment tab. Click on it to view it.

plotting.data <- expand.grid(
  biking = seq(min(heart.data$biking), max(heart.data$biking), length.out = 30),
  smoking = c(min(heart.data$smoking), mean(heart.data$smoking), max(heart.data$smoking)))

Predicting the values of heart disease based on our linear model - Saving our ‘predicted y’ values as a new column in the dataset we've created

plotting.data$predicted.y <- predict.lm(heart.disease.lm, newdata=plotting.data)
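The same predict() call can also return an interval around each prediction. As a sketch, the example below predicts heart disease for one hypothetical town (30% biking, 10% smoking) with a 95% confidence interval, using simulated stand-in data; sim.data, sim.lm and new.town are illustrative names.

```r
# Simulated stand-in for heart.data (hypothetical, for illustration only)
set.seed(42)
n <- 500
sim.data <- data.frame(
  biking  = runif(n, min = 1, max = 75),
  smoking = runif(n, min = 0.5, max = 30)
)
sim.data$heart.disease <- 15 - 0.2 * sim.data$biking +
  0.178 * sim.data$smoking + rnorm(n, sd = 0.65)

sim.lm <- lm(heart.disease ~ biking + smoking, data = sim.data)

# One hypothetical town to predict for
new.town <- data.frame(biking = 30, smoking = 10)

# Returns a matrix with columns fit, lwr and upr
pred <- predict(sim.lm, newdata = new.town,
                interval = "confidence", level = 0.95)
print(pred)
```

Using interval = "prediction" instead gives the wider interval for a single new observation rather than for the mean response.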

Rounding the smoking numbers to two decimal values - This will make the legend easier to read later on.

plotting.data$smoking <- round(plotting.data$smoking, digits = 2)

Changing the smoking variable into a factor - This allows us to plot the relationship between biking and heart disease at each of the three levels of smoking we chose.

plotting.data$smoking <- as.factor(plotting.data$smoking)

Plotting the original data

install.packages("ggplot2")

Then load the package:

library(ggplot2)

Finally, create the scatterplot:

heart.plot <- ggplot(heart.data, aes(x=biking, y=heart.disease)) +
  geom_point()

heart.plot

Adding the regression lines

heart.plot <- heart.plot +
  geom_line(data=plotting.data, aes(x=biking, y=predicted.y, color=smoking), size=1.25)

heart.plot

Making the graph ready for publication

heart.plot <-
heart.plot +
  theme_bw() +
  labs(title = "Rates of heart disease (% of population) \n as a function of biking to work and smoking",
      x = "Biking to work (% of population)",
      y = "Heart disease (% of population)",
      color = "Smoking \n (% of population)")

heart.plot

Adding our regression model to the graph

heart.plot + annotate(geom="text", x=30, y=1.75, label="heart disease = 15 + (-0.2*biking) + (0.178*smoking)")
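Rather than typing the equation by hand, the label text can be built from the fitted coefficients so it stays in sync with the model. This is a sketch on simulated stand-in data; sim.data, sim.lm and eq.label are hypothetical names.

```r
# Simulated stand-in for heart.data (hypothetical, for illustration only)
set.seed(42)
n <- 500
sim.data <- data.frame(
  biking  = runif(n, min = 1, max = 75),
  smoking = runif(n, min = 0.5, max = 30)
)
sim.data$heart.disease <- 15 - 0.2 * sim.data$biking +
  0.178 * sim.data$smoking + rnorm(n, sd = 0.65)

sim.lm <- lm(heart.disease ~ biking + smoking, data = sim.data)

# Coefficients in order: intercept, biking, smoking
b <- unname(coef(sim.lm))

# Format the fitted equation as a single string
eq.label <- sprintf("heart disease = %.1f + (%.3f*biking) + (%.3f*smoking)",
                    b[1], b[2], b[3])
print(eq.label)
```

The resulting string can be passed as the label argument of annotate() in place of the hard-coded text.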

F. Reporting our results

In our survey of 500 towns, we found significant relationships between the frequency of biking to work and the frequency of heart disease, and between the frequency of smoking and the frequency of heart disease (p < 0.001 for both). Specifically, we found a 0.2% decrease (± 0.0014) in the frequency of heart disease for every 1% increase in biking, and a 0.178% increase (± 0.0035) in the frequency of heart disease for every 1% increase in smoking.

Marx-wrld/Multiple-Linear-Regression-Analysis
