Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotations #68

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Binary file added 83faa2a9-82cb-4cc0-ba23-3a858cd0f4af.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
674 changes: 674 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

65 changes: 65 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
# logistic_growth
This README file answers questions 1, 2, and 3 of the reproducible research homework. Please refer to the individual R scripts linked for each part for complete code annotations.

# Question #1
I used csv #2 for this analysis.

First, refer to plot_data.R, where I have annotated the code for my individual steps - https://github.com/audickinson/logistic_growth/blob/main/plot_data.R

In the first graph I create, showing time on the x axis and population size on the y axis, it is clear that growth follows a logistic pattern. Population size increases exponentially at first, then levels off at a carrying capacity.

![5e112742-fe03-4387-af65-19a004dc1f67](https://github.com/audickinson/logistic_growth/assets/150164051/2c1b0a5f-8c32-459d-9a53-7cef2fd292fa)


In the next graph, the same data is shown, but this time on a log scale.
![ab40309e-3c91-4d3a-8849-c1c193f72491](https://github.com/audickinson/logistic_growth/assets/150164051/1d97d753-c774-4b5a-9452-4f5e43ebeb6c)

Next, move on to fit_linear_model.r -- https://github.com/audickinson/logistic_growth/blob/main/fit_linear_model.R

First, I fit separate linear models for the exponential and level (carrying capacity) portions of the graph. To isolate initial population growth, I only use the first two data points at t = 0 and t = 60. To measure population size at carrying capacity, I only use data points where N is well past its growth stage - over 1000, for example.

I use the summary() function to find the slope and intercept of each of these linear models, which are important parameters in the final script.

Next, move on to plot_data_and_model.R -- https://github.com/audickinson/logistic_growth/blob/main/plot_data_and_model.R

First, I define a function that replicates the structure of the basic logistic growth equation such that, when the parameters N0, K, and r are defined, the input of a given time value will yield the population size at that time.

I take the values of N0, r, and K (see scripts for explanations how) from the parameters output from the linear models in fit_linear_model.r .

I make a graph with the model, using these three data points, plotted in red on top of the original data points, and it appears to fit extraordinarily well, confirming the accuracy of the parameters derived from the linear model.
![79c9636e-fce9-43e2-8b39-0b2dcda7a7ce](https://github.com/audickinson/logistic_growth/assets/150164051/40041cd2-5c81-4e0c-aabc-d77a370ccb34)


# Results
In this exercise, I found the values of key population parameters of a given sample of cells. From the results of linear models, and confirmed by the fit of the logistic growth function with the following parameters, I can discern that N0 = 2000, r = 0.03, and K = 1*10^9. In this context, this means that the initial population size in the model is 2000, the growth rate is 3%, and the final population size is 9 billion cells.

# Question #2

Under logistic growth, at time t = 4980 min, my cell culture is clearly at carrying capacity of 1*10^9 cells.
As a simple exponential equation -- effectively removing any reference to carrying capacity -- the equation reads


```math
\begin{equation}
\ N = N_0 e^{rt}
\end{equation}
```

Substituting in the given value of t while maintaining the same N0 and r:

```math
\begin{equation}
N = 2000 e^{0.03*4980} = 1.529768*10^{68}
\end{equation}
```

This is clearly an implausible value, since bacteria cannot grow indefinitely. The value calculated by this exponential equation is fully 59 orders of magnitude larger than the population size calculated under logistic growth. It is an absurd estimate, because carrying capacity and density dependence are critical factors in modeling population growth.


# Question #3
This graph compares logistic and exponential growth models on a semilog plot. The code to generate the graph, as well as the png file of the graph, are on the main page of the repo. The R script is called plot_comparison.R and is available here: https://github.com/1062648/logistic_growth/blob/main/plot_comparison.R

![83faa2a9-82cb-4cc0-ba23-3a858cd0f4af](https://github.com/audickinson/logistic_growth/assets/150164051/e8748698-b7b4-423e-b571-a2e4c99ed262)



85 changes: 85 additions & 0 deletions experiment2.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
"t","N"
0,1879
60,12127
120,73300
180,442383
240,2671753
300,15947798
360,89287799
420,372300176
480,782047493
540,955960789
600,992442527
660,998742736
720,999791896
780,999965614
840,999994411
900,999999049
960,999999794
1020,999999883
1080,999999912
1140,1000000241
1200,1000000013
1260,999999951
1320,999999956
1380,1000000046
1440,999999931
1500,999999855
1560,1000000057
1620,999999898
1680,999999998
1740,999999906
1800,1000000110
1860,999999952
1920,999999929
1980,999999950
2040,999999837
2100,999999883
2160,999999782
2220,999999866
2280,999999971
2340,999999953
2400,1000000145
2460,999999893
2520,999999914
2580,999999972
2640,999999901
2700,999999903
2760,999999889
2820,999999875
2880,999999948
2940,999999950
3000,999999819
3060,999999942
3120,999999889
3180,999999899
3240,999999984
3300,1000000056
3360,1000000165
3420,999999923
3480,1000000161
3540,999999884
3600,1000000066
3660,1000000255
3720,999999997
3780,999999933
3840,999999999
3900,1000000178
3960,999999886
4020,1000000137
4080,1000000133
4140,1000000034
4200,1000000001
4260,999999954
4320,999999963
4380,1000000065
4440,1000000207
4500,999999985
4560,999999861
4620,999999928
4680,1000000026
4740,999999968
4800,999999982
4860,999999983
4920,999999863
4980,999999983
20 changes: 13 additions & 7 deletions fit_linear_model.R
Original file line number Diff line number Diff line change
@@ -1,19 +1,25 @@
#Script to estimate the model parameters using a linear approximation

#library(dplyr)
# library(dplyr) # Calling a common data manipulation package

growth_data <- read.csv("???")
growth_data <- read.csv("experiment2.csv") # Reading in the data as the object growth_data

#Case 1. K >> N0, t is small

data_subset1 <- growth_data %>% filter(t<???) %>% mutate(N_log = log(N))

# t is small - only calling data points less with t < 100, which for this purpose will only consider t = 0 and t = 60.
# Also creating an additional column called N_log, which is the log-transformed populaiton size.

data_subset1 <- growth_data %>% filter(t<100) %>% mutate(N_log = log(N))

# Creating a simple linear model with t as explanatory and log N as the response variable, from only that subset where t < 100.

model1 <- lm(N_log ~ t, data_subset1)
summary(model1)
summary(model1) # Finding parameters from the model - intercept = 7.53849 and slope = 0.03108

#Case 2. N(t) = K

data_subset2 <- growth_data %>% filter(t>???)
data_subset2 <- growth_data %>% filter(t>1000) # Doing the same but only at high values of t (where carrying capacity has been reached) - and this time not using a log value of N

model2 <- lm(N ~ 1, data_subset2)
summary(model2)
model2 <- lm(N ~ 1, data_subset2) # N is the response variable, explanatory variable is set to 1 because we wouldn't expect t to have any bearing on growth when N = K
summary(model2) # Parameters: intercept = 1.000e+09
35 changes: 35 additions & 0 deletions package-versions.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3; LAPACK version 3.9.0

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C
[3] LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] dplyr_1.1.3 ggplot2_3.4.4

loaded via a namespace (and not attached):
[1] labeling_0.4.3 utf8_1.2.4 R6_2.5.1
[4] tidyselect_1.2.0 farver_2.1.1 magrittr_2.0.3
[7] gtable_0.3.4 glue_1.6.2 tibble_3.2.1
[10] pkgconfig_2.0.3 generics_0.1.3 lifecycle_1.0.3
[13] cli_3.6.1 fansi_1.0.5 scales_1.2.1
[16] grid_4.3.1 vctrs_0.6.4 withr_2.5.2
[19] compiler_4.3.1 tools_4.3.1 munsell_0.5.0
[22] pillar_1.9.0 colorspace_2.1-0 rlang_1.1.1
48 changes: 48 additions & 0 deletions plot_comparison.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
#Script to plot data and model

growth_data <- read.csv("experiment2.csv") # Same data from earlier

# Writing a function: when you input a t value, output the N value using the specified values of K and r

logistic_fun <- function(t) {

N <- (N0*K*exp(r*t))/(K-N0+N0*exp(r*t))

return(N)

}


# Writing another function: input is t and output is N, but for the exponential equation.

exponential_fun <- function(t) {

N <- (N0*exp(r*t))

return(N)

}


N0 <- 2000 # Taken from model 1 - the intercept was 7.53849, but previously log-transformed, so it has to back-transformed to 2000

r <- 0.03 # Taken from model 1 - the slope of the linear model is 0.03108

K <- 1000000000 # Taken from model 2, where the intercept was 1* 10^9



# Creating our graph from earlier, but this time with the logistic model plotted onto it
ggplot(aes(t,N), data = growth_data) +

geom_function(fun=logistic_fun, colour="red") +

geom_function(fun=exponential_fun, colour = "blue") +

geom_point(size=0.25) + # Decreasing point size so you can see the red line

theme_bw() +

scale_y_continuous(trans='log10') # Adding a log scale so you can see the data points and other model - otherwise it's pressed against the x axis


28 changes: 19 additions & 9 deletions plot_data.R
Original file line number Diff line number Diff line change
@@ -1,26 +1,36 @@
#Script to plot the logistic growth data

growth_data <- read.csv("???")
growth_data <- read.csv("experiment2.csv") # Reading in the data from csv 2

install.packages("ggplot2")
install.packages("ggplot2") # Installing ggplot 2, which we will use for plotting
library(ggplot2)

ggplot(aes(t,N), data = ???) +
# Visualizing time vs population size

ggplot(aes(t,N), data = growth_data) + # Plotting the growth data with time on the x axis and population size on the y axis

geom_point() +
geom_point() + # Adding individual data points to the graph

xlab("t") +
xlab("t") + # Adding x label

ylab("y") +
ylab("y") + # Adding x label

theme_bw()
theme_bw() # Setting the theme to black and white, removing gray background


ggplot(aes(t,???), data = growth_data) +
# Plotting the same data on a log scale
# Same steps, but this time I log-transform the data.

ggplot(aes(t,log(N)), data = growth_data) +

geom_point() +

xlab("t") +

ylab("y") +

scale_y_continuous(trans='log10')
scale_y_continuous(trans='log10') +

theme_bw() # Added the same theme to make them match


19 changes: 13 additions & 6 deletions plot_data_and_model.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
#Script to plot data and model

growth_data <- read.csv("???")
growth_data <- read.csv("experiment2.csv") # Same data from earlier

# Writing a function: when you input a t value, output the N value using the specified values of K and r

logistic_fun <- function(t) {

Expand All @@ -10,17 +12,22 @@ logistic_fun <- function(t) {

}

N0 <- ??? #
N0 <- 2000 # Taken from model 1 - the intercept was 7.53849, but previously log-transformed, so it has to back-transformed to 2000

r <- ??? #
r <- 0.03 # Taken from model 1 - the slope of the linear model is 0.03108

K <- ??? #
K <- 1000000000 # Taken from model 2, where the intercept was 1* 10^9


ggplot(aes(???,???), data = growth_data) +

# Creating our graph from earlier, but this time with the logistic model plotted onto it
ggplot(aes(t,N), data = growth_data) +

geom_function(fun=logistic_fun, colour="red") +

geom_point()
geom_point() +

theme_bw()

#scale_y_continuous(trans='log10')

Expand Down
13 changes: 13 additions & 0 deletions project.Rproj
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX