forked from ShuoranLi/506_Project
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathfinalProject.Rmd
310 lines (281 loc) · 11.4 KB
/
finalProject.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
---
title: "Monte Carlo Simulation of Stock Portfolio in R, Matlab, and Python"
author: "Feichen Shen, Israel Diego, Shuoran Li"
date: "`r format.Date(Sys.Date(), '%B %d, %Y')`"
output:
html_document:
toc: true
toc_float: true
theme: journal
highlight: pygments
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(data.table)
library(ggplot2)
library(reticulate)
use_python('/Users/shuoranli/Documents/PycharmProjects/Group_project_506/venv/bin/python')
use_virtualenv('r-reticulate')
```
## Monte Carlo Introduction
The purpose of this tutorial is to demonstrate Monte Carlo Simulation in Matlab,
R, and Python. We conduct our Monte Carlo study in the context of simulating
daily returns for an investment portfolio.
For simplicity we will only consider three assets: Apple, Google, and Facebook.
We will assume an Initial Investment of $100,000 and allocate our money evenly
between the three stocks. In this case the portfolio weights $w_i = 1/3$ for
$i = 1,2,3$.
Next, we assume that daily returns are distributed Multivariate Normal with mean
vector $\mu$ and covariance matrix $\Sigma$. In other words,
$$R_t \sim MVN(\mu, \Sigma)$$ for $t \in \{1,\dots,T\}$ where $T$ is the final
time horizon.
We will use the Cholesky Factorization in order to find Lower Triangular Matrix
$L$ such that $LL' = \Sigma$. Then our returns can be generated by
$$ R_t = \mu + LZ_t $$ where $$Z_t \sim N(0,I)$$ for $t \in \{1,\dots,T\}$.
The returns will be simulated over a 30-day period, where our 30-day returns
can be formulated as, $$\hat R_{30} = \prod_{t=1}^{30} (1+R_t)$$
Thus our portfolio returns for each Monte Carlo trial $m$ become the inner
product between the 30-day returns and our vector of portfolio weights $w$,
$$P_m = w \cdot \hat R_{30} $$.
## Dataset Summary
We use adjusted-close stock prices for Apple, Google, and Facebook from November 14th, 2017 - November 14th, 2018. Historical stock price data can be found on Yahoo
Finance for these companies. Also here is the link to the data set for this
tutorial ['Stock Price Data'](https://raw.githubusercontent.com/ShuoranLi/506_Project/master/Group21_ProjectData.csv).
The first ten rows of data look like :
```{r R_version_c2, warning=FALSE, message=FALSE,echo=FALSE}
stock_Data = fread('./Group21_ProjectData.csv')
stock_Data[1:10, ]
```
## Languages {.tabset}
### R
Firstly, we need to load the data
```{r R_version_c1, warning=FALSE, message=FALSE}
stock_Data = fread('./Group21_ProjectData.csv')
```
Then we extract the stock price and set initial values for Monte-Carlo parameters
```{r R_version_c3, warning=FALSE, message=FALSE}
stock_Price = as.matrix( stock_Data[ , 2:4] )
mc_rep = 1000 # Number of Monte Carlo Simulations
training_days = 30
```
Get the returns by stock price and set the investment weights
```{r R_version_c4, warning=FALSE, message=FALSE}
# This function returns the first differences of a t x q matrix of data
returns = function(Y){
len = nrow(Y)
yDif = Y[2:len, ] / Y[1:len-1, ] - 1
}
# Get the Stock Returns
stock_Returns = returns(stock_Price)
# Suppose we invest our money evenly among all three assets
# We use today's Price 11/14/2018 to find the number of shares each stock
# that we buy
portfolio_Weights = t(as.matrix(rep(1/ncol(stock_Returns), ncol(stock_Returns))))
print(portfolio_Weights)
```
Calculate the Covariance matrix and Mean value of Stock Returns
```{r R_version_c5, warning=FALSE, message=FALSE}
# Get the Variance Covariance Matrix of Stock Returns
coVarMat = cov(stock_Returns)
miu = colMeans(stock_Returns)
# Extend the vector to a matrix
Miu = matrix(rep(miu, training_days), nrow = 3)
```
Use Monte-Carlo to simulate the 30-day Portfolio Returns
```{r R_version_c6, warning=FALSE, message=FALSE}
# Initializing simulated 30 day portfolio returns
portfolio_Returns_30_m = matrix(0, training_days, mc_rep)
set.seed(200)
for (i in 1:mc_rep) {
Z = matrix ( rnorm( dim(stock_Returns)[2] * training_days ), ncol = training_days )
# Lower Triangular Matrix from our Choleski Factorization
L = t( chol(coVarMat) )
# Calculate stock returns for each day
daily_Returns = Miu + L %*% Z
# Calculate portfolio returns for 30 days
portfolio_Returns_30 = cumprod( portfolio_Weights %*% daily_Returns + 1 )
# Add it to the monte-carlo matrix
portfolio_Returns_30_m[,i] = portfolio_Returns_30;
}
```
Visualising the result ( Simulated Portfolio Returns in 30 days)
```{r R_version_c7, warning=FALSE, message=FALSE}
# Visualising result
x_axis = rep(1:training_days, mc_rep)
y_axis = as.vector(portfolio_Returns_30_m-1)
plot_data = data.frame(x_axis, y_axis)
ggplot(data = plot_data, aes(x = x_axis, y = y_axis)) + geom_path(col = 'red', size = 0.1) +
xlab('Days') + ylab('Portfolio Returns') +
ggtitle('Simulated Portfolio Returns in 30 days')+
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
```
Get some useful statistics through the results we get
```{r R_version_c8, warning=FALSE, message=FALSE}
# Porfolio Returns statistics on the 30th day.
Avg_Portfolio_Returns = mean(portfolio_Returns_30_m[30,]-1)
SD_Portfolio_Returns = sd(portfolio_Returns_30_m[30,]-1)
Median_Portfolio_Returns = median(portfolio_Returns_30_m[30,]-1)
print(c(Avg_Portfolio_Returns,SD_Portfolio_Returns,Median_Portfolio_Returns))
# Construct a 95% Confidential Interval for average returns
Avg_CI = quantile(portfolio_Returns_30_m[30,]-1, c(0.025, 0.975))
print(Avg_CI)
```
### Matlab
Load data and extract stock price
```{r matlab_chunk2, warning=FALSE, message=FALSE, eval=FALSE}
stockData = readtable('Group21_ProjectData.csv');
stockPrices = table2array(stockData(:, 2:end));
```
Set Monte_Carlo parameters
```{r matlab_chunk3, warning=FALSE, message=FALSE, eval=FALSE}
mc_rep = 1000;
initInvestment = 100000;
numTradingDays = 30
```
Calculate stock returns
```{r matlab_chunk4, warning=FALSE, message=FALSE, eval=FALSE}
stock_returns = stock_price(2:end, :) ./ stock_price(1:end-1, :) - 1;
```
Set portfolio weight
```{r matlab_chunk5, warning=FALSE, message=FALSE, eval=FALSE}
portfolioWeights = (1/3) * ones(1, size(stockPrices,2));
```
Calculate covariance matrix and mean of the stock returns
```{r matlab_chunk6, warning=FALSE, message=FALSE, eval=FALSE}
% Get the Variance Covariance Matrix of our Stock Returns
coVarMat = cov(stockReturns);
% Average returns of each asset
mu = transpose(mean(stockReturns));
mu = repmat(mu, 1, numTradingDays);
```
Then we use Monte-Carlo to simulate the portfolio returns in 30 days
```{r matlab_chunk7, warning=FALSE, message=FALSE, eval=FALSE}
for i = 1:mc_rep
% 'Randomly generated numbers from N(0,1) distribution'
Z = randn(size(stockReturns,2), numTradingDays);
% 'Lower Triangular Matrix from Choleski Factorization'
L = chol(coVarMat, 'lower');
% 'Calculate daily returns for 30 days'
dailyReturns = mu + (L * Z);
% 'Portfolio Returns'
thirtyDayReturn = transpose(cumprod(portfolioWeights * dailyReturns + 1));
% 'Add return to the set of all 30-day portfolio returns'
portfolio30DayReturn_m(:,i) = thirtyDayReturn;
end
```
Visualizing the result
```{r matlab_chunk8, warning=FALSE, message=FALSE, eval=FALSE}
plot(portfolio30DayReturn_m - 1, 'LineWidth', 0.5, 'Color', [0,0.7,0.9, 0.2])
title('Simulated Portfolio Returns in 30 days', 'fontsize', 16)
xlabel('Days','fontsize',16)
ylabel('Portfolio Returns','fontsize',16)
```
```{r matlab_pic1, echo=FALSE}
knitr::include_graphics('./matlab_pics/ThirtyDay_Returns_Matlab_MC.png')
```
Finally, we want to get some useful statistics:
```{r matlab_chunk9, warning=FALSE, message=FALSE, eval=FALSE}
% Calculate some statistics for our simulated portfolio returns
averagePortfolioReturns = mean(portfolio30DayReturn_m(end,:) - 1);
stdDevPortfolioReturns = std(portfolio30DayReturn_m(end,:) - 1);
medianPortfolioReturns = median(portfolio30DayReturn_m(end,:) - 1);
% Construct a 95% Confidential Interval for average returns
average_CI = quantile(portfolio30DayReturn_m(end,:) - 1, [0.025, 0.975]);
```
```{r matlab_pic2, echo=FALSE,out.width='400px'}
knitr::include_graphics('matlab_pics/Matlab_Simulation_Stats.png')
```
### Python
Load modules
```{r chunk1, warning=FALSE, message=FALSE, eval=FALSE}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
Load data and extract stock price
```{r chunk2, warning=FALSE, message=FALSE, eval=FALSE}
stock_data = pd.read_csv("Group21_ProjectData.csv")
stock_price = stock_data.iloc[:,1:4]
stock_price = stock_price.values
```
Set Monte_Carlo parameters
```{r chunk3, warning=FALSE, message=FALSE, eval=FALSE}
mc_rep = 1000
train_days = 30
```
Calculate stock returns
```{r chunk4, warning=FALSE, message=FALSE, eval=FALSE}
nrows = len(stock_price)
stock_returns = stock_price[1:nrows,:] / stock_price[0:nrows-1,:] - 1
```
Set portfolio weight
```{r chunk5, warning=FALSE, message=FALSE, eval=FALSE}
portf_WT = np.array([1/3, 1/3, 1/3])
```
Calculate covariance matrix and mean of the stock returns
```{r chunk6, warning=FALSE, message=FALSE, eval=FALSE}
cov = np.cov(np.transpose(stock_returns))
miu = np.mean(stock_returns, axis=0)
Miu = np.full((train_days,3),miu)
Miu = np.transpose(Miu)
```
Then we use Monte-Carlo to simulate the portfolio returns in 30 days
```{r chunk7, warning=FALSE, message=FALSE, eval=FALSE}
# initial matrix
portf_returns_30_m = np.full((train_days,mc_rep),0.)
np.random.seed(100)
for i in range(0,mc_rep):
Z = np.random.normal(size=3*train_days)
Z = Z.reshape((3,train_days))
L = np.linalg.cholesky(cov)
daily_returns = Miu + np.inner(L,np.transpose(Z))
portf_Returns_30 = np.cumprod(np.inner(portf_WT,np.transpose(daily_returns)) + 1)
portf_returns_30_m[:,i] = portf_Returns_30
```
Visualizing the result
```{r chunk8, warning=FALSE, message=FALSE, eval=FALSE}
plt.plot(portf_returns_30_m)
plt.ylabel('Portfolio Returns')
plt.xlabel('Days')
plt.title('Simulated Portfolio Returns in 30 days')
plt.show()
```
```{r pic1, echo=FALSE}
knitr::include_graphics('./Python_pics/Portf_returns_py.png')
```
Finally, we want to get some useful statistics:
```{r chunk9, warning=FALSE, message=FALSE, eval=FALSE}
# Porfolio Returns statistics on the 30th day
Avg_portf_returns = np.mean(portf_returns_30_m[29,:]-1)
SD_portf_returns = np.std(portf_returns_30_m[29,:]-1)
Median_portf_returns = np.median(portf_returns_30_m[29,:]-1)
print(Avg_portf_returns)
print(SD_portf_returns)
print(Median_portf_returns)
```
```{r pic2, echo=FALSE,out.width='200px'}
knitr::include_graphics('./Python_pics/avgsdmedian.png')
```
```{r chunk10, warning=FALSE, message=FALSE, eval=FALSE}
# construct CI for average
Avg_CI = np.quantile(portf_returns_30_m[29,:]-1,np.array([0.025,0.975]))
print(Avg_CI)
```
```{r pic3, echo=FALSE,out.width='200px'}
knitr::include_graphics('./Python_pics/CI.png')
```
## Results
For our particular example, the portfolio returns averaged over all monte carlo
trials had an average close to 0. The reason the average is close to 0 is
because Apple, Facebook, and Google have average returns close to 0 over the past year.
Therefore, our simulated returns essentially had no drift. Also, assuming a
normal distribution of the returns would not work well in practice since
stock returns are typically fat-tailed and not normally distributed. However, based on our
Monte Carlo Study, we do not suggest investing in this portfolio based on the
low expected portfolio returns.
## Note
Statistics we get using three different languages are slightly different, because in our simulation
process, we have generated random numbers and these numbers cannot be exactly identical.
## Reference
*Yahoo Finance*