anqitu/NTUOSS-AlgorithmicTradingWorkshop

NTUOSS Data Science for Algorithmic Trading Workshop

by Tu Anqi for NTU Open Source Society

This workshop assumes basic knowledge of Python.

Disclaimer: This document is only meant to serve as a reference for the attendees of the workshop. It does not cover all the concepts or implementation details discussed during the actual workshop.


Workshop Details

When: Friday, 5 April 2018. 6:30 PM - 8:30 PM.
Where: LT1
Who: NTU Open Source Society

Questions

Please raise your hand any time during the workshop or email your questions to me later.

Errors

For errors, typos or suggestions, please do not hesitate to post an issue. Pull requests are very welcome! Thanks!


Task 0 - Getting Started

0.1 Introduction

84% of trades in NYSE were done using algorithmic trading!

For this workshop, we'll be developing our own trading strategy and simulating it with historical stock market data on Colaboratory.

  1. What is Algorithmic Trading?
    Algorithmic trading (also called automated trading, black-box trading or simply algo-trading) is the process of using computers programmed to follow a defined set of instructions (an algorithm) to place trades, in order to generate profits at a speed and frequency that is impossible for a human trader. The defined sets of rules are based on timing, price, quantity or any mathematical model.
  2. What are the benefits of Algorithmic Trading?

  • Human Emotions = 0

  • Accuracy + Speed = 100

  • Scalability = level 1000

  3. What is Colaboratory?
    Colaboratory is a Google research project created to help disseminate machine learning education and research. It is a free Jupyter notebook environment that requires no setup and runs entirely in a virtual machine (VM) hosted in the cloud.

0.2 Overview

Here is an overview of today's workshop.

0.3 Initial Setup

Download this file and add it to your own Google Drive. This start.ipynb file is a Jupyter notebook that contains the incomplete script that you are going to work on in today's workshop.

Let's open the 'start.ipynb' file together to officially start the coding part of today's workshop: Right click 'start.ipynb' file -> Select 'Open with' -> Select 'Colaboratory'.

If you do not have any app to open the notebook yet, follow the steps as shown below: Right click 'start' file -> Select 'Connect more apps' -> Search for 'colaboratory' -> Click on 'connect'.

Task 1 - Set Up

1.1 Import Necessary Libraries

Firstly, we need to import some useful libraries for manipulating data.

  • Pandas: An open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.
  • Numpy: A fundamental package for scientific computing with Python.
  • Matplotlib: A Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
# TASK 1.1: Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')

#ignore warnings
import warnings
warnings.filterwarnings('ignore')

1.2 Import Financial Data

The pandas_datareader package allows for reading in data from sources such as Google, Yahoo! Finance, World Bank,… For today's workshop, we will use the package to read in data from Yahoo! Finance.

# TASK 1.2: Import Apple's stock market price from 2006 to 2019
import pandas_datareader as pdr
import datetime
aapl = pdr.get_data_yahoo('AAPL',
                          start=datetime.datetime(2006, 10, 1),
                          end=datetime.datetime(2019, 1, 1))

Task 2 - Work with Pandas Dataframe

As we have used pandas_datareader to import data into our workspace, the object aapl is a Pandas dataframe, which is a 2-dimensional data structure with columns of various types.

2.1 Check Data

Let's see what the pandas DataFrame looks like.

# Task 2.1.1 Check the structure of a Pandas Dataframe
print(aapl)

This dataframe contains six columns – High, Low, Open, Close, Volume and Adj Close.

  • High and Low represent the maximum and minimum price of the share for the day.
  • Open and Close represent the starting and final price at which the stock is traded on a particular day.
  • Volume is the number of shares bought or sold in the day.
  • Adj Close is the adjusted closing price: the closing price of the day, adjusted to account for any corporate actions (such as stock splits and dividends) that occurred before the next day’s open. Use this column when you examine or analyse historical returns.
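
To see why the adjusted close matters for returns, here is a minimal sketch with made-up numbers. It assumes a hypothetical 2-for-1 stock split before the third row and ignores dividends; the prices and the `split_factor` column are illustrative and do not come from the AAPL data.

```python
import pandas as pd

# Hypothetical raw closing prices; a 2-for-1 split takes effect before the third row
close = pd.Series([100.0, 102.0, 51.0, 52.0])

# Assumed adjustment: prices before the split are divided by 2, later prices are unchanged
split_factor = pd.Series([2.0, 2.0, 1.0, 1.0])
adj_close = close / split_factor

# Daily returns computed from the raw and from the adjusted prices
raw_returns = close.pct_change()
adj_returns = adj_close.pct_change()

print(pd.DataFrame({'Close': close, 'Adj Close': adj_close,
                    'raw_return': raw_returns, 'adj_return': adj_returns}))
```

The raw series shows a drop of 50% on the split day even though shareholders lost nothing, while the adjusted series shows no change on that day.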

Another important thing to note is that the market is closed on weekends and public holidays. As shown in the above data, some date values are missing – 2006-10-07, 2006-10-08, 2006-10-14.
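
A quick way to confirm that these gaps are weekends is to ask pandas for the weekday names. This sketch uses only the three dates quoted above, so nothing is downloaded. Note that `pd.bdate_range` only knows about Monday to Friday, not about public holidays.

```python
import pandas as pd

# The three dates quoted above as missing from the price data
missing = pd.to_datetime(['2006-10-07', '2006-10-08', '2006-10-14'])
weekday_names = list(missing.day_name())
print(weekday_names)

# Business days (Monday to Friday) in the first two full weeks of October 2006
business_days = pd.bdate_range('2006-10-02', '2006-10-13')
print(len(business_days))
```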

Since the data has so many rows, let's get some useful summary statistics.

# Task 2.1.2 Use the describe() function to get some useful summary statistics about your data
print(aapl.describe(include = 'all'))

Then, check its general information, including the type of each column, number of entries, index and size.

# Task 2.1.3 Check the general information of the dataframe
aapl.info()  # info() prints its report directly and returns None, so print() is not needed

Check the shape, columns and index.

# Task 2.1.4 Check the shape, columns and index of a Pandas Dataframe
print(aapl.shape)
print(aapl.columns)
print(aapl.index)

Inspect first few rows of the dataframe

# Task 2.1.5 Check first few rows of dataframe
print(aapl.head())

Inspect last 10 rows of the dataframe by changing the default from 5 to 10

# Task 2.1.6 Check last 10 rows of dataframe
print(aapl.tail(10))

Check whether there are any missing values.

# Task 2.1.7 Check the existence of null values
print(aapl.isnull().sum())
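
As a small illustration of what this check reports, the sketch below builds a toy DataFrame (the name `toy` and its values are made up) with one missing closing price and counts the nulls per column.

```python
import numpy as np
import pandas as pd

# Toy data with one deliberately missing closing price
toy = pd.DataFrame(
    {'Close': [10.0, np.nan, 12.0], 'Volume': [100, 200, 300]},
    index=pd.to_datetime(['2019-01-02', '2019-01-03', '2019-01-04']))

# isnull() marks each missing cell as True; sum() counts them per column
null_counts = toy.isnull().sum()
print(null_counts)
```

A count of 0 for every column means there is nothing to clean up in that respect.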

2.2 Clean Data

The profit or loss calculation is usually determined by the closing price of a stock for the day, hence we will need the closing price as the target variable. Besides, for the return analysis of the simulation, we also need the adjusted closing price. All other columns can be dropped for this workshop.

To do this, we can either drop unwanted columns

# Task 2.2.1 Drop unwanted columns
aapl = aapl.drop(columns = ['High', 'Low', 'Open', 'Volume'])

Or keep wanted columns

# Task 2.2.2 Keep wanted columns
aapl = aapl[['Close', 'Adj Close']]
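
The two approaches give the same result when the dropped columns are exactly the ones that are not kept. A minimal sketch on a toy DataFrame (the values are made up):

```python
import pandas as pd

# Toy frame with the same six column names as the downloaded data
toy = pd.DataFrame({
    'High': [11.0, 12.0], 'Low': [9.0, 10.0],
    'Open': [10.0, 11.0], 'Close': [10.5, 11.5],
    'Volume': [100, 200], 'Adj Close': [10.4, 11.4],
})

# Option 1: drop every column we do not need
dropped = toy.drop(columns=['High', 'Low', 'Open', 'Volume'])

# Option 2: keep only the columns we need
kept = toy[['Close', 'Adj Close']]

print(dropped.equals(kept))
```

Keeping the wanted columns is usually the shorter option when only a few columns are needed.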

Task 3 - Visualize Data (Exploratory Data Analysis)

Let's make a simple time series plot of the closing price. We can plot data directly from the DataFrame using the plot() method.

# Task 3.1 Plot the closing price
aapl['Close'].plot()

The plot does not look very nice yet. Let's adjust some of the arguments.

# Task 3.2 Plot the closing price with some nice adjustments
aapl['Close'].plot(figsize = (15,5), linewidth = 1, legend = True)

Let's compare to the online chart from Yahoo Finance!

We can also plot the time series of more than one column. Here, let's plot both the closing and adjusted closing prices to compare them.

# Task 3.3 Plot the closing and adjusted closing prices
aapl['Close'].plot(figsize = (15,5), linewidth = 1, legend = True)
aapl['Adj Close'].plot(figsize = (15,5), linewidth = 1, legend = True)
plt.legend(['Close', 'Adj Close'])
plt.title('Apple Stock Price')
plt.show()

Task 4 - Build Your Trading Strategy

There are many trading strategies out there, with various levels of complexity. A “hello world” trading strategy is the Moving Average Crossover, which considers the interaction of two moving averages (MAs): one longer and one shorter.

  • Golden cross: When the shorter-term MA crosses above the longer-term MA, it's a buy signal, as it indicates that the trend is shifting up.
  • Dead/death cross: When the shorter-term MA crosses below the longer-term MA, it's a sell signal, as it indicates that the trend is shifting down.

This simple strategy might seem a bit complex when we are just starting out. Don't worry, let’s take this step by step:

First of all, define your two different lookback periods: a short window and a long window. Set up two variables and assign one integer each. Make sure that the integer you assign to the short window is smaller than the integer you assign to the long window!

# Task 4.1 Initialize the short and long windows
short_window = 50
long_window = 100

To calculate the MA, we could of course write our own code. However, pandas provides the rolling() function to make our life easier. To start the rolling window calculations, specify the window (the size of the moving window, either short_window or long_window) and min_periods (the minimum number of observations in the window required to have a value, here 1). Next, don’t forget to chain the mean() function so that you calculate the rolling mean.

# Task 4.2 Create moving averages over the short and long windows
aapl['short_ma'] = aapl['Close'].rolling(window = short_window, min_periods = 1, center = False).mean()
aapl['long_ma'] = aapl['Close'].rolling(window = long_window, min_periods = 1, center = False).mean()
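
To see what min_periods does, here is a minimal sketch on a five-value toy series with a window of 3 (both chosen only for illustration). With min_periods = 1 the first rows are averaged over however many values are available; with the default, they are NaN until the window is full.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# min_periods=1: a value is produced as soon as one observation is available
ma_partial = prices.rolling(window=3, min_periods=1).mean()

# Default min_periods (equal to the window size): NaN until 3 observations exist
ma_strict = prices.rolling(window=3).mean()

print(pd.DataFrame({'price': prices, 'ma_partial': ma_partial, 'ma_strict': ma_strict}))
```

This is why the early rows of short_ma and long_ma already contain values, even though fewer than 50 or 100 prices are available at that point.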

Then, create a column to indicate whether the short MA is higher than the long MA, but only for the period greater than the shortest MA window. Initialize it by setting the value for all rows in this column to 0.

# Task 4.3 Initialize as 0
aapl['higher_short_ma'] = 0

Set 1 for rows where short_ma is higher. Note that we need to add the [short_window:] to comply with the condition “only for the period greater than the shortest MA window”. When the condition is true, the initialized value 0 in the column will be overwritten with 1.

# Task 4.4 Set 1 for rows where short_ma is higher
# Assign through .loc; chained indexing (aapl[...][...] = ...) may write to a temporary copy
aapl.loc[aapl.index[short_window:], 'higher_short_ma'] = np.where(aapl['short_ma'].iloc[short_window:] > aapl['long_ma'].iloc[short_window:], 1, 0)
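
The same step can be traced on a toy DataFrame. In this sketch the moving-average values are made up and `warm_up` is a hypothetical stand-in for short_window; the first `warm_up` rows keep their initial 0 regardless of the comparison.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'short_ma': [5.0, 6.0, 7.0, 4.0, 8.0],
    'long_ma':  [4.0, 5.0, 6.0, 6.0, 6.0],
})
warm_up = 2  # hypothetical stand-in for short_window

# Start with 0 everywhere, then overwrite only the rows after the warm-up period
toy['higher_short_ma'] = 0
toy.loc[toy.index[warm_up:], 'higher_short_ma'] = np.where(
    toy['short_ma'].iloc[warm_up:] > toy['long_ma'].iloc[warm_up:], 1, 0)

flags = toy['higher_short_ma'].tolist()
print(flags)
```

Rows 0 and 1 stay at 0 even though their short_ma is higher, because they fall inside the warm-up period.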

Print out certain rows of the DataFrame and inspect the results.

# Task 4.5 Print rows where higher_short_ma is of value 1
aapl[aapl['higher_short_ma'] == 1]

Lastly, generate signals for actual trading orders - buying or selling stock by taking the difference of the higher_short_ma column. Signal = Today's higher_short_ma - Yesterday's 'higher_short_ma'. There will be three values in the signal column:

  • 1: Represents Buying. 1(Today's higher_short_ma) - 0(Yesterday's 'higher_short_ma') = 1.
  • -1: Represents Selling. 0(Today's higher_short_ma) - 1(Yesterday's 'higher_short_ma') = -1.
  • 0: Represents no actions. 1(Today's higher_short_ma) - 1(Yesterday's 'higher_short_ma') = 0 or 0(Today's higher_short_ma) - 0(Yesterday's 'higher_short_ma') = 0.
# Task 4.6 Generate trading signals
aapl['Signal'] = aapl['higher_short_ma'].diff()
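
The three cases above can be checked on a short toy series of flags (the values are made up). Note that diff() has no previous day to compare against in the first row, so that row is NaN rather than 0.

```python
import pandas as pd

# Toy higher_short_ma flags: 0 = short MA not higher, 1 = short MA higher
flags = pd.Series([0, 0, 1, 1, 0, 0, 1])

# Today's flag minus yesterday's flag
signal = flags.diff()
print(signal.tolist())

# Positions of the buy (1) and sell (-1) signals
buy_positions = signal[signal == 1].index.tolist()
sell_positions = signal[signal == -1].index.tolist()
print(buy_positions, sell_positions)
```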

Let's make sense of this calculation again by looking at some examples.

# Check days when there is a trading signal
aapl[aapl['Signal'] != 0]

The trading signal is 1 on 2006-12-12 and -1 on 2008-02-11

# Check the data in December of 2006 when buying happens
aapl['2006-12':].head(15)

With higher_short_ma on 2006-12-11 as 0, and higher_short_ma on 2006-12-12 as 1, short MA crosses over long MA on 2006-12-12. Thus, we should buy on this day.

# Check the data in February of 2008 when selling happens
aapl['2008-02':].head(15)

With higher_short_ma on 2008-02-08 as 1, and higher_short_ma on 2008-02-11 as 0, the short MA crosses below the long MA on 2008-02-11. Thus, we should sell on this day.

Let's make a plot of all of these - the short and long moving averages, together with the buy and sell signals

# Task 4.7 Plot the short and long moving averages, together with the buy and sell signals

# Initialize the plot figure
fig = plt.figure(figsize = (15,5))

# Add a subplot and label for y-axis
ax1 = fig.add_subplot(111,  ylabel='Price')

# Plot the closing price
aapl['Close'].plot(ax = ax1, color = 'r', linewidth = 1, legend = True)

# Plot the short and long moving averages
aapl[['short_ma', 'long_ma']].plot(ax = ax1, linewidth = 1, legend = True)

# Plot the buy signals
ax1.plot(aapl[aapl['Signal'] == 1].index,
         aapl[aapl['Signal'] == 1]['short_ma'],
         '^', markersize = 10, color = 'r', alpha = 0.6)

# Plot the sell signals
ax1.plot(aapl[aapl['Signal'] == -1].index,
         aapl[aapl['Signal'] == -1]['short_ma'],
         'v', markersize = 10, color = 'g', alpha = 0.6)

# Show the plot
plt.show()

Task 5 - Test Your Strategy

Now comes the most exciting part: simulating your strategy. With your trading strategy at hand, it’s always a good idea to test its performance through a simulation. There are many tools for trading simulation. Let's create our own simulator, which generates orders and manages the profit and loss for our portfolio.

First off, create a variable initial_capital.

# Task 5.1 Imagine you have 1 million
initial_capital = 1000000

On the days that the signal is 1, that is, when the short moving average crosses above the long moving average (for the period greater than the shortest moving average window), you’ll buy 5,000 shares. On the days that the signal is -1, you’ll sell 5,000 shares. When the signal is 0, do nothing.

# Task 5.2 Buy or sell 5,000 shares on the days that the signal is 1 or -1
aapl['Order'] = 5000 * aapl['Signal']
# Task 5.3 Calculate the transaction price paying (buy stocks) or receiving (sell stocks)
aapl['Transaction'] = aapl['Order'].multiply(aapl['Adj Close'], axis=0)
# Check data
aapl[aapl['Transaction'] != 0]

Create a new column to store the number of shares owned, which is the cumulative sum of the Order column.

# Task 5.4 Calculate the number of shares owned
aapl['Shares'] = aapl['Order'].cumsum()

Create a new column to store the value of the shares owned, which is the number of shares multiplied by the ‘Adj Close’ price.

# Task 5.5 Calculate the value of shares owned
aapl['Holdings'] = aapl['Shares'].multiply(aapl['Adj Close'], axis=0)

Your portfolio also contains a cash column, which is the capital that you still have left to spend. It is calculated by taking your initial_capital and subtracting your cumulative transaction prices (the price that you paid for buying stocks or received for selling stocks).

# Task 5.6 Calculate the value of cash owned
aapl['Cash'] = initial_capital - (aapl['Transaction']).cumsum()   
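
The bookkeeping in Tasks 5.2 to 5.6 is easier to follow on a four-day toy example. In this sketch the prices, the signals, `lot_size` (100 shares instead of 5,000) and `starting_cash` are all made-up values chosen so that the arithmetic can be checked by hand.

```python
import pandas as pd

toy = pd.DataFrame({
    'Adj Close': [10.0, 12.0, 11.0, 15.0],
    'Signal':    [0.0, 1.0, 0.0, -1.0],   # buy on day 1, sell on day 3
})
lot_size = 100          # hypothetical number of shares per order
starting_cash = 10_000  # hypothetical initial capital

toy['Order'] = lot_size * toy['Signal']                     # shares bought (+) or sold (-)
toy['Transaction'] = toy['Order'] * toy['Adj Close']        # cash paid (+) or received (-)
toy['Shares'] = toy['Order'].cumsum()                       # shares owned after each day
toy['Holdings'] = toy['Shares'] * toy['Adj Close']          # market value of those shares
toy['Cash'] = starting_cash - toy['Transaction'].cumsum()   # cash left after each day

print(toy)
```

Buying 100 shares at 12 costs 1,200, and selling them at 15 brings in 1,500, so the cash ends 300 above where it started.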

You’ll also add a total column, which contains the sum of your cash and the holdings that you own

# Check the data in December of 2006 when buying happens
aapl['2006-12':].head(15)
# Check the data in February of 2008 when selling happens
aapl['2008-02':].head(15)
# Task 5.7 Calculate the total value of your portfolio
aapl['Total'] = aapl['Cash'] + aapl['Holdings']
# Check the data in December of 2006 when buying happens
aapl['2006-12':].head(15)
# Check the data in February of 2008 when selling happens
aapl['2008-02':].head(15)
# Task 5.8 Visualize the portfolio value over the years
# Create a figure
fig = plt.figure(figsize = (15,5))

ax1 = fig.add_subplot(111, ylabel='Portfolio value')

# Plot the equity curve
aapl['Total'].plot(ax = ax1, linewidth = 1)

ax1.plot(aapl[aapl['Signal'] == 1].index,
         aapl[aapl['Signal'] == 1.0]['Total'],
         '^', markersize = 10, color = 'r', alpha = 0.6)
ax1.plot(aapl[aapl['Signal'] == -1].index,
         aapl[aapl['Signal'] == -1.0]['Total'],
         'v', markersize = 10, color = 'g', alpha = 0.6)
# Show the plot
plt.show()
# Task 5.9 Get the portfolio value at the end of each year

# Resample `Total` to yearly frequency, taking the last observation of each year as the value
yearly = aapl['Total'].resample('Y').last()
print(yearly)
# Task 5.10 Calculate the yearly return
yearly.pct_change()
# Task 5.11 Plot the yearly return
yearly.pct_change().plot(figsize = (15,5))
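
The yearly return calculation can be checked on a toy series of portfolio values (made-up numbers). As an assumption made for portability, this sketch groups by the calendar year of the index instead of calling resample(), because the yearly frequency alias differs between pandas versions; the result is the same last-value-per-year series.

```python
import pandas as pd

# Toy portfolio values; two observations fall in 2017, one each in 2018 and 2019
total = pd.Series(
    [100.0, 110.0, 121.0, 133.1],
    index=pd.to_datetime(['2017-06-30', '2017-12-29', '2018-12-31', '2019-12-31']))

# Last observation of each calendar year
yearly = total.groupby(total.index.year).last()

# Year-over-year percentage change; the first year has nothing to compare against
yearly_return = yearly.pct_change()
print(yearly_return)
```

Each year-end value here is 10% higher than the previous one, so both computed returns are 0.1.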

You can change any of the variables of this strategy, such as the stock, the lengths of the moving average windows, the number of shares bought or sold on every signalled day and so on. Simulate and try to maximize your return.
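
One way to experiment is to wrap the steps into a single function. The sketch below is not part of the workshop notebook: `simulate_ma_crossover` and its parameter names are hypothetical, it uses one price series for both signals and transactions (the workshop uses Close and Adj Close), and it runs on a synthetic rise-then-fall price series instead of downloaded data.

```python
import numpy as np
import pandas as pd

def simulate_ma_crossover(close, short_window=50, long_window=100,
                          shares_per_trade=5000, initial_capital=1_000_000):
    """Hypothetical helper that reruns the workshop steps on one price series."""
    df = pd.DataFrame({'Close': close})
    df['short_ma'] = df['Close'].rolling(window=short_window, min_periods=1).mean()
    df['long_ma'] = df['Close'].rolling(window=long_window, min_periods=1).mean()

    # 1 where the short MA is higher, but only after the warm-up period
    higher = np.where(df['short_ma'] > df['long_ma'], 1, 0)
    higher[:short_window] = 0
    df['higher_short_ma'] = higher

    # The first row has no previous day, so treat it as "no action"
    df['Signal'] = df['higher_short_ma'].diff().fillna(0)
    df['Order'] = shares_per_trade * df['Signal']
    df['Shares'] = df['Order'].cumsum()
    df['Cash'] = initial_capital - (df['Order'] * df['Close']).cumsum()
    df['Total'] = df['Cash'] + df['Shares'] * df['Close']
    return df

# Synthetic prices: a steady rise followed by a steady fall
prices = np.concatenate([np.linspace(10, 20, 30), np.linspace(20, 10, 30)])
result = simulate_ma_crossover(prices, short_window=3, long_window=10,
                               shares_per_trade=100, initial_capital=10_000)

print(result[result['Signal'] != 0][['Close', 'Signal', 'Shares', 'Total']])
print(result['Total'].iloc[-1])
```

Calling the function with different window lengths or trade sizes on the same prices makes it easy to compare the final Total across settings.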

Here is a plot for the simulation of 4 different companies with the same strategy.

Acknowledgements

Many thanks to clarencecastillo for carefully testing this walkthrough and to everybody else in NTU Open Source Society committee for making this happen! 😘😘😘
