This project predicts anomalies in cryptocurrency markets using machine learning and time-series analysis. It identifies unusual price movements and deviations from normal patterns, offering real-time detection of potential market anomalies for further analysis.
- Code Directory
- Dataset Acquisition
- Dataset Construction
- Technical Indicators Integration
- Data Visualization
Crypto-Anomaly-Detection
├── conda # All the conda environments
├── data # All the data
├── images # All the images
├── notebooks # All the notebooks
├── src # All the scripts for the analysis
├── .gitignore
├── README.md
└── requirements.txt
The first step was to select a dataset suited to our needs. We chose cryptocurrencies because of their higher volatility compared with the stock market.
As the next sections show, we rely on technical analysis indicators, which traders commonly use to decide whether to buy or sell an asset. Traders broadly follow two strategies:
- Technical analysis: using these indicators, together with price charts, to spot situations where the asset under examination is oversold or overbought;
- Fundamental analysis: used mostly for stocks, this strategy examines the main characteristics of a firm, for instance revenue, total debt and price/earnings ratio.
It is beyond the scope of this analysis to investigate all of these indicators in depth.
The data were acquired from Yahoo Finance using the yfinance Python package. To obtain as many samples as possible, prices were sampled at 1-hour intervals.
The following features were extracted for each cryptocurrency in this data acquisition step:
- Date
- Open price for the hour
- Close price for that hour
- The highest price in the hour
- The Lowest price in the hour
- The volume of crypto traded in the hour
After the data for all cryptocurrencies has been acquired, a data cleaning phase is performed, replacing each missing value with the previous available value.
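The cleaning rule described above amounts to a forward fill. The following is a minimal sketch in plain Python, assuming missing values are represented as `None`; the function name `forward_fill` is illustrative and is not taken from the project's scripts.

```python
def forward_fill(values):
    """Replace each missing entry (None) with the most recent known value."""
    filled = []
    last_seen = None  # no previous value exists before the first observation
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical hourly close prices with gaps.
closes = [100.0, None, None, 101.5, None, 99.8]
print(forward_fill(closes))
```

A gap at the very start of a series has no previous value and therefore stays missing. On a pandas table, `DataFrame.ffill()` applies the same rule column by column.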
Below is an example of how to execute the dataset_acquisition.py script; additional information can be found at the top of the script.
python src/dataset_acquisition.py --tickers BTC-USD BTS-USD DGB-USD XMR-USD DASH-USD DOGE-USD ETH-USD LTC-USD MAID-USD MONA-USD NAV-USD VTC-USD XCP-USD XRP-USD SYS-USD XLM-USD --period ytd --interval 1h --output_folder data/raw
After acquiring and cleaning the data, the next step is to construct the dataset by labeling anomalies based on price variations.
An anomaly is defined based on the percentage change in the close price of each hour:
- Upward Anomaly: If the price variation is greater than 1%, we label the previous hour as an upward anomaly.
- Downward Anomaly: If the price variation is less than -1%, we label the previous hour as a downward anomaly.
- Stable: If the price variation is between -1% and 1%, we label it as stable.
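The labeling rule above can be sketched as follows. This is an illustration in plain Python, not the project's implementation: the function name `label_hours` and the label strings are hypothetical, and treating the final hour as stable (it has no following price) is an assumption.

```python
def label_hours(closes, threshold_pct=1.0):
    """Label each hour by the percentage change to the NEXT hour's close."""
    labels = []
    for current, following in zip(closes, closes[1:]):
        change_pct = (following - current) / current * 100.0
        if change_pct > threshold_pct:
            labels.append("up")        # upward anomaly
        elif change_pct < -threshold_pct:
            labels.append("down")      # downward anomaly
        else:
            labels.append("stable")
    if closes:
        labels.append("stable")        # assumption: last hour has no successor
    return labels


closes = [100.0, 102.0, 101.5, 99.0, 99.2]
print(label_hours(closes))
```

Here the first hour is labeled an upward anomaly because the price rises by 2% in the following hour, and the third hour is labeled a downward anomaly because the price then falls by about 2.5%.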
This labeling helps in identifying significant price movements in the market.
Labeling only the hour before a price variation as anomalous wasn't sufficient for our analysis. To improve the results, we applied a technique called curve shifting.
Curve Shifting involves labeling the previous n hours preceding any anomaly. In our case, we chose a curve shifting of 4 hours. This means that the 4 hours leading up to an anomaly are also labeled as anomalies.
This approach accounts for patterns or signals that may occur before significant price movements, allowing our models to learn from the lead-up to anomalies.
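Curve shifting as described above can be sketched in a few lines. The function name `curve_shift` and the label strings are hypothetical. Two choices are assumptions rather than documented behaviour of the project: original anomaly labels are never overwritten, and where two look-back windows overlap the nearest following anomaly determines the label.

```python
from collections import Counter


def curve_shift(labels, shift_hours=4):
    """Propagate each anomaly label to the `shift_hours` hours before it."""
    shifted = list(labels)
    for i, label in enumerate(labels):
        if label == "stable":
            continue
        for j in range(max(0, i - shift_hours), i):
            # Only relabel hours that are still stable, so original anomalies
            # and labels set by a nearer anomaly are preserved.
            if shifted[j] == "stable":
                shifted[j] = label
    return shifted


labels = ["stable"] * 6 + ["up"] + ["stable"] * 2
shifted = curve_shift(labels, shift_hours=4)
print(Counter(labels))   # class counts before curve shifting
print(Counter(shifted))  # class counts after curve shifting
```

In this toy series a single anomalous hour becomes five after shifting, which is the rebalancing effect on the class distribution.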
To visualize the impact of curve shifting on the class distribution, we plotted the number of observations for each class before and after applying curve shifting.
The image below shows the trends for the 3 classes (stable, upward and downward anomaly) in the considered time period before the application of the curve shifting:
The image below shows the trends for the 3 classes (stable, upward and downward anomaly) in the considered time period after the application of the curve shifting:
These plots illustrate that, after applying curve shifting, the dataset becomes more balanced, with more samples labeled as anomalies. This helps in training machine learning models more effectively by providing sufficient examples of each class.
Below is an example of how to execute the dataset_construction.py script from the root folder, which processes the raw data to label anomalies and apply curve shifting.
python src/dataset_construction.py --input_folder data/raw --output_folder data/processed --threshold 1.0 --shift_hours 4
- --input_folder: Path to the folder containing the raw CSV files.
- --output_folder: Path where the processed CSV files will be saved.
- --threshold: Percentage threshold for anomaly detection (e.g., 1.0 for 1% price variation).
- --shift_hours: Number of hours to shift the anomaly labels backward (e.g., 4).
After constructing and labeling our dataset, we enhance it by integrating various technical analysis indicators commonly used in trading. These indicators help capture market trends and momentum, providing additional features for our anomaly detection models.
We created a Python script, integrate_indicators.py, that processes each cryptocurrency CSV file and calculates technical indicators using the pandas_ta library.
Calculated Indicators
- Simple Moving Average (SMA): Periods of 5, 12, 13, 14, 20, 21, 26, 30, 50, 100, 200
- Exponential Moving Average (EMA): Same periods as SMA
- Moving Average Convergence Divergence (MACD)
- Relative Strength Index (RSI): Same periods as SMA
- Momentum (MOM)
- Chande Momentum Oscillator (CMO): Same periods as SMA
- Ultimate Oscillator (UO)
- Bollinger Bands (BBANDS)
- Volume Weighted Average Price (VWAP)
python src/integrate_indicators.py --input_folder data/processed --output_folder data/with_indicators
- --input_folder: Path to the folder containing the processed CSV files.
- --output_folder: Path where the updated CSV files with technical indicators will be saved.
Due to the nature of technical indicators, some initial rows in the dataset may have missing values (NaN) because the indicators require a certain number of periods to calculate their initial values. To handle these missing values, we drop the initial rows where the indicators cannot be computed.
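To make the warm-up effect concrete, here is a small sketch using a hand-written simple moving average in plain Python. It is not the pandas_ta implementation; the function name `sma` and the sample prices are illustrative, and `None` stands in for NaN.

```python
def sma(values, period):
    """Simple moving average; None until `period` observations are available."""
    result = []
    for i in range(len(values)):
        if i + 1 < period:
            result.append(None)  # warm-up: not enough history yet
        else:
            window = values[i + 1 - period : i + 1]
            result.append(sum(window) / period)
    return result


closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
rows = list(zip(closes, sma(closes, 3), sma(closes, 5)))

# Keep only rows where every indicator has a value.
complete_rows = [row for row in rows if None not in row]
print(complete_rows)
```

The indicator with the longest period determines how many rows are lost: a 5-period average leaves the first 4 rows incomplete, and by the same logic a 200-period average leaves the first 199 rows incomplete.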
To gain insights from the integrated technical indicators, we plotted several indicators alongside the Bitcoin price to understand their behavior and potential signals.
We plotted the Momentum (MOM) indicator alongside the Bitcoin close price to observe how momentum correlates with price movements.
Momentum (MOM): Positive values indicate upward momentum, while negative values indicate downward momentum. Divergences between MOM and price can signal potential reversals.
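For reference, momentum in its textbook form is the difference between the current close and the close a fixed number of periods earlier. The sketch below is a plain-Python illustration; the function name and the default `period` are assumptions and may differ from what the project's script uses.

```python
def momentum(closes, period=10):
    """Close minus the close `period` steps earlier; None during warm-up."""
    return [
        None if i < period else closes[i] - closes[i - period]
        for i in range(len(closes))
    ]


closes = [100.0, 102.0, 101.0, 105.0, 103.0]
print(momentum(closes, period=2))
```

A positive value means the price is higher than it was `period` hours ago, and a negative value means it is lower.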
We plotted the Chande Momentum Oscillator (CMO) with a period of 14 alongside the Bitcoin close price to identify overbought and oversold conditions.
Chande Momentum Oscillator (CMO): Values above +50 may indicate overbought conditions; values below -50 may indicate oversold conditions.
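The textbook CMO compares the sum of gains with the sum of losses over the look-back window: 100 * (gains - losses) / (gains + losses). The sketch below implements that unsmoothed definition in plain Python. Library implementations may apply additional smoothing, so values can differ from those in the project's output; the function name is illustrative.

```python
def cmo(closes, period=14):
    """Unsmoothed Chande Momentum Oscillator, bounded in [-100, 100]."""
    result = [None] * len(closes)
    # diffs[k] is the change from closes[k] to closes[k + 1].
    diffs = [b - a for a, b in zip(closes, closes[1:])]
    for i in range(period, len(closes)):
        window = diffs[i - period : i]  # the last `period` changes up to hour i
        gains = sum(d for d in window if d > 0)
        losses = sum(-d for d in window if d < 0)
        total = gains + losses
        result[i] = 0.0 if total == 0 else 100.0 * (gains - losses) / total
    return result


print(cmo([10.0, 11.0, 10.0, 12.0], period=3))
```

In this example the window holds gains of 3 and losses of 1, giving 100 * (3 - 1) / 4 = 50, which sits exactly on the overbought boundary mentioned above.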
We plotted the Ultimate Oscillator (UO) alongside the Bitcoin close price to detect potential trend reversals based on multiple timeframes.
Ultimate Oscillator (UO): Values above 70 suggest overbought conditions; values below 30 suggest oversold conditions. Divergences can signal trend reversals.
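The "multiple timeframes" in the Ultimate Oscillator are three look-back windows, conventionally 7, 14 and 28 periods weighted 4:2:1. The sketch below follows the standard definition based on buying pressure and true range. It is a plain-Python illustration with a hypothetical function name, and the project's script may use different parameters.

```python
def ultimate_oscillator(highs, lows, closes,
                        periods=(7, 14, 28), weights=(4.0, 2.0, 1.0)):
    """Weighted average of buying pressure / true range over three windows."""
    buying_pressure, true_range = [], []
    for i in range(1, len(closes)):
        floor = min(lows[i], closes[i - 1])
        ceiling = max(highs[i], closes[i - 1])
        buying_pressure.append(closes[i] - floor)
        true_range.append(ceiling - floor)

    result = [None] * len(closes)
    for i in range(max(periods), len(closes)):
        weighted = 0.0
        for period, weight in zip(periods, weights):
            # Index k in the lists refers to bar k + 1, so this slice covers
            # the `period` bars ending at bar i.
            tr_sum = sum(true_range[i - period : i])
            bp_sum = sum(buying_pressure[i - period : i])
            weighted += weight * (bp_sum / tr_sum if tr_sum else 0.0)
        result[i] = 100.0 * weighted / sum(weights)
    return result


# Toy data: every bar closes at its high and never dips below the prior close.
closes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
highs = closes
lows = [1.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(ultimate_oscillator(highs, lows, closes, periods=(2, 3, 4)))
```

Short periods are used in the example only so that a six-bar series produces values. Because every bar in the toy data closes at the top of its range, buying pressure equals true range and the oscillator reaches its maximum of 100.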
After integrating technical indicators, the next step is to transform and normalize the dataset to prepare it for machine learning algorithms, particularly deep learning models.
- Compute Percent Variations: For each feature (excluding certain columns), compute the percent variation of every row with respect to the previous one and add these as new features.
- Normalize the Dataset: Use Robust Scaling to normalize both the original features and the percent variation features.
We created a Python script, data_transformation.py, to automate this process.
- Compute Percent Variations: The script computes the percent variation for each feature with respect to the previous row and adds these as new columns with the suffix _pct_change.
- Normalize Data Using Robust Scaling: Both the original features and the percent variation features are normalized using the RobustScaler from scikit-learn.
python src/data_transformation.py --input_folder data/with_indicators --output_folder data/transformed
- --input_folder: Path to the folder containing the CSV files with technical indicators.
- --output_folder: Path where the transformed CSV files will be saved.
- Robust to Outliers: Cryptocurrency data often contains outliers due to high volatility. Robust Scaling uses the median and interquartile range, reducing the influence of outliers.
- Preservation of Anomalies: Anomalies remain detectable after scaling, which is essential for anomaly detection models.
- No Assumptions about Data Distribution: Unlike methods like z-score normalization, Robust Scaling does not assume a normal distribution.
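The two transformation steps and the properties listed above can be illustrated with the standard library alone. This is a sketch, not the project's code: the function names are hypothetical, the percent variation is expressed as a fraction, and the scaling uses the median and the 25th to 75th percentile range, which is the default behaviour of scikit-learn's RobustScaler.

```python
import statistics


def pct_change(values):
    """Fractional change relative to the previous row; None for the first row."""
    changes = [None]
    for previous, current in zip(values, values[1:]):
        changes.append((current - previous) / previous if previous != 0 else None)
    return changes


def robust_scale(values):
    """Centre on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return [0.0 for _ in values]  # assumption: constant column maps to zeros
    return [(value - median) / iqr for value in values]


volumes = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy feature with one extreme value
print(pct_change(volumes))
print(robust_scale(volumes))
```

The extreme value barely affects the median and interquartile range, so the ordinary points land in a narrow band around zero while the outlier stays far outside it and remains easy to detect after scaling.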
If you want to contribute to this project, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the GNU General Public License; see the LICENSE file for details.