diff --git a/demos/betting-behavior-analysis.mdx b/demos/betting-behavior-analysis.mdx
new file mode 100644
index 00000000..5a1bbe00
--- /dev/null
+++ b/demos/betting-behavior-analysis.mdx
@@ -0,0 +1,224 @@
+---
+title: "User betting behavior analysis"
+description: "Identify high-risk and high-value users by analyzing and identifying trends in user betting patterns."
+---
+
+## Overview
+
+Betting platforms, sports analysts, and market regulators benefit from analyzing and interpreting users' betting patterns.
+For sports analysts, this data helps gauge fan sentiment and engagement, allowing them to identify high-profile events and fine-tune their marketing strategies.
+Regulators, on the other hand, focus on ensuring fair play and compliance with gambling laws. They use these insights to prevent illegal activities, such as match-fixing or money laundering.
+
+During live events, users' behaviors can shift rapidly in response to gameplay developments.
+Processing and analyzing these changes in real-time allows platforms to flag high-risk users, who may be more likely to engage in fraudulent activities.
+By joining historical data on user behavior with live betting data, platforms can easily identify high-risk users for further investigation to mitigate potential risks.
+
+In this tutorial, you will learn how to analyze users' betting behaviors by integrating historical datasets with live data streams.
+
+## Prerequisites
+
+* Ensure that the [PostgreSQL](https://www.postgresql.org/docs/current/app-psql.html) interactive terminal, `psql`, is installed in your environment. For detailed instructions, see [Download PostgreSQL](https://www.postgresql.org/download/).
+* Install and run RisingWave. For detailed instructions on how to quickly get started, see the [Quick start](/get-started/quickstart/) guide.
+* Ensure that a Python environment is set up and install the [psycopg2](https://pypi.org/project/psycopg2/) library.
+
+## Step 1: Set up the data source tables
+
+Once RisingWave is installed and deployed, run the three SQL queries below to set up the tables. You will insert data into these tables to simulate live data streams.
+
+1. The `user_profiles` table contains static information about each user.
+
+   ```sql
+   CREATE TABLE user_profiles (
+       user_id INT,
+       username VARCHAR,
+       preferred_league VARCHAR,
+       avg_bet_size FLOAT,
+       risk_tolerance VARCHAR
+   );
+   ```
+
+2. The `betting_history` table contains historical betting records for each user.
+
+   ```sql
+   CREATE TABLE betting_history (
+       user_id INT,
+       position_id INT,
+       bet_amount FLOAT,
+       result VARCHAR,
+       profit_loss FLOAT,
+       timestamp TIMESTAMP
+   );
+   ```
+
+3. The `positions` table contains real-time updates on ongoing betting positions for each user.
+
+   ```sql
+   CREATE TABLE positions (
+       position_id INT,
+       position_name VARCHAR,
+       user_id INT,
+       league VARCHAR,
+       stake_amount FLOAT,
+       expected_return FLOAT,
+       current_odds FLOAT,
+       profit_loss FLOAT,
+       timestamp TIMESTAMP
+   );
+   ```
+
+## Step 2: Run the data generator
+
+To keep this demo simple, a Python script is used to generate and insert data into the tables created above.
+
+Clone the [awesome-stream-processing](https://github.com/risingwavelabs/awesome-stream-processing) repository.
+
+```bash
+git clone https://github.com/risingwavelabs/awesome-stream-processing.git
+```
+
+Navigate to the [user_betting_behavior](https://github.com/risingwavelabs/awesome-stream-processing/tree/main/02-simple-demos/sports_betting/user_betting_behavior) folder.
+
+```bash
+cd awesome-stream-processing/02-simple-demos/sports_betting/user_betting_behavior
+```
+
+Run the `data_generator.py` file. This Python script utilizes the `psycopg2` library to establish a connection with RisingWave so you can generate and insert synthetic data into the tables `user_profiles`, `betting_history`, and `positions`.
+
+If you are not running RisingWave locally or using default credentials, update the connection parameters accordingly:
+
+```python
+default_params = {
+    "dbname": "dev",
+    "user": "root",
+    "password": "",
+    "host": "localhost",
+    "port": "4566"
+}
+```
+
+## Step 3: Create materialized views
+
+In this demo, you will create multiple materialized views to understand bettors' behavior trends.
+
+Materialized views contain the results of a view expression and are stored in the RisingWave database. The results of a materialized view are computed incrementally and updated whenever new events arrive, without needing to be manually refreshed. When you query from a materialized view, it will return the most up-to-date computation results.
+
+### Identify user betting patterns
+
+The `user_betting_patterns` materialized view provides an overview of each user's betting history, including their win/loss count and average profit.
+
+```sql
+CREATE MATERIALIZED VIEW user_betting_patterns AS
+SELECT
+    user_id,
+    COUNT(*) AS total_bets,
+    SUM(CASE WHEN result = 'Win' THEN 1 ELSE 0 END) AS wins,
+    SUM(CASE WHEN result = 'Loss' THEN 1 ELSE 0 END) AS losses,
+    AVG(profit_loss) AS avg_profit_loss,
+    SUM(profit_loss) AS total_profit_loss
+FROM
+    betting_history
+GROUP BY
+    user_id;
+```
+
+You can query from `user_betting_patterns` to see the results.
+
+```sql
+SELECT * FROM user_betting_patterns LIMIT 5;
+```
+
+```
+ user_id | total_bets | wins | losses | avg_profit_loss | total_profit_loss
+---------+------------+------+--------+---------------------+---------------------
+ 6 | 4 | 3 | 1 | 52.34777393817115 | 209.3910957526846
+ 4 | 4 | 3 | 1 | 68.4942119081947 | 273.9768476327788
+ 2 | 4 | 0 | 4 | -123.37575449330379 | -493.50301797321515
+ 9 | 4 | 4 | 0 | 188.86010650028302 | 755.4404260011321
+ 3 | 4 | 1 | 3 | -54.06198104612867 | -216.2479241845147
+```
+
+### Summarize users' exposure
+
+The `real_time_user_exposure` materialized view sums up the stake amounts of active positions for each user to track each user's current total exposure in real-time.
+
+With this materialized view, you can filter for users who may be overexposed.
+
+```sql
+CREATE MATERIALIZED VIEW real_time_user_exposure AS
+SELECT
+    user_id,
+    SUM(stake_amount) AS total_exposure,
+    COUNT(*) AS active_positions
+FROM
+    positions
+GROUP BY
+    user_id;
+```
+
+You can query from `real_time_user_exposure` to see the results.
+
+```sql
+SELECT * FROM real_time_user_exposure LIMIT 5;
+```
+```
+ user_id | total_exposure | active_positions
+---------+--------------------+------------------
+ 5 | 3784.6700000000005 | 12
+ 1 | 3779.05 | 12
+ 10 | 2818.66 | 12
+ 4 | 3275.99 | 12
+ 2 | 3220.93 | 12
+```
+
+### Flag high-risk users
+
+The `high_risk_users` materialized view identifies high-risk users by analyzing their risk tolerance, exposure, and profit patterns.
+
+A user is considered high-risk if they meet all of the following criteria:
+* Their total exposure is more than five times their average bet size. You can customize this threshold to be lower or higher.
+* Their average profit loss is less than zero.
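
For example, before materializing the full view, you can check the exposure criterion on its own with an ad-hoc query. This is a sketch that reuses the `user_profiles` table and the `real_time_user_exposure` view created above; the multiplier of `5` is only illustrative and can be tuned.

```sql
-- Ad-hoc check of the exposure criterion only; adjust the multiplier as needed.
SELECT
    u.user_id,
    u.avg_bet_size,
    p.total_exposure
FROM user_profiles AS u
JOIN real_time_user_exposure AS p ON u.user_id = p.user_id
WHERE p.total_exposure > u.avg_bet_size * 5;
```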
+
+Some users who are not initially categorized as high-risk may later exhibit behaviors that indicate they are high-risk users.
+
+```sql
+CREATE MATERIALIZED VIEW high_risk_users AS
+SELECT
+    u.user_id,
+    u.username,
+    u.risk_tolerance,
+    p.total_exposure,
+    b.total_bets,
+    b.avg_profit_loss,
+    b.total_profit_loss
+FROM
+    user_profiles AS u
+JOIN
+    real_time_user_exposure AS p
+ON
+    u.user_id = p.user_id
+JOIN
+    user_betting_patterns AS b
+ON
+    u.user_id = b.user_id
+WHERE
+    p.total_exposure > u.avg_bet_size * 5
+    AND b.avg_profit_loss < 0;
+```
+
+You can query from `high_risk_users` to see the results.
+
+```sql
+SELECT * FROM high_risk_users;
+```
+```
+ user_id | username | risk_tolerance | total_exposure | total_bets | avg_profit_loss | total_profit_loss
+---------+----------+----------------+--------------------+------------+---------------------+---------------------
+ 2 | user_2 | Low | 23341.270000000004 | 81 | -2.8318496459258133 | -229.37982131999087
+```
+
+When finished, press `Ctrl+C` to close the connection between RisingWave and `psycopg2`.
+
+## Summary
+
+In this tutorial, you learn:
+- How to perform a multi-way join.
diff --git a/demos/inventory-management-forecast.mdx b/demos/inventory-management-forecast.mdx
new file mode 100644
index 00000000..df70a0b1
--- /dev/null
+++ b/demos/inventory-management-forecast.mdx
@@ -0,0 +1,205 @@
+---
+title: "Inventory management and demand forecast"
+description: "Track inventory levels and forecast demand to prevent shortages and optimize restocking schedules."
+---
+
+## Overview
+
+In fast-moving industries, monitoring inventory levels in real-time is essential to ensuring smooth and successful operations. There are countless factors that affect the supply chain: customer preferences shift, raw materials may suddenly become hard to obtain, and unforeseen circumstances can delay shipments.
+
+Having a live view of stock levels allows companies to respond immediately to changes in demand and supply chain disruptions. With data constantly streamed in, businesses can adjust forecasts based on current sales trends. If delays occur, notifications can be promptly sent to customers, improving transparency and customer satisfaction.
+
+In this tutorial, you will learn how to utilize inventory and sales data to prevent stock shortages and forecast sales demand.
+
+## Prerequisites
+
+* Ensure that the [PostgreSQL](https://www.postgresql.org/docs/current/app-psql.html) interactive terminal, `psql`, is installed in your environment. For detailed instructions, see [Download PostgreSQL](https://www.postgresql.org/download/).
+* Install and run RisingWave. For detailed instructions on how to quickly get started, see the [Quick start](/get-started/quickstart/) guide.
+* Ensure that a Python environment is set up and install the [psycopg2](https://pypi.org/project/psycopg2/) library.
+
+## Step 1: Set up the data source tables
+
+Once RisingWave is installed and deployed, run the two SQL queries below to set up the tables. You will insert data into these tables to simulate live data streams.
+
+1. The table `inventory` tracks the current stock levels of each product at each warehouse.
+
+   ```sql
+   CREATE TABLE inventory (
+       warehouse_id INT,
+       product_id INT,
+       timestamp TIMESTAMP,
+       stock_level INT,
+       reorder_point INT,
+       location VARCHAR
+   );
+   ```
+2. The table `sales` describes the details of each transaction, such as the quantity purchased and the warehouse from which the item was sourced.
+
+   ```sql
+   CREATE TABLE sales (
+       sale_id INT,
+       warehouse_id INT,
+       product_id INT,
+       quantity_sold INT,
+       timestamp TIMESTAMP
+   );
+   ```
+
+## Step 2: Run the data generator
+
+To keep this demo simple, a Python script is used to generate and insert data into the tables created above.
+
+Clone the [awesome-stream-processing](https://github.com/risingwavelabs/awesome-stream-processing) repository.
+
+```bash
+git clone https://github.com/risingwavelabs/awesome-stream-processing.git
+```
+
+Navigate to the [warehouse_inventory_mgmt](https://github.com/risingwavelabs/awesome-stream-processing/tree/main/02-simple-demos/logistics/warehouse_inventory_mgmt) folder.
+
+```bash
+cd awesome-stream-processing/02-simple-demos/logistics/warehouse_inventory_mgmt
+```
+
+Run the `data_generator.py` file. This Python script utilizes the `psycopg2` library to establish a connection with RisingWave so you can generate and insert synthetic data into the tables `inventory` and `sales`.
+
+If you are not running RisingWave locally or using default credentials, update the connection parameters accordingly:
+
+```python
+default_params = {
+    "dbname": "dev",
+    "user": "root",
+    "password": "",
+    "host": "localhost",
+    "port": "4566"
+}
+```
+
+## Step 3: Create materialized views
+
+In this demo, you will create three materialized views to manage inventory levels.
+
+Materialized views contain the results of a view expression and are stored in the RisingWave database. The results of a materialized view are computed incrementally and updated whenever new events arrive, without needing to be manually refreshed. When you query from a materialized view, it will return the most up-to-date computation results.
+
+### Monitor inventory status
+
+The `inventory_status` materialized view indicates whether or not a product needs to be restocked.
+
+```sql
+CREATE MATERIALIZED VIEW inventory_status AS
+SELECT
+    warehouse_id,
+    product_id,
+    stock_level,
+    reorder_point,
+    location,
+    CASE
+        WHEN stock_level <= reorder_point THEN 'Reorder Needed'
+        ELSE 'Stock Sufficient'
+    END AS reorder_status,
+    timestamp AS last_update
+FROM
+    inventory;
+```
+
+You can query from `inventory_status` to see the results.
+
+```sql
+SELECT * FROM inventory_status LIMIT 5;
+```
+
+```
+ warehouse_id | product_id | stock_level | reorder_point | location | reorder_status | last_update
+--------------+------------+-------------+---------------+-------------+------------------+----------------------------
+ 1 | 1 | 64 | 100 | Warehouse 1 | Reorder Needed | 2024-11-18 14:32:35.808553
+ 2 | 3 | 137 | 100 | Warehouse 2 | Stock Sufficient | 2024-11-18 14:32:51.023410
+ 5 | 10 | 493 | 100 | Warehouse 5 | Stock Sufficient | 2024-11-18 14:32:40.933411
+ 4 | 7 | 68 | 100 | Warehouse 4 | Reorder Needed | 2024-11-18 14:32:35.827922
+ 1 | 2 | 416 | 100 | Warehouse 1 | Stock Sufficient | 2024-11-18 14:32:45.952925
+```
+
+### Aggregate recent sales
+
+The `recent_sales` materialized view calculates the number of products sold from each warehouse within the past week. By understanding recent sales trends, you can forecast demand.
+
+A temporal filter, `timestamp > NOW() - INTERVAL '7 days'`, is used to retrieve sales made within the past week. To learn more about temporal filters, see [Temporal filters](/processing/sql/temporal-filters/).
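
To see what the temporal filter keeps on its own, you can first run it as an ad-hoc query against the `sales` table before building the materialized view below. This is a minimal sketch; the `LIMIT` is only there to keep the output short.

```sql
-- Returns only sales from the past 7 days; older rows are filtered out.
SELECT *
FROM sales
WHERE timestamp > NOW() - INTERVAL '7 days'
LIMIT 10;
```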
+
+```sql
+CREATE MATERIALIZED VIEW recent_sales AS
+SELECT
+    warehouse_id,
+    product_id,
+    SUM(quantity_sold) AS total_quantity_sold,
+    MAX(timestamp) AS last_sale
+FROM
+    sales
+WHERE
+    timestamp > NOW() - INTERVAL '7 days'
+GROUP BY
+    warehouse_id, product_id;
+```
+
+You can query from `recent_sales` to see the results.
+
+```sql
+SELECT * FROM recent_sales;
+```
+```
+ warehouse_id | product_id | total_quantity_sold | last_sale
+--------------+------------+---------------------+----------------------------
+ 2 | 3 | 27 | 2024-11-18 14:33:06.225306
+ 2 | 8 | 42 | 2024-11-18 14:33:21.414487
+ 3 | 1 | 27 | 2024-11-18 14:33:21.413932
+ 3 | 10 | 19 | 2024-11-18 14:33:01.171326
+ 4 | 1 | 17 | 2024-11-18 14:33:21.409274
+```
+
+### Forecast demand
+
+The `demand_forecast` materialized view predicts how long the current stock of each product will last based on recent sales trends.
+
+A simple model is used to forecast demand, where the `stock_level` found in `inventory_status` is divided by the `total_quantity_sold` in `recent_sales`.
+
+RisingWave supports creating materialized views on top of materialized views. When the source materialized view updates, the child materialized view will update accordingly as well.
+
+```sql
+CREATE MATERIALIZED VIEW demand_forecast AS
+SELECT
+    i.warehouse_id,
+    i.product_id,
+    i.stock_level,
+    r.total_quantity_sold AS weekly_sales,
+    CASE
+        WHEN r.total_quantity_sold = 0 THEN 0
+        ELSE ROUND(i.stock_level / r.total_quantity_sold, 2)
+    END AS stock_days_remaining
+FROM
+    inventory_status AS i
+LEFT JOIN
+    recent_sales AS r
+ON
+    i.warehouse_id = r.warehouse_id AND i.product_id = r.product_id;
+```
+
+You can query from `demand_forecast` to see the results.
+
+```sql
+SELECT * FROM demand_forecast LIMIT 5;
+```
+```
+ warehouse_id | product_id | stock_level | weekly_sales | stock_days_remaining
+--------------+------------+-------------+--------------+----------------------
+ 2 | 4 | 191 | 28 | 6
+ 1 | 7 | 157 | 21 | 7
+ 4 | 6 | 171 | 67 | 2
+ 3 | 6 | 221 | 86 | 2
+ 5 | 4 | 92 | 58 | 1
+```
+
+When finished, press `Ctrl+C` to close the connection between RisingWave and `psycopg2`.
+
+## Summary
+
+In this tutorial, you learn:
+- How to use temporal filters to retrieve data within a specific time range.
+- How to create materialized views based on materialized views.
\ No newline at end of file
diff --git a/demos/market-data-enrichment.mdx b/demos/market-data-enrichment.mdx
new file mode 100644
index 00000000..0d3c4834
--- /dev/null
+++ b/demos/market-data-enrichment.mdx
@@ -0,0 +1,209 @@
+---
+title: "Market data enhancement and transformation"
+description: "Transform raw market data in real-time to provide insights into market trends, asset health, and trade opportunities."
+---
+
+## Overview
+
+Understanding the fast-moving market is pivotal to making informed trading decisions. The market is constantly influenced by many external factors that are not reflected in the raw market data, such as geopolitical events, economic indicators, and industry-specific news. These factors often create rapid fluctuations that are not immediately apparent in raw market data. To make sense of these shifts, traders need external data to better understand the context behind these price movements.
+
+Compiling and analyzing this information in real-time is key to understanding the market and making better trading decisions. With real-time data analysis, traders can process and join raw market data and external signals as they happen.
By combining these data streams, traders gain a comprehensive, up-to-the-second view of market conditions. This allows them to act quickly and confidently, making quick adjustments to maximize profits and mitigate risks.
+
+In this tutorial, you will learn how to join market data with external data to gain a holistic view of each asset.
+
+## Prerequisites
+
+* Ensure that the [PostgreSQL](https://www.postgresql.org/docs/current/app-psql.html) interactive terminal, `psql`, is installed in your environment. For detailed instructions, see [Download PostgreSQL](https://www.postgresql.org/download/).
+* Install and run RisingWave. For detailed instructions on how to quickly get started, see the [Quick start](/get-started/quickstart/) guide.
+* Ensure that a Python environment is set up and install the [psycopg2](https://pypi.org/project/psycopg2/) library.
+
+## Step 1: Set up the data source tables
+
+Once RisingWave is installed and deployed, run the two SQL queries below to set up the tables. You will insert data into these tables to simulate live data streams.
+
+1. The table `raw_market_data` tracks raw market data of each asset, such as the price, volume, and bid and ask prices.
+
+   ```sql
+   CREATE TABLE raw_market_data (
+       asset_id INT,
+       timestamp TIMESTAMP,
+       price FLOAT,
+       volume INT,
+       bid_price FLOAT,
+       ask_price FLOAT
+   );
+   ```
+2. The table `enrichment_data` contains external data that adds context to the raw market data. It includes additional metrics such as historical volatility, sector performance, and sentiment scores.
+
+   ```sql
+   CREATE TABLE enrichment_data (
+       asset_id INT,
+       sector VARCHAR,
+       historical_volatility FLOAT,
+       sector_performance FLOAT,
+       sentiment_score FLOAT,
+       timestamp TIMESTAMP
+   );
+   ```
+
+## Step 2: Run the data generator
+
+To keep this demo simple, a Python script is used to generate and insert data into the tables created above.
+
+Clone the [awesome-stream-processing](https://github.com/risingwavelabs/awesome-stream-processing) repository.
+
+```bash
+git clone https://github.com/risingwavelabs/awesome-stream-processing.git
+```
+
+Navigate to the [market_data_enrichment](https://github.com/risingwavelabs/awesome-stream-processing/tree/main/02-simple-demos/capital_markets/market_data_enrichment) folder.
+
+```bash
+cd awesome-stream-processing/02-simple-demos/capital_markets/market_data_enrichment
+```
+
+Run the `data_generator.py` file. This Python script utilizes the `psycopg2` library to establish a connection with RisingWave so you can generate and insert synthetic data into the tables `raw_market_data` and `enrichment_data`.
+
+If you are not running RisingWave locally or using default credentials, update the connection parameters accordingly:
+
+```python
+default_params = {
+    "dbname": "dev",
+    "user": "root",
+    "password": "",
+    "host": "localhost",
+    "port": "4566"
+}
+```
+
+## Step 3: Create materialized views
+
+In this demo, you will create three materialized views to better understand the market.
+
+Materialized views contain the results of a view expression and are stored in the RisingWave database. The results of a materialized view are computed incrementally and updated whenever new events arrive, without needing to be manually refreshed. When you query from a materialized view, it will return the most up-to-date computation results.
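
Because results stay fresh automatically, any client can re-run a plain `SELECT` to pick up the latest numbers. Below is a minimal Python sketch that queries one of the views created in this step, assuming the same connection parameters as the data generator and the `avg_price_bid_ask_spread` view defined below.

```python
import psycopg2

# Assumes the default local connection parameters used by the data generator.
conn = psycopg2.connect(
    dbname="dev", user="root", password="", host="localhost", port="4566"
)
conn.autocommit = True

with conn.cursor() as cur:
    # Each execution returns the most recent incrementally maintained results.
    cur.execute(
        "SELECT * FROM avg_price_bid_ask_spread ORDER BY window_end DESC LIMIT 5;"
    )
    for row in cur.fetchall():
        print(row)

conn.close()
```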
+
+### Calculate bid-ask spread
+
+The `avg_price_bid_ask_spread` materialized view calculates the average price and average bid-ask spread for each asset in five-minute time windows by using `TUMBLE()` and grouping by the `asset_id` and the time window.
+
+To learn more about `TUMBLE()`, see [Time windows](/processing/sql/time-windows/).
+
+```sql
+CREATE MATERIALIZED VIEW avg_price_bid_ask_spread AS
+SELECT
+    asset_id,
+    ROUND(AVG(price), 2) AS average_price,
+    ROUND(AVG(ask_price - bid_price), 2) AS bid_ask_spread,
+    window_end
+FROM
+    TUMBLE(raw_market_data, timestamp, '5 minutes')
+GROUP BY asset_id, window_end;
+```
+
+You can query from `avg_price_bid_ask_spread` to see the results.
+
+```sql
+SELECT * FROM avg_price_bid_ask_spread LIMIT 5;
+```
+
+```
+ asset_id | average_price | bid_ask_spread | window_end
+----------+---------------+----------------+---------------------
+ 2 | 106.55 | 0.58 | 2024-11-19 16:20:00
+ 5 | 98.08 | 0.60 | 2024-11-19 16:25:00
+ 1 | 93.39 | 0.61 | 2024-11-19 16:20:00
+ 3 | 100.96 | 0.60 | 2024-11-19 16:25:00
+ 4 | 99.56 | 0.64 | 2024-11-19 16:20:00
+```
+
+### Calculate rolling price volatility
+
+The `rolling_volatility` materialized view uses the `stddev_samp` function to calculate the rolling price volatility within five-minute time windows by using `TUMBLE()` and grouping by the `asset_id` and the time window.
+
+```sql
+CREATE MATERIALIZED VIEW rolling_volatility AS
+SELECT
+    asset_id,
+    ROUND(stddev_samp(price), 2) AS rolling_volatility,
+    window_end
+FROM
+    TUMBLE(raw_market_data, timestamp, '5 minutes')
+GROUP BY asset_id, window_end;
+```
+
+You can query from `rolling_volatility` to see the results.
+
+```sql
+SELECT * FROM rolling_volatility LIMIT 5;
+```
+```
+ asset_id | rolling_volatility | window_end
+----------+--------------------+---------------------
+ 1 | 27.98 | 2024-11-19 16:35:00
+ 4 | 29.55 | 2024-11-19 16:35:00
+ 5 | 28.78 | 2024-11-19 16:30:00
+ 2 | 28.76 | 2024-11-19 16:20:00
+ 5 | 27.60 | 2024-11-19 16:25:00
+```
+
+### Obtain a comprehensive view of each asset
+
+The `enriched_market_data` materialized view combines the transformed market data with the enrichment data. `TUMBLE()` is used to group the data from `enrichment_data` into five-minute time windows for each asset. Then it is joined with the volatility and bid-ask spread data.
+
+By combining these data sources, you can obtain a more holistic view of each asset, empowering you to make more informed market decisions.
+
+```sql
+CREATE MATERIALIZED VIEW enriched_market_data AS
+SELECT
+    bas.asset_id,
+    ed.sector,
+    bas.average_price,
+    bas.bid_ask_spread,
+    rv.rolling_volatility,
+    ed.avg_historical_volatility,
+    ed.avg_sector_performance,
+    ed.avg_sentiment_score,
+    rv.window_end
+FROM
+    avg_price_bid_ask_spread AS bas
+JOIN
+    rolling_volatility AS rv
+ON
+    bas.asset_id = rv.asset_id AND
+    bas.window_end = rv.window_end
+JOIN (
+    SELECT asset_id,
+        sector,
+        window_end,
+        AVG(historical_volatility) AS avg_historical_volatility,
+        AVG(sector_performance) AS avg_sector_performance,
+        AVG(sentiment_score) AS avg_sentiment_score
+    FROM TUMBLE(enrichment_data, timestamp, '5 minutes')
+    GROUP BY asset_id, sector, window_end
+) AS ed
+ON bas.asset_id = ed.asset_id AND
+    bas.window_end = ed.window_end;
+```
+
+You can query from `enriched_market_data` to see the results.
+ +```sql +SELECT * FROM enriched_market_data LIMIT 5; +``` +``` + asset_id | sector | average_price | bid_ask_spread | rolling_volatility | avg_historical_volatility | avg_sector_performance | avg_sentiment_score | window_end +----------+------------+---------------+----------------+--------------------+--------------------------------+---------------------------------+---------------------------------+--------------------- + 1 | Energy | 99.75 | 0.61 | 27.83 | 0.2940625 | -0.00375 | 0.0940625 | 2024-11-19 16:30:00 + 4 | Technology | 100.62 | 0.60 | 30.52 | 0.3102702702702702702702702703 | 0.0045945945945945945945945946 | -0.0683783783783783783783783784 | 2024-11-19 16:30:00 + 5 | Energy | 100.24 | 0.60 | 28.80 | 0.2890697674418604651162790698 | 0.004186046511627906976744186 | 0.1609302325581395348837209302 | 2024-11-19 16:35:00 + 2 | Energy | 106.55 | 0.58 | 28.76 | 0.2922222222222222222222222222 | -0.01 | -0.2955555555555555555555555556 | 2024-11-19 16:20:00 + 3 | Energy | 98.77 | 0.64 | 29.45 | 0.2894594594594594594594594595 | 0.0035135135135135135135135135 | -0.10 | 2024-11-19 16:30:00 +``` + +When finished, press `Ctrl+C` to close the connection between RisingWave and `psycopg2`. + +## Summary + +In this tutorial, you learn: +- How to get time-windowed aggregate results by using the tumble time window function. +- How to join data sources with materialized views. diff --git a/demos/market-trade-surveillance.mdx b/demos/market-trade-surveillance.mdx new file mode 100644 index 00000000..b1c23431 --- /dev/null +++ b/demos/market-trade-surveillance.mdx @@ -0,0 +1,210 @@ +--- +title: "Market and trade activity surveillance" +description: "Detect suspicious patterns, compliance breaches, and anomalies from trading activities in real-time." +--- + +## Overview + +In fast-paced financial markets, regulatory agencies and trading firms are constantly monitoring trades to flag irregular activity. Behaviors like spoofing, where traders place deceptive orders, or sudden large price spikes, are particularly concerning. As trades are happening every second, being able to detect and react instantly to suspicious behavior is crucial to maintain fair and transparent operations. + +By monitoring and analyzing bid-ask spreads, and rolling volumes between assets and trades on the fly, firms can instantly detect potential risks. For example, a tight bid-ask spread with a sudden decrease in rolling volume hints at spoofing, and a sharp price increase within a short time window indicates a spike in volatility. + +In this tutorial, you will learn how to monitor market and trade activities in real time. You will enrich the raw trade and market data with calculated metrics like high trading volume and rapid price fluctuations. + +## Prerequisites + +* Ensure that the [PostgreSQL](https://www.postgresql.org/docs/current/app-psql.html) interactive terminal, `psql`, is installed in your environment. For detailed instructions, see [Download PostgreSQL](https://www.postgresql.org/download/). +* Install and run RisingWave. For detailed instructions on how to quickly get started, see the [Quick start](/get-started/quickstart/) guide. +* Ensure that a Python environment is set up and install the [psycopg2](https://pypi.org/project/psycopg2/) library. + +## Step 1: Set up the data source tables + +Once RisingWave is installed and deployed, run the two SQL queries below to set up the tables. You will insert data into these tables to simulate live data streams. + +1. 
The table `trade_data` tracks key details about individual trades, such as the buyer, seller, volume, and price of the trade.
+
+   ```sql
+   CREATE TABLE trade_data (
+       trade_id INT,
+       asset_id INT,
+       timestamp TIMESTAMP,
+       price FLOAT,
+       volume INT,
+       buyer_id INT,
+       seller_id INT
+   );
+   ```
+
+2. The `market_data` table tracks information related to financial assets, such as the bid price, ask price, and the trading volume over a rolling time period.
+
+   ```sql
+   CREATE TABLE market_data (
+       asset_id INT,
+       timestamp TIMESTAMP,
+       bid_price FLOAT,
+       ask_price FLOAT,
+       price FLOAT,
+       rolling_volume INT
+   );
+   ```
+
+## Step 2: Run the data generator
+
+To keep this demo simple, a Python script is used to generate and insert data into the tables created above.
+
+Clone the [awesome-stream-processing](https://github.com/risingwavelabs/awesome-stream-processing) repository.
+
+```bash
+git clone https://github.com/risingwavelabs/awesome-stream-processing.git
+```
+
+Navigate to the [market_surveillance](https://github.com/risingwavelabs/awesome-stream-processing/tree/main/02-simple-demos/capital_markets/market_surveillance) folder.
+
+```bash
+cd awesome-stream-processing/02-simple-demos/capital_markets/market_surveillance
+```
+
+Run the `data_generator.py` file. This Python script utilizes the `psycopg2` library to establish a connection with RisingWave so you can generate and insert synthetic data into the tables `trade_data` and `market_data`.
+
+If you are not running RisingWave locally or using default credentials, update the connection parameters accordingly:
+
+```python
+default_params = {
+    "dbname": "dev",
+    "user": "root",
+    "password": "",
+    "host": "localhost",
+    "port": "4566"
+}
+```
+
+## Step 3: Create materialized views
+
+In this demo, you will create multiple materialized views to help analyze market activity and flag suspicious trades.
+
+Materialized views contain the results of a view expression and are stored in the RisingWave database. The results of a materialized view are computed incrementally and updated whenever new events arrive, without needing to be manually refreshed. When you query from a materialized view, it will return the most up-to-date computation results.
+
+### Identify unusual volume trades
+
+The `unusual_volume` materialized view indicates whether a trade has a higher-than-average trading volume within a ten-minute window. A rolling window is used for each `asset_id` to calculate the average volume.
+
+If a trade's volume is more than 1.5 times the rolling average volume over the past ten minutes, it is marked as an unusual trade.
+
+```sql
+CREATE MATERIALIZED VIEW unusual_volume AS
+SELECT
+    trade_id,
+    asset_id,
+    volume,
+    CASE WHEN volume > AVG(volume) OVER (PARTITION BY asset_id ORDER BY timestamp RANGE INTERVAL '10 MINUTES' PRECEDING) * 1.5
+        THEN TRUE ELSE FALSE END AS unusual_volume,
+    timestamp
+FROM
+    trade_data;
+```
+
+You can query from `unusual_volume` to see the results. High volume trades are indicated in the `unusual_volume` column.
+
+```sql
+SELECT * FROM unusual_volume LIMIT 5;
+```
+
+```
+ trade_id | asset_id | volume | unusual_volume | timestamp
+----------+----------+--------+----------------+----------------------------
+ 46668 | 5 | 318 | f | 2024-11-14 15:36:08.419826
+ 52030 | 5 | 301 | f | 2024-11-14 15:36:09.432126
+ 22027 | 5 | 375 | f | 2024-11-14 15:36:10.452766
+ 98493 | 5 | 673 | t | 2024-11-14 15:36:11.474102
+ 93247 | 5 | 929 | t | 2024-11-14 15:36:12.504713
+```
+
+### Monitor price spikes
+
+The `price_spike` materialized view analyzes the price fluctuation of assets within a rolling five-minute window to detect potential price spikes. The percentage change between the highest and lowest prices within the window is calculated for each asset.
+
+A price spike for the asset is detected if the percentage change exceeds 5%.
+
+```sql
+CREATE MATERIALIZED VIEW price_spike AS
+SELECT
+    asset_id,
+    (MAX(price) OVER (PARTITION BY asset_id ORDER BY timestamp
+        RANGE INTERVAL '5 MINUTES' PRECEDING) -
+    MIN(price) OVER (PARTITION BY asset_id ORDER BY timestamp
+        RANGE INTERVAL '5 MINUTES' PRECEDING)) /
+    MIN(price) OVER (PARTITION BY asset_id ORDER BY timestamp
+        RANGE INTERVAL '5 MINUTES' PRECEDING) AS percent_change,
+    CASE
+        WHEN
+            (MAX(price) OVER (PARTITION BY asset_id ORDER BY timestamp
+                RANGE INTERVAL '5 MINUTES' PRECEDING) -
+            MIN(price) OVER (PARTITION BY asset_id ORDER BY timestamp
+                RANGE INTERVAL '5 MINUTES' PRECEDING)) /
+            MIN(price) OVER (PARTITION BY asset_id ORDER BY timestamp
+                RANGE INTERVAL '5 MINUTES' PRECEDING) * 100 > 5
+        THEN TRUE
+        ELSE FALSE
+    END AS if_price_spike,
+    timestamp
+FROM
+    market_data;
+```
+
+You can query from `price_spike` to see the results. The `if_price_spike` column denotes if there was a price spike for the asset.
+
+```sql
+SELECT * FROM price_spike LIMIT 5;
+```
+```
+ asset_id | percent_change | if_price_spike | timestamp
+----------+----------------------+----------------+----------------------------
+ 5 | 0 | f | 2024-11-14 15:36:08.422258
+ 1 | 0.44455600178491755 | t | 2024-11-14 15:36:09.432829
+ 2 | 0.3369464944649446 | t | 2024-11-14 15:36:09.433616
+ 3 | 1.7954589186068888 | t | 2024-11-14 15:36:09.434359
+ 4 | 0.012453174040700659 | f | 2024-11-14 15:36:09.435159
+```
+
+### Flag spoofing activity
+
+The `spoofing_detection` materialized view detects potential spoofing activity by analyzing the bid-ask price difference and the trading volume.
+
+The following two conditions must be met to flag spoofing activity:
+
+* The absolute difference between the bid price and ask price is less than 0.2.
+* The current rolling volume is less than 80% of the average rolling volume over the past ten minutes.
+
+```sql
+CREATE MATERIALIZED VIEW spoofing_detection AS
+SELECT
+    m.asset_id,
+    m.timestamp,
+    CASE WHEN ABS(m.bid_price - m.ask_price) < 0.2 AND rolling_volume < AVG(rolling_volume) OVER (PARTITION BY asset_id ORDER BY timestamp RANGE INTERVAL '10 MINUTES' PRECEDING) * 0.8
+        THEN TRUE ELSE FALSE END AS potential_spoofing
+FROM
+    market_data AS m;
+```
+
+You can query from `spoofing_detection` to see the results.
+
+```sql
+SELECT * FROM spoofing_detection LIMIT 5;
+```
+```
+ asset_id | timestamp | potential_spoofing
+----------+----------------------------+--------------------
+ 4 | 2024-11-14 15:36:08.421848 | f
+ 5 | 2024-11-14 15:36:08.422258 | f
+ 1 | 2024-11-14 15:36:09.432829 | f
+ 2 | 2024-11-14 15:36:09.433616 | f
+ 3 | 2024-11-14 15:36:09.434359 | f
+```
+
+When finished, press `Ctrl+C` to close the connection between RisingWave and `psycopg2`.
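
To consume these flags from an application instead of `psql`, you can poll the `spoofing_detection` view from Python. The snippet below is a minimal sketch, assuming the default connection parameters used by the data generator; the polling interval and `LIMIT` are arbitrary choices.

```python
import time
import psycopg2

# Assumes the default local connection parameters used by the data generator.
conn = psycopg2.connect(
    dbname="dev", user="root", password="", host="localhost", port="4566"
)
conn.autocommit = True

try:
    while True:
        with conn.cursor() as cur:
            # Fetch the most recent records currently flagged as potential spoofing.
            cur.execute("""
                SELECT asset_id, timestamp
                FROM spoofing_detection
                WHERE potential_spoofing
                ORDER BY timestamp DESC
                LIMIT 10;
            """)
            for asset_id, ts in cur.fetchall():
                print(f"Potential spoofing on asset {asset_id} at {ts}")
        time.sleep(5)  # Poll every few seconds; tune as needed.
except KeyboardInterrupt:
    conn.close()
```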
+ +## Summary + +In this tutorial, you learn: + +- How to create rolling windows by using the `PARTITION BY` clause. \ No newline at end of file diff --git a/demos/overview.mdx b/demos/overview.mdx index 4979859f..e6862074 100644 --- a/demos/overview.mdx +++ b/demos/overview.mdx @@ -4,27 +4,36 @@ description: "Discover the wide range of real-world use cases where RisingWave d mode: wide --- -You can follow along with the instructions provided to easily test the functionalities of RisingWave using Docker. All demos can be found within the [integration\_tests](https://github.com/risingwavelabs/risingwave/tree/main/integration%5Ftests) folder of the RisingWave repository. +You can follow along with the instructions provided to easily test the functionalities of RisingWave using Python and the standalone version of RisingWave. All demos can be found within the [awesome-stream-processing](https://github.com/risingwavelabs/awesome-stream-processing) repository. -Try out the following runnable demos: +Try out the following runnable demos in these different industries: + +## Capital markets - -Use RisingWave to track the click-through rate of ads in real time to gauge their performance and effectiveness. + +Transform raw market data in real-time to provide insights into market trends, asset health, and trade opportunities. + + +Detect suspicious patterns, compliance breaches, and anomalies from trading activities in real-time. - -Detect server anomalies instantly and monitor server performance using materialized views built on top of materialized views. + + +## Sports betting + + + +Manage your sports betting positions in real-time by using RisingWave to monitor exposure and risk. - -Discover and track significant observations from text data using regular expression SQL functions in RisingWave. + +Identify high-risk and high-value users by analyzing and identifying trends in user betting patterns. - -Explore how to use RisingWave to build a real-time streaming pipeline to observe and evaluate clickstream data from webpages. - - -Monitor the engagement and performance of a live stream in real time using a live dashboard in RisingWave. - - -Build an end-to-end monitoring and alerting system using Prometheus, Kafka, RisingWave, and Grafana to monitor RisingWave's performance metrics. - + +## Logistics + + + +Track inventory levels and forecast demand to prevent shortages and optimize restocking schedules. + + \ No newline at end of file diff --git a/demos/sports-risk-profit-analysis.mdx b/demos/sports-risk-profit-analysis.mdx new file mode 100644 index 00000000..64dd7ddf --- /dev/null +++ b/demos/sports-risk-profit-analysis.mdx @@ -0,0 +1,214 @@ +--- +title: "Risk and profit analysis in sports betting" +description: "Manage your sports betting positions in real-time by using RisingWave to monitor exposure and risk." +--- + +## Overview + +Sports betting involves wagering money on the outcome of sports events. Bettors place bets on various aspects of the game, such as the winning team or the point spread. Bookmakers determine and provide details on the odds and payouts, which are continuously updated based on real-time game dynamics and market behavior. + +By continuously monitoring and analyzing real-time market and betting positions data, bookmakers and bettors can make more informed decisions. Bettors can instantly calculate their profits and refine their betting strategies based on their current risk level. Bookmakers can adjust odds based on the market, maintaining profitability. 
In this tutorial, you will learn how to analyze real-time betting and market data to dynamically evaluate the risk, profit, and loss of betting positions.
+
+## Prerequisites
+
+* Ensure that the [PostgreSQL](https://www.postgresql.org/docs/current/app-psql.html) interactive terminal, `psql`, is installed in your environment. For detailed instructions, see [Download PostgreSQL](https://www.postgresql.org/download/).
+* Install and run RisingWave. For detailed instructions on how to quickly get started, see the [Quick start](/get-started/quickstart/) guide.
+* Ensure that a Python environment is set up and install the [psycopg2](https://pypi.org/project/psycopg2/) library.
+
+## Step 1: Set up the data source tables
+
+Once RisingWave is installed and deployed, run the two SQL queries below to set up the tables. You will insert data into these tables to simulate live data streams.
+
+1. The table `positions` tracks key details about each betting position within different sports leagues. It contains information such as the stake amount, expected return, fair value, and market odds, allowing us to assess the risk and performance of each position.
+
+   ```sql
+   CREATE TABLE positions (
+       position_id INT,
+       league VARCHAR,
+       position_name VARCHAR,
+       timestamp TIMESTAMP,
+       stake_amount FLOAT,
+       expected_return FLOAT,
+       max_risk FLOAT,
+       fair_value FLOAT,
+       current_odds FLOAT,
+       profit_loss FLOAT,
+       exposure FLOAT
+   );
+   ```
+
+2. The table `market_data` describes the market activity related to specific positions. You can track pricing and volume trends across different bookmakers, observing pricing changes over time.
+
+   ```sql
+   CREATE TABLE market_data (
+       position_id INT,
+       bookmaker VARCHAR,
+       market_price FLOAT,
+       volume INT,
+       timestamp TIMESTAMP
+   );
+   ```
+
+## Step 2: Run the data generator
+
+To keep this demo simple, a Python script is used to generate and insert data into the tables created above.
+
+Clone the [awesome-stream-processing](https://github.com/risingwavelabs/awesome-stream-processing) repository.
+
+```bash
+git clone https://github.com/risingwavelabs/awesome-stream-processing.git
+```
+
+Navigate to the [position_risk_management](https://github.com/risingwavelabs/awesome-stream-processing/tree/main/02-simple-demos/sports_betting/position_risk_management) folder.
+
+```bash
+cd awesome-stream-processing/02-simple-demos/sports_betting/position_risk_management
+```
+
+Run the `data_generator.py` file. This Python script utilizes the `psycopg2` library to establish a connection with RisingWave so you can generate and insert synthetic data into the tables `positions` and `market_data`.
+
+If you are not running RisingWave locally or using default credentials, update the connection parameters accordingly:
+
+```python
+default_params = {
+    "dbname": "dev",
+    "user": "root",
+    "password": "",
+    "host": "localhost",
+    "port": "4566"
+}
+```
+
+## Step 3: Create materialized views
+
+In this demo, you will create three materialized views to gain insight into individual positions and overall market risk.
+
+Materialized views contain the results of a view expression and are stored in the RisingWave database. The results of a materialized view are computed incrementally and updated whenever new events arrive, without needing to be manually refreshed. When you query from a materialized view, it will return the most up-to-date computation results.
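
Before creating the views, you can run a quick sanity check in `psql` to confirm that the generator is writing rows into both source tables. This is a sketch; the exact counts will differ on every run and should keep growing while `data_generator.py` is running.

```sql
-- Both counts should increase over time while the data generator is running.
SELECT 'positions' AS source_table, COUNT(*) AS row_count FROM positions
UNION ALL
SELECT 'market_data', COUNT(*) FROM market_data;
```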
+
+### Track individual positions
+
+The `position_overview` materialized view provides key information on each position, such as the stake, max risk, market price, profit loss, and risk level. It joins the `positions` table with the most recent `market_price` from the `market_data` table. This is done using `ROW_NUMBER()`, which assigns a rank to each record based on `position_id`, ordered by the timestamp in descending order.
+
+`profit_loss` is calculated as the difference between `market_price` and `fair_value`, multiplied by `stake_amount`, while `risk_level` is based on `profit_loss` relative to `max_risk`.
+
+```sql
+CREATE MATERIALIZED VIEW position_overview AS
+SELECT
+    p.position_id,
+    p.position_name,
+    p.league,
+    p.stake_amount,
+    p.max_risk,
+    p.fair_value,
+    m.market_price,
+    (m.market_price - p.fair_value) * p.stake_amount AS profit_loss,
+    CASE
+        WHEN (m.market_price - p.fair_value) * p.stake_amount > p.max_risk THEN 'High'
+        WHEN (m.market_price - p.fair_value) * p.stake_amount BETWEEN p.max_risk * 0.5 AND p.max_risk THEN 'Medium'
+        ELSE 'Low'
+    END AS risk_level,
+    m.timestamp AS last_update
+FROM
+    positions AS p
+JOIN
+    (SELECT position_id, market_price, timestamp,
+        ROW_NUMBER() OVER (PARTITION BY position_id ORDER BY timestamp DESC) AS row_num
+    FROM market_data) AS m
+ON p.position_id = m.position_id
+WHERE m.row_num = 1;
```
+
+You can query from `position_overview` to see the results.
+
+```sql
+SELECT * FROM position_overview LIMIT 5;
+```
+
+```
+ position_id | position_name | league | stake_amount | max_risk | fair_value | market_price | profit_loss | risk_level | last_update
+-------------+------------------+--------+--------------+----------+------------+--------------+---------------------+------------+----------------------------------
+ 9 | Team A vs Team C | NBA | 495.6 | 727.74 | 1.64 | 2.07 | 213.10799999999998 | Low | 2024-11-11 15:46:49.414689+00:00
+ 2 | Team B vs Team E | NBA | 82.96 | 113.2 | 2.89 | 4.53 | 136.0544 | High | 2024-11-11 15:46:49.410444+00:00
+ 9 | Team E vs Team B | NHL | 121.86 | 158.26 | 3.04 | 2.07 | -118.20420000000003 | Low | 2024-11-11 15:46:49.414689+00:00
+ 2 | Team D vs Team B | NBA | 408.89 | 531.91 | 1.98 | 4.53 | 1042.6695 | High | 2024-11-11 15:46:49.410444+00:00
+ 9 | Team C vs Team B | NFL | 420.62 | 449.32 | 2.01 | 2.07 | 25.237200000000023 | Low | 2024-11-11 15:46:49.414689+00:00
+```
+
+### Monitor overall risk
+
+The `risk_summary` materialized view gives an overview of the number of positions that are considered "High", "Medium", or "Low" risk. It groups the rows of `position_overview` by `risk_level` and counts the number of positions in each category.
+
+This allows us to quickly understand overall risk exposure across all positions.
+
+```sql
+CREATE MATERIALIZED VIEW risk_summary AS
+SELECT
+    risk_level,
+    COUNT(*) AS position_count
+FROM
+    position_overview
+GROUP BY
+    risk_level;
+```
+
+You can query from `risk_summary` to see the results.
+
+```sql
+SELECT * FROM risk_summary;
+```
+```
+ risk_level | position_count
+------------+----------------
+ High | 23
+ Medium | 46
+ Low | 341
+```
+
+### Retrieve latest market prices
+
+The `market_summary` materialized view shows the current market data for each betting position from the `positions` table. It joins `positions` and `market_data` to include the most recent market price for each bookmaker. Again, `ROW_NUMBER()` is used to retrieve the most recent record for each bookmaker and position.
+ +```sql +CREATE MATERIALIZED VIEW market_summary AS +SELECT + p.position_id, + p.position_name, + p.league, + m.bookmaker, + m.market_price, + m.timestamp AS last_update +FROM + positions AS p +JOIN + (SELECT position_id, bookmaker, market_price, timestamp, + ROW_NUMBER() OVER (PARTITION BY position_id, bookmaker ORDER BY timestamp DESC) AS row_num + FROM market_data) AS m +ON p.position_id = m.position_id +WHERE m.row_num = 1; +``` + +You can query from `market_summary` to see the results. + +```sql +SELECT * FROM market_summary LIMIT 5; +``` +``` + position_id | position_name | league | bookmaker | market_price | last_update +-------------+------------------+--------+------------+--------------+---------------------------- + 8 | Team D vs Team E | NBA | FanDuel | 2.07 | 2024-11-12 15:03:03.681245 + 3 | Team A vs Team E | MLB | FanDuel | 2.27 | 2024-11-12 15:02:55.525759 + 9 | Team B vs Team E | Tennis | BetMGM | 4.77 | 2024-11-12 15:03:09.833653 + 4 | Team C vs Team B | NHL | Caesars | 1.02 | 2024-11-12 15:03:07.767925 + 3 | Team A vs Team D | NBA | Caesars | 2.21 | 2024-11-12 15:02:45.320730 +``` + +When finished, press `Ctrl+C` to close the connection between RisingWave and `psycopg2`. + +## Summary + +In this tutorial, you learn: + +- How to connect to RisingWave from a Python application using `psycopg2`. +- How to use `ROW_NUMBER()` to retrieve the most recent message based on the timestamp. \ No newline at end of file diff --git a/mint.json b/mint.json index 876b569b..02053f7e 100644 --- a/mint.json +++ b/mint.json @@ -887,12 +887,26 @@ "group": "Demos", "pages": [ "demos/overview", - "demos/real-time-ad-performance-analysis", - "demos/server-performance-anomaly-detection", - "demos/fast-twitter-events-processing", - "demos/clickstream-analysis", - "demos/live-stream-metrics-analysis", - "demos/use-risingwave-to-monitor-risingwave-metrics" + { + "group": "Capital markets", + "pages": [ + "demos/market-data-enrichment", + "demos/market-trade-surveillance" + ] + }, + { + "group": "Sports betting", + "pages": [ + "demos/betting-behavior-analysis", + "demos/sports-risk-profit-analysis" + ] + }, + { + "group": "Logistics", + "pages": [ + "demos/inventory-management-forecast" + ] + } ] },