From 470df076a7091a62b332ce5ecb5f138fde07f39b Mon Sep 17 00:00:00 2001
From: Ziyue Xu
Date: Wed, 16 Oct 2024 09:59:43 -0400
Subject: [PATCH 1/2] Update README.md

---
 examples/advanced/finance-end-to-end/README.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/examples/advanced/finance-end-to-end/README.md b/examples/advanced/finance-end-to-end/README.md
index 3e84ec0724..ebd73fdb76 100644
--- a/examples/advanced/finance-end-to-end/README.md
+++ b/examples/advanced/finance-end-to-end/README.md
@@ -41,6 +41,7 @@ For this site, we will have three files.
 /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/test.csv
 /tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/train.csv
 ```
+
 ![split_data](./figures/split_data.png)
 
 The python code for data generation is located at [prepare_data.py](./utils/prepare_data.py)
@@ -70,7 +71,8 @@ The data enrichment process involves the following steps:
 3. **Repeating for Beneficiary BIC**: Perform the same process for Beneficiary_BIC to generate another feature called x3_y2.
 4. **Merging Features**: Merge the two enriched features based on Time and Beneficiary_BIC.
 
-The resulting Dataset looks like this.
+The resulting Dataset looks like this.
+
 ![enrich_data](./figures/enrichment.png)
 
 We save the enriched data into new csv files.
@@ -208,6 +210,7 @@ Since each site consists of the same Sender_BIC, to define the graph edge, we us
 2. The two transactions time difference are smaller than 6000.
 
 The resulting graph looks like below, essentially an undirected graph with transactions (identified by `UETR`) as nodes and edges connecting two nodes that satisfy the above two rules.
+
 ![edge_map](./figures/edge_map.png)
 
 #### Single-site operation example: GNN training and encoding
@@ -217,6 +220,7 @@ The GNN training procedure is similar to the unsupervised Protein Classification
 The results of the GNN training are:
 - a GNN model
 - the embeddings of the transactions, in this example, they are of dimension 64
+
 ![embedding](./figures/embeddings.png)
 
 #### Federated GNN Training and Encoding for All Sites
@@ -366,8 +370,11 @@ As shown, GNN embeddings help to promote the model performance by providing extr
 
 For model explainability, our XGBoost training code will generate the feature importance plot of the XGBoost model with regard to validation data:
 For normalized data without GNN features, the feature importance plot is shown below:
+
 ![feature_importance](./figures/shap_beeswarm_base.png)
+
 For normalized data with GNN embeddings, the feature importance plot is shown below:
+
 ![feature_importance](./figures/shap_beeswarm_gnn.png)
 
-As shown, the GNN embeddings provide additional features that are important for the model.
\ No newline at end of file
+As shown, the GNN embeddings provide additional features that are important for the model.

From 57946ac50c20ce753735cb320a75c057ee964ea5 Mon Sep 17 00:00:00 2001
From: Ziyue Xu
Date: Wed, 16 Oct 2024 10:02:33 -0400
Subject: [PATCH 2/2] Update xgboost.ipynb

---
 examples/advanced/finance-end-to-end/xgboost.ipynb | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/advanced/finance-end-to-end/xgboost.ipynb b/examples/advanced/finance-end-to-end/xgboost.ipynb
index d740f44fb2..b1e15e6537 100644
--- a/examples/advanced/finance-end-to-end/xgboost.ipynb
+++ b/examples/advanced/finance-end-to-end/xgboost.ipynb
@@ -260,6 +260,7 @@
     "* 'ZNZZAU3M_Bank_8'\n",
     "* 'HCBHSGSG_Bank_9'\n",
     "* 'XITXUS33_Bank_10' \n",
+    "\n",
     "Total 10 banks\n",
     "\n",
     "### Prepare Data"