Update finance end-to-end readme for figure position #3046

Merged

11 changes: 9 additions & 2 deletions examples/advanced/finance-end-to-end/README.md
@@ -41,6 +41,7 @@ For this site, we will have three files.
/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/test.csv
/tmp/dataset/horizontal_credit_fraud_data/ZHSZUS33_Bank_1/train.csv
```

![split_data](./figures/split_data.png)

The Python code for data generation is located at [prepare_data.py](./utils/prepare_data.py).
@@ -70,7 +71,8 @@ The data enrichment process involves the following steps:
3. **Repeating for Beneficiary BIC**: Perform the same process for Beneficiary_BIC to generate another feature called x3_y2.
4. **Merging Features**: Merge the two enriched features based on Time and Beneficiary_BIC.

The resulting dataset looks like this:

![enrich_data](./figures/enrichment.png)
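
The merge in step 4 can be sketched as follows. This is a minimal, standard-library illustration and not the code used in this example: the column name `sender_feature` is a hypothetical placeholder for the sender-side feature, the sample rows are invented, and the inner-join behavior is an assumption. Only the join keys (`Time` and `Beneficiary_BIC`) and the feature name `x3_y2` come from the steps above.

```python
def merge_features(sender_rows, beneficiary_rows, keys=("Time", "Beneficiary_BIC")):
    """Inner-join two lists of row dicts on the given key columns."""
    # Index the beneficiary-side rows by their key tuple for fast lookup.
    index = {}
    for row in beneficiary_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    # Keep only sender-side rows that have a matching key on the other side.
    merged = []
    for row in sender_rows:
        for match in index.get(tuple(row[k] for k in keys), []):
            merged.append({**row, **match})
    return merged


# Invented sample rows; "sender_feature" is a hypothetical column name.
sender_rows = [
    {"Time": 100, "Beneficiary_BIC": "BANKAAAA", "sender_feature": 0.25},
    {"Time": 200, "Beneficiary_BIC": "BANKBBBB", "sender_feature": 0.50},
]
beneficiary_rows = [
    {"Time": 100, "Beneficiary_BIC": "BANKAAAA", "x3_y2": 0.75},
]

merged = merge_features(sender_rows, beneficiary_rows)
print(merged)
```

With data frames, the same join would typically be a single pandas `DataFrame.merge` call with `on=["Time", "Beneficiary_BIC"]`.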

We save the enriched data into new csv files.
@@ -208,6 +210,7 @@ Since each site consists of the same Sender_BIC, to define the graph edge, we us
2. The time difference between the two transactions is smaller than 6000.

The resulting graph is shown below. It is an undirected graph whose nodes are transactions (identified by `UETR`) and whose edges connect any two nodes that satisfy both rules above.

![edge_map](./figures/edge_map.png)
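
The edge construction can be sketched as below. This is an illustrative sketch under stated assumptions, not the implementation used in this example: `build_edges`, `max_time_diff`, and `first_rule` are hypothetical names, and the sample transactions are invented. Only the time-difference rule (smaller than 6000) is implemented; the first rule is left as a caller-supplied predicate that accepts every pair by default.

```python
from itertools import combinations


def build_edges(transactions, max_time_diff=6000, first_rule=lambda a, b: True):
    """Return undirected edges as (UETR, UETR) pairs.

    `first_rule` is a hypothetical hook for the first edge rule; the default
    accepts every pair, so only the time-difference rule applies.
    """
    edges = []
    # Each unordered pair is visited once, which keeps the graph undirected.
    for a, b in combinations(transactions, 2):
        if first_rule(a, b) and abs(a["Time"] - b["Time"]) < max_time_diff:
            edges.append((a["UETR"], b["UETR"]))
    return edges


# Invented sample transactions.
transactions = [
    {"UETR": "t1", "Time": 0},
    {"UETR": "t2", "Time": 5000},
    {"UETR": "t3", "Time": 12000},
]

edges = build_edges(transactions)
print(edges)
```

The pairwise loop is quadratic in the number of transactions, which is fine for a sketch; sorting by `Time` and scanning a sliding window would scale better on larger sites.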

#### Single-site operation example: GNN training and encoding
@@ -217,6 +220,7 @@ The GNN training procedure is similar to the unsupervised Protein Classification
The results of the GNN training are:
- a GNN model
- the embeddings of the transactions, which have dimension 64 in this example

![embedding](./figures/embeddings.png)

#### Federated GNN Training and Encoding for All Sites
@@ -366,8 +370,11 @@ As shown, GNN embeddings help to promote the model performance by providing extr

For model explainability, our XGBoost training code generates a feature importance plot of the XGBoost model, computed on the validation data.
For normalized data without GNN features, the feature importance plot is shown below:

![feature_importance](./figures/shap_beeswarm_base.png)

For normalized data with GNN embeddings, the feature importance plot is shown below:

![feature_importance](./figures/shap_beeswarm_gnn.png)

As shown, the GNN embeddings provide additional features that are important for the model.
1 change: 1 addition & 0 deletions examples/advanced/finance-end-to-end/xgboost.ipynb
@@ -260,6 +260,7 @@
"* 'ZNZZAU3M_Bank_8'\n",
"* 'HCBHSGSG_Bank_9'\n",
"* 'XITXUS33_Bank_10' \n",
"\n",
"Total 10 banks\n",
"\n",
"### Prepare Data"