Experiments on using Kolmogorov-Arnold Networks (KAN) on Graph Learning

This repository contains quick experimental results comparing the performance of MLP, GNN (GCN), KAN, and KAN+GNN on several graph learning benchmark datasets (specifically, node classification).

TL;DR (for now)

Using KANs or KAN+GNNs usually introduces a lot of model parameters, which makes it hard to justify them over MLPs or GNNs. (Perhaps we need a more effective way to merge KANs with GNNs.)
Make the model (especially the KAN part) as light as possible.
KAN+GNN generally performs great on homophilic datasets, but really suffers on heterophilic datasets (even worse than GCNs).
KANs shine more on heterophilic datasets.
Learning rate is the most important hyperparameter for KANs and KAN+GNNs.

KAN and KAN+GNN (with reference to the original repo)

To build KAN and KAN+GNN, I used the Efficient-KAN implementation for all experiments. For KAN+GNN, I combined Efficient-KAN with GraphKAN, which defines each KAN+GNN layer as (KAN $\rightarrow$ torch.sparse.spmm with the adjacency matrix). All detailed settings are left at their defaults unless mentioned explicitly. The utility functions, including the data splits, are also from GraphKAN. (I do not claim any ownership of the Efficient-KAN and GraphKAN code.)
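The layer ordering described above (transform each node's features, then multiply by the sparse adjacency matrix) can be sketched in plain Python. This is only an illustration, not the GraphKAN code: `spmm` is a hand-written stand-in for the sparse matrix product, `transform` is a hypothetical placeholder for the KAN layer, and the toy adjacency (self-loops, no normalization) is an assumption.

```python
def spmm(adj_coo, dense):
    """Multiply a square sparse matrix, given as (row, col, value) triples, by a dense matrix."""
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(len(dense))]
    for row, col, value in adj_coo:
        for k in range(n_cols):
            out[row][k] += value * dense[col][k]
    return out


def kan_gnn_layer(x, adj_coo, transform):
    """One KAN+GNN-style layer: per-node feature transform, then neighbor aggregation."""
    h = [transform(features) for features in x]  # stand-in for the KAN layer
    return spmm(adj_coo, h)                      # stand-in for the sparse product


# Toy path graph 0 - 1 - 2 with self-loops and unit weights (an assumption).
adj = [(0, 0, 1.0), (0, 1, 1.0),
       (1, 0, 1.0), (1, 1, 1.0), (1, 2, 1.0),
       (2, 1, 1.0), (2, 2, 1.0)]
x = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]


def double(features):
    """Hypothetical placeholder transform: doubles every feature."""
    return [2.0 * v for v in features]


out = kan_gnn_layer(x, adj, double)
print(out)  # [[2.0, 2.0], [6.0, 6.0], [4.0, 6.0]]
```

Each output row is the sum of the transformed features of a node and its neighbors, which is all the sparse product contributes; everything learnable lives in the transform.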

Datasets

The following datasets are used in the experiments:

Cora
Citeseer
Pubmed
Cornell
Texas
Wisconsin

Note that Cora, Citeseer, and Pubmed are homophilic, while Cornell, Texas, and Wisconsin are heterophilic datasets.
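For readers unfamiliar with the terms: a graph is homophilic when connected nodes tend to share a label. One common way to quantify this is the edge homophily ratio, i.e. the fraction of edges whose endpoints have the same class. The sketch below computes it on two made-up toy graphs; the edges and labels are hypothetical, and this is not necessarily the measure used to categorize the datasets above.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical 5-node graph with two classes.
labels = [0, 0, 0, 1, 1]
homophilic_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)]
heterophilic_edges = [(0, 3), (0, 4), (1, 3), (2, 4), (0, 1)]

h_high = edge_homophily(homophilic_edges, labels)
h_low = edge_homophily(heterophilic_edges, labels)
print(h_high, h_low)  # 0.8 0.2
```

A ratio close to 1 means neighbors mostly agree, which is the setting where aggregating over neighbors is expected to help.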

Hyperparameter tuning

The following hyperparameters are tuned for each model. In all cases, the maximum number of epochs is set to 1000, except for GNNs. For KAN and KAN+GNN, I also considered the option of projecting the input features to the hidden dimension as the first step.

MLP

Hidden dim: [16, 32, 64]
Num. layers: [1, 2, 3]
Learning rate: [0.01, 0.001, 0.0001]

KAN

Hidden dim: [16, 32, 64]
Num. layers: [1, 2]
Project with MLP to hidden dim as the first step (Proj): [True, False]
Learning rate: [0.1, 0.01, 0.001, 0.0001]

GNN

Architecture: GCN
Hidden dim: [16, 32, 64]
Num. layers: [1, 2, 3]
Learning rate: [0.1, 0.01, 0.001, 0.0001]

KAN+GNN

Hidden dim: [16, 32, 64]
Num. layers for KAN in each layer: [1, 2]
Num. layers for message passing (spmm) in each layer: [1, 2, 3]
Project with MLP to hidden dim as the first step (Proj): [True, False]
Learning rate: [0.1, 0.01, 0.001, 0.0001]
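To get a feel for the size of the search, the grids listed above can be expanded with `itertools.product`. This is a minimal sketch assuming every combination of the listed values is tried (an exhaustive grid search); the dictionary keys are hypothetical names, not those used in `main_tuning.py`.

```python
from itertools import product

# Grids copied from the lists above; key names are illustrative only.
grids = {
    "MLP": {
        "hidden_dim": [16, 32, 64],
        "num_layers": [1, 2, 3],
        "lr": [0.01, 0.001, 0.0001],
    },
    "KAN": {
        "hidden_dim": [16, 32, 64],
        "num_layers": [1, 2],
        "proj": [True, False],
        "lr": [0.1, 0.01, 0.001, 0.0001],
    },
    "GNN": {
        "hidden_dim": [16, 32, 64],
        "num_layers": [1, 2, 3],
        "lr": [0.1, 0.01, 0.001, 0.0001],
    },
    "KAN+GNN": {
        "hidden_dim": [16, 32, 64],
        "kan_layers": [1, 2],
        "mp_layers": [1, 2, 3],
        "proj": [True, False],
        "lr": [0.1, 0.01, 0.001, 0.0001],
    },
}


def expand(grid):
    """Return one dict per combination of hyperparameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


counts = {name: len(expand(grid)) for name, grid in grids.items()}
print(counts)  # {'MLP': 27, 'KAN': 48, 'GNN': 36, 'KAN+GNN': 144}
```

Under that assumption, KAN+GNN has by far the largest search space per dataset (144 configurations), so its best result is selected from more candidates than the other models.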

Result 1: Best performers

Results after hyperparameter tuning for different datasets.

KAN+GNN generally performs great on homophilic datasets, but really suffers on heterophilic datasets (even worse than GCNs).
KANs shine more on heterophilic datasets.
Using KANs or KAN+GNNs usually introduces a lot of model parameters, which makes it hard to justify them over MLPs or GNNs.

Cora

Model	Validation accuracy	Test accuracy	Number of parameters	Best epoch	Hidden dim	Num. layers	Learning rate
MLP	0.712177	0.737274	10,038	2	16	1	0.1
KAN	0.804428	0.760263	921,600 (`Proj`=`False`)	84	64	2	0.001
GCN	0.889299	0.866995	95,936	18	64	2	0.1
KAN+GNN	0.907749	0.875205	458,560 (`Proj`=`False`)	105	32	1 (KAN) / 1 (spmm)	0.1
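The KAN parameter counts can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that each Efficient-KAN layer stores, per input-output pair, one base weight, `grid_size + spline_order` spline coefficients, and one spline scaler, with defaults `grid_size=5` and `spline_order=3`. It also assumes the usual dataset sizes (Cora: 1433 features and 7 classes; Pubmed: 500 features and 3 classes). None of these numbers are stated in this README, so treat them as assumptions.

```python
def kan_linear_params(in_dim, out_dim, grid_size=5, spline_order=3):
    """Parameter count of one KAN layer under the assumed Efficient-KAN layout."""
    base = in_dim * out_dim                                 # base (linear) weights
    spline = in_dim * out_dim * (grid_size + spline_order)  # spline coefficients
    scaler = in_dim * out_dim                               # per-edge spline scalers
    return base + spline + scaler


def kan_params(dims, **kwargs):
    """Total parameters of a stack of KAN layers with the given layer widths."""
    return sum(kan_linear_params(i, o, **kwargs) for i, o in zip(dims, dims[1:]))


cora_kan = kan_params([1433, 64, 7])    # hidden dim 64, 2 layers, no projection
pubmed_kan = kan_params([500, 16, 3])   # hidden dim 16, 2 layers, no projection
cora_linear = 1433 * 7 + 7              # a single linear layer with bias

print(cora_kan, pubmed_kan, cora_linear)  # 921600 80480 10038
```

Under these assumptions the totals match the KAN rows reported for Cora and Pubmed and the 1-layer MLP row for Cora. Each KAN layer costs about 10 times the weights of a linear layer of the same shape, and with `Proj`=`False` the first layer scales with the raw input feature dimension, which is where most of the parameters come from.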

Citeseer

Model	Validation accuracy	Test accuracy	Number of parameters	Best epoch	Hidden dim	Num. layers	Learning rate
MLP	0.760902	0.723056	22,224	3	16	1	0.1
KAN	0.801504	0.757162	593,440 (`Proj`=`False`)	65	16	2	0.01
GCN	0.831579	0.815825	119,584	38	32	2	0.01
KAN+GNN	0.831579	0.809004	458,560 (`Proj`=`False`)	104	64	1 (KAN) / 1 (spmm)	0.1

Pubmed

Model	Validation accuracy	Test accuracy	Number of parameters	Best epoch	Hidden dim	Num. layers	Learning rate
MLP	0.890439	0.885932	36,675	80	64	3	0.001
KAN	0.884098	0.881115	80,480 (`Proj`=`False`)	319	16	2	0.01
GCN	0.887649	0.864639	8,560	191	16	3	0.1
KAN+GNN	0.906416	0.905703	80,480 (`Proj`=`False`)	330	16	1 (KAN) / 2 (spmm)	0.01

Cornell

Model	Validation accuracy	Test accuracy	Number of parameters	Best epoch	Hidden dim	Num. layers	Learning rate
MLP	0.918919	0.914894	27,381	37	16	2	0.001
KAN	0.972973	0.829787	1,093,120 (`Proj`=`False`)	46	64	2	0.001
GCN	0.810811	0.723404	27,536	5	16	2	0.1
KAN+GNN	0.891892	0.617021	275,840 (`Proj`=`False`)	78	16	1 (KAN) / 3 (spmm)	0.001

Wisconsin

Model	Validation accuracy	Test accuracy	Number of parameters	Best epoch	Hidden dim	Num. layers	Learning rate
MLP	0.98	0.9125	109,509	4	64	2	0.1
KAN	0.98	0.9125	546,560 (`Proj`=`False`)	39	32	2	0.01
GCN	0.84	0.6125	55,584	3	32	2	0.1
KAN+GNN	0.82	0.65	32,368 (`Proj`=`True`)	148	16	2	0.001

Texas

Model	Validation accuracy	Test accuracy	Number of parameters	Best epoch	Hidden dim	Num. layers	Learning rate
MLP	0.972973	0.852459	54,757	48	32	2	0.01
KAN	1.0	0.704918	1,093,120 (`Proj`=`False`)	23	64	2	0.01
GCN	0.918919	0.754098	55,584	25	32	2	0.0001
KAN+GNN	0.918919	0.737705	74,976 (`Proj`=`True`)	1	32	2	0.1

Result 2 (SHAP analysis): Rule of thumb on hyperparameter settings

For this, I fit an XGBoost model to predict the test performance of each model from its hyperparameters. Then, I used SHAP values to get the 'importance' of each hyperparameter. Some trends are:

KAN+GNN

Learning rate is the hyperparameter to tune if you want the most bang for the buck.
The number of KAN layers in each layer matters more than the number of message passing layers. In general, make the model as light as possible.

Figure: SHAP analysis for Cora on KAN + GNN

KAN

Similar to KAN+GNN, learning rate is the most important hyperparameter.
Also similar to KAN+GNN, make the KAN as light as possible.

Figure: SHAP analysis for Citeseer on KAN

Result 3: Test performance vs. Number of parameters

I have also plotted the test performance vs. the number of parameters for all cases during hyperparameter tuning. Some notes on the figure:

I have used the log scale for the x-axis (number of parameters) to make the plot more readable.
During tuning, multiple models may end up with the same number of parameters. In such cases, I have highlighted the best performer with the most opaque color.
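The two notes above amount to a small preprocessing step: group the tuning runs by parameter count, keep the best test accuracy per group, and place the points on a log-scaled x-axis. A minimal sketch follows; the `(number of parameters, test accuracy)` pairs are invented for illustration and are not results from this repository, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def best_per_param_count(runs):
    """Map each parameter count to the highest test accuracy seen at that count."""
    best = defaultdict(float)
    for n_params, test_acc in runs:
        best[n_params] = max(best[n_params], test_acc)
    return dict(best)


# Hypothetical (number of parameters, test accuracy) pairs.
runs = [(10_000, 0.70), (10_000, 0.74),
        (100_000, 0.81), (100_000, 0.79),
        (1_000_000, 0.76)]

best = best_per_param_count(runs)

# x-coordinates on a log10 scale, as used for the plots.
points = sorted((math.log10(n), acc) for n, acc in best.items())
print(best)
```

The entries in `best` are the runs that would be drawn fully opaque; the remaining runs at the same parameter count would be drawn with more transparency.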

Here are some observations:

In general, it is very easy to build a heavy model using KANs or KAN+GNNs.
For homophilic datasets, it helps to introduce GNNs to the mix, although the gain usually depends on the specific dataset.
For heterophilic datasets, non-GNN types (MLP, KAN) usually perform better, often by a large margin.

Figure: Test performance vs. Number of parameters for Cora

Figure: Test performance vs. Number of parameters for Wisconsin

Note

This is an ongoing investigation, and some results may change in the future. Thanks to all the authors of the Efficient-KAN and GraphKAN repositories for their awesome work!
