Graph Convolutional Layers - Builtin #1941

MaximilianSchreff · 2023-11-04T07:14:38Z

Adding a built-in graph convolutional layer to our nn layer directory. The graph convolutional layer follows the paper "Semi-Supervised Classification with Graph Convolutional Networks" by Kipf and Welling. This includes the forward and the backward pass of the layer and an example network showing how to use the layers.

We tested the implementation against PyTorch: it computes exactly the same values, and in the spectral convolution it handles nodes with no in-bound edges the same way PyTorch does. This check is also included as a component test (an NNComponentTest), in which the initial weights and the expected result from PyTorch are hard-coded.

MaximilianSchreff · 2023-11-04T07:22:28Z

About the performance: My machine showed performance issues when testing against PyTorch for very very big inputs.

Stress test: SystemDS: 340 seconds - PyTorch: 32 seconds

The stress test consisted of about 300 forward passes with about 10,000 x 10,000 matrices. This is likely a problem with my setup and not my implementation, since the affine layer with the same inputs took 220 seconds. The GCL consists of a simple affine part and a convolutional part, with the convolutional part being a lot more complex. So, the implementation is likely quite fast because the complex convolution part makes up less than a third of the runtime.
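
As a rough worked check of the figures above, the following sketch takes the reported wall-clock times at face value and derives the shares. It assumes the 220-second affine run and the 340-second layer run were measured under the same conditions, which the thread does not confirm; the variable names are illustrative only.

```python
# Reported numbers from the stress test (seconds).
total_s = 340.0    # SystemDS, whole graph convolutional layer
affine_s = 220.0   # SystemDS, affine layer alone on the same inputs
pytorch_s = 32.0   # PyTorch, whole layer
passes = 300       # "about 300 forward passes"

# Share of the runtime attributable to the affine part (X W + b).
affine_share = affine_s / total_s

# Whatever is left is attributed to normalization and message passing.
remainder_share = (total_s - affine_s) / total_s

# Average time per forward pass and the gap to PyTorch.
per_pass_s = total_s / passes
slowdown = total_s / pytorch_s

print(f"affine share:    {affine_share:.1%}")
print(f"remainder share: {remainder_share:.1%}")
print(f"per pass:        {per_pass_s:.2f} s")
print(f"vs PyTorch:      {slowdown:.1f}x slower")
```

From these two timings alone the non-affine remainder comes out at roughly 35%, so close to a third; the profiler-based shares quoted later in the thread are a separate measurement.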

MaximilianSchreff · 2023-11-04T07:26:24Z

About caching:

It would be possible to cache the normalized weights, since they are fairly expensive to compute (counting every degree and the spectral convolution). In the stress test, this only improved the runtime by 10 seconds, starting from 340 seconds. Also, when testing the caching feature on inputs smaller than the huge stress-test inputs, it was always slower.
Given the added complexity of handling cached weights and the extra outputs the layer would need, I decided against caching.
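
To make the trade-off concrete, here is a minimal sketch of what such caching could look like. It is written in Python rather than DML and is not the code from this PR: `normalized_edge_weights` and `aggregate` are hypothetical names, the normalization is assumed to be the symmetric one from the paper, and `X_hat` stands for the already transformed features (X times W).

```python
from math import sqrt

def normalized_edge_weights(edges, n):
    """Add self loops, count in-degrees, and return (edges, weights)."""
    full = list(edges) + [(i, i) for i in range(n)]
    deg = [0] * n
    for _, dst in full:
        deg[dst] += 1
    # Assumed symmetric normalization: 1 / sqrt(deg[src] * deg[dst]).
    weights = [1.0 / sqrt(deg[src] * deg[dst]) for src, dst in full]
    return full, weights

def aggregate(X_hat, edges, n, cache=None):
    """Message passing over an edge list; returns (output, cache).

    Passing the returned cache back in skips the normalization step,
    at the price of one extra argument and one extra return value.
    """
    if cache is None:
        cache = normalized_edge_weights(edges, n)
    full, weights = cache
    f_out = len(X_hat[0])
    out = [[0.0] * f_out for _ in range(n)]
    for (src, dst), w in zip(full, weights):
        for k in range(f_out):
            out[dst][k] += w * X_hat[src][k]
    return out, cache

# Tiny usage example: the second call reuses the cached weights.
edges = [(0, 1), (1, 0), (1, 2)]
X_hat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out_first, cache = aggregate(X_hat, edges, 3)
out_second, cache_again = aggregate(X_hat, edges, 3, cache=cache)
```

The extra return value and argument are the "more outputs of the layer" mentioned above; every caller has to thread the cache through, which is the bookkeeping the comment decided was not worth a 10-second gain.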

Baunsgaard · 2023-11-06T19:45:04Z

> About the performance: My machine showed performance issues when testing against PyTorch for very very big inputs.
>
> Stress test: SystemDS: 340 seconds - PyTorch: 32 seconds
>
> The stress test consisted of about 300 forward passes with about 10,000 x 10,000 matrices. This is likely a problem with my setup and not my implementation, since the affine layer with the same inputs took 220 seconds. The GCL consists of a simple affine part and a convolutional part, with the convolutional part being a lot more complex. So, the implementation is likely quite fast because the complex convolution part makes up less than a third of the runtime.

Oops, this was the graph conv PR - I thought I was commenting on the ResNet one.

Baunsgaard · 2023-11-06T20:00:18Z

> About the performance: My machine showed performance issues when testing against PyTorch for very very big inputs.
>
> Stress test: SystemDS: 340 seconds - PyTorch: 32 seconds
>
> The stress test consisted of about 300 forward passes with about 10,000 x 10,000 matrices. This is likely a problem with my setup and not my implementation, since the affine layer with the same inputs took 220 seconds. The GCL consists of a simple affine part and a convolutional part, with the convolutional part being a lot more complex. So, the implementation is likely quite fast because the complex convolution part makes up less than a third of the runtime.

Can you show the '-stats' output of the run, to indicate where the time is being spent?
Maybe we also want to profile it using a profiler: https://github.com/async-profiler/async-profiler

If you are in doubt about how to use it, I can show you in the office.

MaximilianSchreff · 2023-11-06T20:36:22Z

I already profiled it. The convolutional layer consists of the linear layer and a convolutional layer. The linear layer takes up more than 70% of the runtime. This means there are probably some issues in my configs that restrict systemds from being faster since the linear part is the very base line of this layer.

Baunsgaard · 2023-11-06T20:40:34Z

> I already profiled it. The convolutional layer consists of the linear layer and a convolutional layer. The linear layer takes up more than 70% of the runtime. This means there are probably some issues in my configs that restrict systemds from being faster since the linear part is the very base line of this layer.

with Linear layer, do you mean a simple fully connected affine layer (aka a matrix multiplication), or something else?

Can you maybe give me an example in code?

MaximilianSchreff · 2023-11-06T20:50:22Z

Yes, exactly. It is a matrix multiplication followed by adding a bias. This is what takes up 70% of the runtime.
You can see that in scripts/nn/layers/graph_conv.dml in the forward pass function. Those are only two lines, and they use most of the runtime. These two lines cannot be optimized within my layer.

As for the rest of the forward pass, I already took numerous steps to optimize it from, initially, 780 seconds to 340 seconds for the whole layer. This includes:

Merging the different functions into a big one to combine for-loops, improving parallelization gain
Optimizing the addition of self loops
Caching normalization weights (removed it again, since it wasn't faster)

Baunsgaard

I quickly looked through, and am unsure what operation you are referring to.
I assume the initial matrix multiplication is expensive, but that only depends on the input size, and it will always be expensive. Some of the other code contains many as.integer or as.scalar calls. Most of them should not be needed, but I do not think they have much impact on performance.

Baunsgaard · 2023-11-06T21:04:10Z

scripts/nn/layers/graph_conv.dml

+        {
+            edge_weight[j, 1] = 0.0
+        }
+        X_out[as.integer(as.scalar(edge_index[j, 2]))] += as.scalar(edge_weight[j, 1]) * X_hat[as.integer(as.scalar(edge_index[j, 1]))]


all these as.integer(as.scalar(...)) should not be needed.

I think I got some issues when not using them. I will take a look at them.

Baunsgaard · 2023-11-06T21:04:25Z

scripts/nn/layers/graph_conv.dml

+    m = nrow(edge_index)
+
+    # transform
+    X_hat = X %*% W


This is the slow operation?

Yes this is the slow thing. But adding the bias in the end takes just as much time.

This matrix multiplication takes around 35% of the runtime while adding the bias in the end also takes around 35%.

In practice, adding the bias should be very fast compared to the matrix multiplication.
Looking at the code, it seems to me that you access indices rather than adding vectors.
This might be the issue.

Everything related to the indices is for the convolution part and not the linear part. But it is actually possible to do the convolution part without any indices at all, since the formula for a graph convolutional layer is OUT = D^-1/2 * A * D^-1/2 * X * W + b (A: adjacency matrix with self loops, n x n; D: degree matrix, n x n; X: input, n x f_in; W: weights, f_in x f_out). As you can see, to do the convolution (normalization and message passing) without any indices, you need 3 extra matrix multiplications instead. In normal use cases (n >> features), these matrix multiplications are even bigger than the linear part (XW + b), because D and A are bigger matrices than X and W.
So, since XW + b takes 220 seconds (the matrix multiplication alone takes around 110 seconds), doing the normalization and convolution without indices would take way longer than 340 seconds, likely around 600 seconds.

This is also the reason why famous other libraries (PyTorch, TF) also mainly use an edge list to do the convolution part through index accessing or use a sparse matrix datatype (which is basically also an edge list) in the GCL implementation.
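
The equivalence between the two formulations can be sketched in a few lines. This is an illustrative Python version, not the DML code from this PR: it assumes the symmetric D^-1/2 normalization from the Kipf and Welling paper with one self loop added per node and degrees counted over in-bound edges, and all function names are hypothetical.

```python
from math import sqrt, isclose

def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_edge_list(X, W, b, edges):
    """Message passing over an edge list of (source, target) pairs."""
    n = len(X)
    full = list(edges) + [(i, i) for i in range(n)]  # add self loops
    deg = [0] * n
    for _, dst in full:
        deg[dst] += 1                                # in-degree incl. self loop
    X_hat = matmul(X, W)                             # linear transform
    out = [list(b) for _ in range(n)]                # start from the bias
    for src, dst in full:
        w = 1.0 / sqrt(deg[src] * deg[dst])          # normalized edge weight
        for k in range(len(b)):
            out[dst][k] += w * X_hat[src][k]
    return out

def gcn_dense(X, W, b, edges):
    """Same layer via dense n x n matrices: D^-1/2 A D^-1/2 X W + b."""
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        A[dst][src] += 1.0                           # row = target, column = source
    for i in range(n):
        A[i][i] += 1.0                               # self loops
    deg = [sum(row) for row in A]                    # row sums = in-degrees
    A_hat = [[A[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(A_hat, matmul(X, W))
    return [[v + bk for v, bk in zip(row, b)] for row in out]

# Small directed example; node 0 has only one in-bound edge besides its self loop.
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
W = [[0.2, -0.4], [1.0, 0.3]]
b = [0.1, -0.2]
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2)]

out_edges = gcn_edge_list(X, W, b, edges)
out_dense = gcn_dense(X, W, b, edges)
same = all(isclose(u, v, abs_tol=1e-9)
           for ru, rv in zip(out_edges, out_dense) for u, v in zip(ru, rv))
print(same)
```

The edge-list version touches each edge once, so its aggregation cost grows with the number of edges, while the dense version multiplies n x n matrices regardless of how sparse the graph is.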

My -Xms -Xmx arguments were 16g, I think.

Also, I called the tests multiple times. They were always very consistent, only changing by -3 to +3 seconds.

what i mean is not from the outside, but inside your script.
or maybe that is what you do already?

Sorry, the stress test file itself would probably help. The stress test is a big forward pass over 3 layers, repeated 100 times.
So, from the outside and the inside.

ah okay, when will you be in office next time, then maybe we can talk about it?

Sure! I will be in the office tomorrow and on friday.

Baunsgaard · 2023-11-06T21:07:45Z

scripts/nn/layers/graph_conv.dml

+    dX = dOut_agg_rev %*% t(W)
+
+    # calculate gradient w.r.t. W (Formula: X^T * A_hat^T * dOut)
+    dW = t(X) %*% dOut_agg_rev


or is it the gradient taking time?

In the stress test, I only used the forward pass.
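
For reference, the gradient formulas in the snippet above can be checked numerically. The sketch below is plain Python rather than DML and assumes the forward pass is out = A_hat X W + b with a fixed, already normalized A_hat; the helper names and the small test matrices are made up for illustration.

```python
def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def forward(A_hat, X, W, b):
    out = matmul(A_hat, matmul(X, W))
    return [[v + bk for v, bk in zip(row, b)] for row in out]

def backward(A_hat, X, W, dOut):
    # Aggregate the upstream gradient along reversed edges: A_hat^T dOut.
    dOut_agg_rev = matmul(transpose(A_hat), dOut)
    dX = matmul(dOut_agg_rev, transpose(W))      # gradient w.r.t. X
    dW = matmul(transpose(X), dOut_agg_rev)      # X^T A_hat^T dOut
    return dX, dW

def numeric_grad(f, M, eps=1e-6):
    """Central finite differences of the scalar f() w.r.t. every entry of M."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            old = M[i][j]
            M[i][j] = old + eps
            hi = f()
            M[i][j] = old - eps
            lo = f()
            M[i][j] = old                        # restore the entry
            grad[i][j] = (hi - lo) / (2 * eps)
    return grad

# Deliberately non-symmetric A_hat so that the transpose matters.
A_hat = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.3, 0.3, 0.4]]
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
W = [[0.2, -0.4], [1.0, 0.3]]
b = [0.1, -0.2]
dOut = [[1.0, -2.0], [0.5, 0.25], [-1.0, 3.0]]

def loss():
    # Scalar whose gradient w.r.t. the output equals dOut.
    out = forward(A_hat, X, W, b)
    return sum(g * o for grow, orow in zip(dOut, out) for g, o in zip(grow, orow))

dX, dW = backward(A_hat, X, W, dOut)
dX_num = numeric_grad(loss, X)
dW_num = numeric_grad(loss, W)
max_err = max(abs(a - n)
              for G, N in ((dX, dX_num), (dW, dW_num))
              for ga, gn in zip(G, N) for a, n in zip(ga, gn))
print(max_err < 1e-6)
```

Both gradients share the single aggregated term A_hat^T dOut, so the backward pass needs one reverse aggregation plus two matrix multiplications.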

MaximilianSchreff added 3 commits October 19, 2023 14:26

GCN layer implementation and example network on how to use it

7c00160

New component tests for the graph convolutional layer

59e3c94

Merge branch 'main' of https://github.com/apache/systemds into graph_…

738d93f

…conv

Baunsgaard reviewed Nov 6, 2023

j143 added this to the systemds-3.2.0 milestone Dec 17, 2023
