Support Tapa HLS backend #269

Merged
merged 18 commits into cornell-zhang:main on Dec 3, 2024
Conversation

@EthanMeng324
Contributor

Description

This PR adds a Tapa HLS (https://github.com/rapidstream-org/rapidstream-tapa/) backend for Allo. It is mainly designed for the Allo dataflow programming interface; the original scheduling interface has not been tested yet. This backend adds new kernel codegen, host codegen, and makefile codegen, and its basic usage also differs from Vitis HLS. The makefile targets are as follows.

make csim: A fast software simulation that relies only on kernel.cpp and tapa_host.cpp.

make fast_hw_emu: A fast hardware emulation similar to hw_emu, but without needing to generate an .xclbin.

make run TARGET=<hw_emu/hw>: sw_emu is no longer supported with make run; use csim instead.

Examples

To use the Tapa HLS backend, simply choose "tapa" as the target when building. For example:

@df.region()
def top():
    @df.kernel(mapping=[P0, P1])
    def gemm(A: int32[M, K], B: int32[K, N], C: int32[M, N]):
        ...

mod = df.build(top, target="tapa", mode="csim")
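
For csim, the built module can then be invoked directly on NumPy arrays, as with Allo's other dataflow backends. A minimal sketch, assuming concrete sizes for M, K, and N and that the elided gemm body computes C = A @ B:

import numpy as np

M, K, N = 32, 32, 32  # hypothetical sizes; reuse the kernel's constants

A = np.random.randint(0, 8, (M, K)).astype(np.int32)
B = np.random.randint(0, 8, (K, N)).astype(np.int32)
C = np.zeros((M, N), dtype=np.int32)

mod(A, B, C)                             # runs the generated kernel under csim
np.testing.assert_array_equal(C, A @ B)  # check against NumPy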

Issues

  1. Due to the GLIBC version incompatibility between our server and Tapa, tapa g++ and tapa compile are currently not runnable on our server. This means we have to generate the Tapa executable and .xo file in a Docker container, copy the generated files, and run the actual testing on our server (you can use the Docker image ethanmeng324/tapa:v3.0). This will be resolved in the future with an alternative option that implicitly goes through this process (launch the Docker container, generate the files, copy the results, continue running).
  2. The current codegen for Tapa does not support multi-dimensional array access because of some tricky issues with tapa::mmap and tapa::vec_t. Our current solution is to flatten the array access, e.g. changing a[1][1] to a[1 * 16 + 1], where 16 is the size of the innermost dimension. Because of this issue, using input and output array buffers as an L3 cache is not supported in Tapa. However, we will change the input and output buffers to stream types in the future, which will solve this problem (see the sketch after this list).
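
For reference, the flattening is ordinary row-major linearization. A small NumPy sketch (hypothetical shapes) of the index mapping the codegen applies:

import numpy as np

rows, cols = 4, 16                  # hypothetical 2-D shape
a = np.arange(rows * cols).reshape(rows, cols)
flat = a.reshape(-1)                # the flattened view the codegen emits

i, j = 1, 1
# a[1][1] becomes flat[1 * 16 + 1]: the multiplier is the inner-dimension size
assert a[i, j] == flat[i * cols + j]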

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [IR], [Builder], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage (It would be better to provide ~2 different test cases to test the robustness of your code)
  • Code is well-documented

@chhzh123
Member

Can you attach a simple program and the generated TAPA code as a comment in this PR?

@chhzh123
Member

Thanks for contributing! This PR is very comprehensive!

  1. Can you provide instructions on how to build and run TAPA programs from Allo? Based on the description, it can only generate the TAPA C++ file and requires users to explicitly invoke Docker, right?
  2. Is it possible to reuse the EmitVivadoHLS pass? It seems most of the facilities are the same, but only the function generation logic needs to be changed. Copying the whole implementation may make it hard to maintain afterwards.

@EthanMeng324
Contributor Author

> Thanks for contributing! This PR is very comprehensive!
>
> 1. Can you provide instructions on how to build and run TAPA programs from Allo? Based on the description, it can only generate the TAPA C++ file and requires users to explicitly invoke Docker, right?
> 2. Is it possible to reuse the EmitVivadoHLS pass? It seems most of the facilities are the same, but only the function generation logic needs to be changed. Copying the whole implementation may make it hard to maintain afterwards.

Thanks for reviewing! For the questions:

  1. That might be a little complicated for now. For csim and fast_hw_emu, we can just use the Docker container, go into the generated top.prj folder, and run make csim or make fast_hw_emu. For hw_emu or hw, we should first run make all TARGET=hw_emu or make all TARGET=hw in the Docker container, then copy the generated files to the server under the top.prj folder, and then run the same command again on the server.
  2. There are actually a lot of small changes in many parts, and there will be more in the future. I guess if we reuse EmitVivadoHLS there might be a lot of if-else statements, which can get pretty messy, like the makefile generation code. I personally think it's better to keep it separate, like the Intel HLS backend.

@chhzh123
Member

> There are actually a lot of small changes in many parts, and there will be more in the future.

Can you be specific about which parts involve many small changes? I thought only the function interfaces are different.

@EthanMeng324
Contributor Author

> Can you attach a simple program and the generated TAPA code as a comment in this PR?

Take this simple 2 × 2 tiled gemm as an example:

@df.region()
def top():
    @df.kernel(mapping=[P0, P1])
    def gemm(A: float32[M, K], B: float32[K, N], C: float32[M, N]):
        pi, pj = df.get_pid()
        for i in range(pi * Mt, (pi + 1) * Mt):
            for j in range(pj * Nt, (pj + 1) * Nt):
                for k in range(K):
                    C[i, j] += A[i, k] * B[k, j]

The generated Tapa HLS code is as follows:

void gemm_0_0(
  tapa::mmap<float> v0,
  tapa::mmap<float> v1,
  tapa::mmap<float> v2
) {	// L2
  l_S_i_0_i: for (int i = 0; i < 16; i++) {	// L3
    l_S_j_0_j: for (int j = 0; j < 16; j++) {	// L4
      l_S_k_0_k: for (int k = 0; k < 32; k++) {	// L5
        float v6 = v0[((i * 32) + k)];	// L6
        float v7 = v1[((k * 32) + j)];	// L7
        float v8 = v6 * v7;	// L8
        float v9 = v2[((i * 32) + j)];	// L9
        float v10 = v9 + v8;	// L10
        v2[((i * 32) + j)] = v10;	// L11
      }
    }
  }
}

void gemm_0_1(
  tapa::mmap<float> v11,
  tapa::mmap<float> v12,
  tapa::mmap<float> v13
) {	// L17
  l_S_i_0_i1: for (int i1 = 0; i1 < 16; i1++) {	// L18
    l_S_j_0_j1: for (int j1 = 0; j1 < 16; j1++) {	// L19
      int v16 = (j1 + 16);	// L19
      l_S_k_0_k1: for (int k1 = 0; k1 < 32; k1++) {	// L20
        float v18 = v11[((i1 * 32) + k1)];	// L21
        float v19 = v12[((k1 * 32) + v16)];	// L22
        float v20 = v18 * v19;	// L23
        float v21 = v13[((i1 * 32) + v16)];	// L24
        float v22 = v21 + v20;	// L25
        v13[((i1 * 32) + v16)] = v22;	// L26
      }
    }
  }
}

void gemm_1_0(
  tapa::mmap<float> v23,
  tapa::mmap<float> v24,
  tapa::mmap<float> v25
) {	// L32
  l_S_i_0_i2: for (int i2 = 0; i2 < 16; i2++) {	// L33
    int v27 = (i2 + 16);	// L33
    l_S_j_0_j2: for (int j2 = 0; j2 < 16; j2++) {	// L34
      l_S_k_0_k2: for (int k2 = 0; k2 < 32; k2++) {	// L35
        float v30 = v23[((v27 * 32) + k2)];	// L36
        float v31 = v24[((k2 * 32) + j2)];	// L37
        float v32 = v30 * v31;	// L38
        float v33 = v25[((v27 * 32) + j2)];	// L39
        float v34 = v33 + v32;	// L40
        v25[((v27 * 32) + j2)] = v34;	// L41
      }
    }
  }
}

void gemm_1_1(
  tapa::mmap<float> v35,
  tapa::mmap<float> v36,
  tapa::mmap<float> v37
) {	// L47
  l_S_i_0_i3: for (int i3 = 0; i3 < 16; i3++) {	// L48
    int v39 = (i3 + 16);	// L48
    l_S_j_0_j3: for (int j3 = 0; j3 < 16; j3++) {	// L49
      int v41 = (j3 + 16);	// L49
      l_S_k_0_k3: for (int k3 = 0; k3 < 32; k3++) {	// L50
        float v43 = v35[((v39 * 32) + k3)];	// L51
        float v44 = v36[((k3 * 32) + v41)];	// L52
        float v45 = v43 * v44;	// L53
        float v46 = v37[((v39 * 32) + v41)];	// L54
        float v47 = v46 + v45;	// L55
        v37[((v39 * 32) + v41)] = v47;	// L56
      }
    }
  }
}

void top(
  tapa::mmap<float> v48,
  tapa::mmap<float> v49,
  tapa::mmap<float> v50
) {	// L62
  tapa::task()
  .invoke(gemm_0_0, v48, v49, v50)	// L63
  .invoke(gemm_0_1, v48, v49, v50)	// L64
  .invoke(gemm_1_0, v48, v49, v50)	// L65
  .invoke(gemm_1_1, v48, v49, v50);	// L66
}
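
The loop bounds above (16 and 32) correspond to M = N = K = 32 with a 2 × 2 mapping, so each kernel instance computes one 16 × 16 output tile, and top launches the four instances as concurrent tasks via tapa::task().invoke. A hedged reconstruction of the driver for this example, reusing the top region defined above (the import path is assumed):

import allo.dataflow as df   # assumed import path for Allo's dataflow interface

M, N, K = 32, 32, 32         # sizes matching the generated loop bounds
P0, P1 = 2, 2                # 2 x 2 kernel mapping
Mt, Nt = M // P0, N // P1    # 16 x 16 output tile per kernel instance

mod = df.build(top, target="tapa", mode="csim")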

@EthanMeng324
Contributor Author

> > There are actually a lot of small changes in many parts, and there will be more in the future.
>
> Can you be specific about which parts involve many small changes? I thought only the function interfaces are different.

Currently, the functions that differ are getTypeName, emitValue, emitArrayDecl, emitAffineLoad, emitAffineStore, emitCall, emitLoopDirectives, emitFunctionDirectives, emitFunction, and emitModule.

@chhzh123
Member

I think a better way to do this is to provide a base class for EmitHLS, from which the Vivado, Intel, and TAPA backends all inherit, so only the backend-specific functions need to be overridden instead of creating if-else branches in the same file.
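
To illustrate the proposed structure (the real emitters are C++ translation passes; this is a Python-style sketch with hypothetical names, not the actual API):

class EmitHLSBase:
    # Shared facilities (type names, expressions, loop emission) live here.
    def emitFunction(self, func):
        raise NotImplementedError  # backend-specific interface generation

class EmitVivadoHLS(EmitHLSBase):
    def emitFunction(self, func):
        ...  # plain pointer arguments plus #pragma HLS interface directives

class EmitTapaHLS(EmitHLSBase):
    def emitFunction(self, func):
        ...  # tapa::mmap arguments; top wires tasks via tapa::task().invoke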

@EthanMeng324
Contributor Author

> I think a better way to do this is to provide a base class for EmitHLS, from which the Vivado, Intel, and TAPA backends all inherit, so only the backend-specific functions need to be overridden instead of creating if-else branches in the same file.

That actually makes great sense. Do you want me to include it in this PR?

@chhzh123
Member

Maybe not in this PR, as it requires lots of code changes, but I think you can annotate the functions that differ from the Vivado HLS backend (with just one line of comment each).

@EthanMeng324
Contributor Author

> Maybe not in this PR, as it requires lots of code changes, but I think you can annotate the functions that differ from the Vivado HLS backend (with just one line of comment each).

Sure, just updated.

@chhzh123 merged commit 9c48a6a into cornell-zhang:main on Dec 3, 2024
1 check passed