diff --git a/README.md b/README.md index c9b4d3f63..6f1d9b7c4 100644 --- a/README.md +++ b/README.md @@ -71,9 +71,9 @@ pip install . # with -e option if you want to install in editable mode ### Method 2: Build from Source We currently provide three ways to install **tile-lang** from source: - - [Install from Source (using your own TVM installation)](./docs/Installation.md#install-from-source-with-your-own-tvm-installation) - - [Install from Source (using the bundled TVM submodule)](./docs/Installation.md#install-from-source-with-our-tvm-submodule) - - [Install Using the Provided Script](./docs/Installation.md#install-with-provided-script) + - [Install from Source (using your own TVM installation)](./docs/get_started/Installation.rst#method-1-install-from-source-using-your-own-tvm-installation) + - [Install from Source (using the bundled TVM submodule)](./docs/get_started/Installation.rst#method-2-install-from-source-using-the-bundled-tvm-submodule) + - [Install Using the Provided Script](./docs/get_started/Installation.rst#method-3-install-using-the-provided-script) ## Quick Start @@ -192,6 +192,12 @@ In addition to GEMM, we provide a variety of examples to showcase the versatilit TileLang has now been used in project [BitBLAS](https://github.com/microsoft/BitBLAS). +## Join the Discussion + +Join our Discord community for discussions, support, and collaboration! + +[![Join our Discord](https://img.shields.io/badge/Discord-Join%20Us-blue?logo=discord&style=for-the-badge)](https://discord.gg/TUrHyJnKPG) + ## Acknowledgements We learned a lot from the [TVM](https://github.com/apache/tvm) community and would like to thank them for their contributions. The initial version of this project is mainly contributed by [LeiWang1999](https://github.com/LeiWang1999), [chengyupku](https://github.com/chengyupku) and [nox-410](https://github.com/nox-410). Part of this work was done during the internship at Microsoft Research, under the supervision of Dr. Lingxiao Ma, Dr. 
Yuqing Xia, Dr. Jilong Xue, and Dr. Fan Yang. diff --git a/deprecated/docs/Installation.md b/deprecated/docs/Installation.md deleted file mode 100644 index 9557274c7..000000000 --- a/deprecated/docs/Installation.md +++ /dev/null @@ -1,166 +0,0 @@ -# Installation Guide - -## Installing with pip - -**Prerequisites for installation via wheel or PyPI:** -- **Operating System**: Ubuntu 20.04 or later -- **Python Version**: >= 3.8 -- **CUDA Version**: >= 11.0 - -The easiest way to install TileLang is directly from the PyPi using pip. To install the latest version, run the following command in your terminal. - -**Note**: Currently, TileLang whl is only supported on Ubuntu 20.04 or later version as we build the whl files on this platform. Currently we only provide whl files for CUDA>=11.0 and with Python>=3.8. **If you are using a different platform or environment, you may need to [build TileLang from source](https://github.com/tile-ai/tilelang/blob/main/docs/Installation.md#building-from-source).** - -```bash -pip install tilelang -``` - -Alternatively, you may choose to install TileLang using prebuilt packages available on the Release Page: - -```bash -pip install tilelang-0.0.0.dev0+ubuntu.20.4.cu120-py3-none-any.whl -``` - -To install the latest version of TileLang from the github repository, you can run the following command: - -```bash -pip install git+https://github.com/tile-ai/tilelang.git -``` - -After installing TileLang, you can verify the installation by running: - -```bash -python -c "import tilelang; print(tilelang.__version__)" -``` - -## Building from Source - -**Prerequisites for building from source:** -- **Operating System**: Linux -- **Python Version**: >= 3.7 -- **CUDA Version**: >= 10.0 - -We recommend using a docker container with the necessary dependencies to build TileLang from source. 
You can use the following command to run a docker container with the necessary dependencies: - -```bash -docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.01-py3 -``` - -To build and install TileLang directly from source, follow the steps below. This process requires certain pre-requisites from apache tvm, which can be installed on Ubuntu/Debian-based systems using the following commands: - -```bash -sudo apt-get update -sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev -``` - -After installing the prerequisites, you can clone the TileLang repository and install it using pip: - -```bash -git clone --recursive https://github.com/tile-ai/tilelang.git -cd TileLang -pip install . # Please be patient, this may take some time. -``` - -if you want to install TileLang with the development mode, you can run the following command: - -```bash -pip install -e . -``` - -We currently provide three ways to install **tile-lang**: - - [Install from Source (using your own TVM installation)](#install-from-source-with-your-own-tvm-installation) - - [Install from Source (using the bundled TVM submodule)](#install-from-source-with-our-tvm-submodule) - - [Install Using the Provided Script](#install-with-provided-script) - - -### Method 1: Install from Source (using your own TVM installation) - -If you already have a compatible TVM installation, follow these steps: - -1. **Clone the Repository:** - - ```bash - git clone --recursive https://github.com/tile-ai/tilelang - cd TileLang - ``` - - > **Note**: Use the `--recursive` flag to include necessary submodules. - -2. **Configure Build Options:** - - Create a build directory and specify your existing TVM path: - - ```bash - mkdir build - cd build - cmake .. -DTVM_PREBUILD_PATH=/your/path/to/tvm/build # e.g., /workspace/tvm/build - make -j 16 - ``` - -3. 
**Set Environment Variables:** - - Update `PYTHONPATH` to include the `tile-lang` Python module: - - ```bash - export PYTHONPATH=/your/path/to/tile-lang/python:$PYTHONPATH - # TVM_IMPORT_PYTHON_PATH is used by 3rdparty framework to import tvm - export TVM_IMPORT_PYTHON_PATH=/your/path/to/tvm/python - ``` - -### Method 2: Install from Source (using the bundled TVM submodule) - -If you prefer to use the built-in TVM version, follow these instructions: - -1. **Clone the Repository:** - - ```bash - git clone --recursive https://github.com/tile-ai/tilelang - cd TileLang - ``` - - > **Note**: Ensure the `--recursive` flag is included to fetch submodules. - -2. **Configure Build Options:** - - Copy the configuration file and enable the desired backends (e.g., LLVM and CUDA): - - ```bash - mkdir build - cp 3rdparty/tvm/cmake/config.cmake build - cd build - echo "set(USE_LLVM ON)" >> config.cmake - echo "set(USE_CUDA ON)" >> config.cmake - # or echo "set(USE_ROCM ON)" >> config.cmake if want to enable rocm runtime - cmake .. - make -j 16 - ``` - - The build outputs (e.g., `libtilelang.so`, `libtvm.so`, `libtvm_runtime.so`) will be generated in the `build` directory. - -3. **Set Environment Variables:** - - Ensure the `tile-lang` Python package is in your `PYTHONPATH`: - - ```bash - export PYTHONPATH=/your/path/to/TileLang/python:$PYTHONPATH - ``` - -### Method 3: Install Using the Provided Script - -For a simplified installation, use the provided script: - -1. **Clone the Repository:** - - ```bash - git clone --recursive https://github.com/tile-ai/tilelang - cd TileLang - ``` - -2. **Run the Installation Script:** - - ```bash - bash install.sh - # or bash `install_amd.sh` if you want to enable rocm runtime - ``` - -This script automates the setup, including submodule initialization and configuration. 
diff --git a/deprecated/docs/flash_perf.md b/deprecated/docs/flash_perf.md deleted file mode 100644 index 85e65d646..000000000 --- a/deprecated/docs/flash_perf.md +++ /dev/null @@ -1,25 +0,0 @@ -The flash-attention performance on RTX-4090 GPU, with cuda toolkit 12.2 - -SEQ_LEN is fixed to 2k, All matmul use fp16->fp32 mma, value in TFlops, higher is better. - -Flash-Forward -| CASUAL,DIM | Flash_attn | Tvm.tl | -| --------- | ---------- | ------ | -| False, 32 | 159.79 | 156.82 | -| False, 64 | 168.91 | 166.84 | -| False, 128 | 169.28 | 166.51 | -| False, 256 | 156.15 | 166.77 | -| True, 32 | 126.78 | 142.59 | -| True, 64 | 142.23 | 152.43 | -| True, 128 | 151.19 | 156.30 | -| True, 256 | 144.12 | 151.54 | - -Flash-backward -| CASUAL,DIM | Flash_attn | Tvm.tl | -| --------- | ---------- | ------ | -| False, 32 | 115.12 | 120.03 | -| False, 64 | 124.81 | 130.94 | -| False, 128 | 124.57 | 122.99 | -| True, 32 | 86.48 | 95.66 | -| True, 64 | 96.53 | 106.03 | -| True, 128 | 99.23 | 100.24 | diff --git a/deprecated/docs/language_ref.md b/deprecated/docs/language_ref.md deleted file mode 100644 index 2527c8973..000000000 --- a/deprecated/docs/language_ref.md +++ /dev/null @@ -1,61 +0,0 @@ -# TVM.TL language reference - -## T.Kernel -args: the grid size (0-3 dimension) and the num_threads. - -returns: the blockIdx variables - -launch a kernel, it must be used in a with statement. There can be multiple kernels launched sequentially inside a prim function. - -## T.alloc_shared -args: shape, dtype - -returns: Buffer - -Allocate buffer on shared memory, It must be used within T.Kernel scope and should be allocated at the top of the scope. - -Dynamic shared memory is used. - -## T.alloc_fragment -args: shape, dtype - -returns: Buffer - -Allocate buffer on register memory, It must be used within T.Kernel scope and should be allocated at the top of the scope. - -The shape represents the whole shape of the buffer. 
Each element in the buffer is distributed stored on each threads, this storage partition will be inferred by the compiler. - -## T.copy -args: src, dst - -Copies data from src to dst, src and dst can be one of (Buffer, BufferLoad, BufferRegion). If you use BufferLoad that represents a single starting point, the other params should not be BufferLoad, since we need to know the copy region. - -Zero will be padded if we detect the load is out of boundary. - -## T.gemm -args: A, B, C, transpose_A, transpose_B, policy - -Performs gemm operation on A, B and C. C must be a fragment, B must be on shared memory, A can be either a fragment or shared. - -Note that the current implementation has some shape and dtype constraints, for example, the length of reduction axis must be a multiple of 32 for fp16 multiplicand case, we will update this later. - -## T.reduce_max T.reduce_sum -args: src, dst, dim - -Performs a reduce operation from src to dst on dimension dim. Currently we only support src and dst to be a fragment. - -## T.Parallel -You can use T.Parallel to write a loop. The loop will be partitioned to all the threads by the compiler (The compiler will consider vectorize size, the fragment's thread mapping ... ). Note that this is the only way you can perform arbitrary operation on fragments. - -## T.Pipelined -args: start, stop, num_stages - -Pipeline the loop, copy from the global memory will be converted to async operations and reordered to the point after it is consumed. num_stages is the number of buffer between producer-consumer. (e.g. Double buffer when num_stages=2) - -## T.clear T.fill -nothing special, they will be converted to T.Parallel - -## T.use_swizzle -Optimization for L2 cache. The launch of blockIdx.x and blockIdx.y will be serpentined. - -You need to add it in a kernel after buffer is all allocated. 
diff --git a/docs/get_started/Installation.rst b/docs/get_started/Installation.rst index f59baa300..f5af5e017 100644 --- a/docs/get_started/Installation.rst +++ b/docs/get_started/Installation.rst @@ -65,7 +65,7 @@ After installing the prerequisites, you can clone the TileLang repository and in .. code:: bash git clone --recursive https://github.com/tile-ai/tilelang.git - cd TileLang + cd tilelang pip install . # Please be patient, this may take some time. If you want to install TileLang in development mode, you can run the following command: @@ -76,11 +76,13 @@ If you want to install TileLang in development mode, you can run the following c We currently provide three methods to install **TileLang**: -1. Install from Source (using your own TVM installation) +1. `Install from Source (using your own TVM installation)`_ +2. `Install from Source (using the bundled TVM submodule)`_ +3. `Install Using the Provided Script`_ -2. Install from Source (using the bundled TVM submodule) - -3. Install Using the Provided Script +.. _Install from Source (using your own TVM installation): #method-1-install-from-source-using-your-own-tvm-installation +.. _Install from Source (using the bundled TVM submodule): #method-2-install-from-source-using-the-bundled-tvm-submodule +.. _Install Using the Provided Script: #method-3-install-using-the-provided-script Method 1: Install from Source (Using Your Own TVM Installation) @@ -93,7 +95,7 @@ If you already have a compatible TVM installation, follow these steps: .. code:: bash git clone --recursive https://github.com/tile-ai/tilelang - cd TileLang + cd tilelang **Note**: Use the `--recursive` flag to include necessary submodules. @@ -114,7 +116,7 @@ If you already have a compatible TVM installation, follow these steps: .. 
code:: bash - export PYTHONPATH=/your/path/to/tile-lang/python:$PYTHONPATH + export PYTHONPATH=/your/path/to/tilelang/:$PYTHONPATH # TVM_IMPORT_PYTHON_PATH is used by 3rd-party frameworks to import TVM export TVM_IMPORT_PYTHON_PATH=/your/path/to/tvm/python @@ -128,7 +130,7 @@ If you prefer to use the built-in TVM version, follow these instructions: .. code:: bash git clone --recursive https://github.com/tile-ai/tilelang - cd TileLang + cd tilelang **Note**: Ensure the `--recursive` flag is included to fetch submodules. @@ -155,7 +157,7 @@ If you prefer to use the built-in TVM version, follow these instructions: .. code:: bash - export PYTHONPATH=/your/path/to/TileLang/python:$PYTHONPATH + export PYTHONPATH=/your/path/to/tilelang/:$PYTHONPATH Method 3: Install Using the Provided Script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -167,11 +169,11 @@ For a simplified installation, use the provided script: .. code:: bash git clone --recursive https://github.com/tile-ai/tilelang - cd TileLang + cd tilelang 2. **Run the Installation Script**: .. code:: bash - bash install.sh + bash install_cuda.sh # or bash `install_amd.sh` if you want to enable ROCm runtime