[Initialization] refactor docs (apache#9)
LeiWang1999 authored Mar 11, 2024
1 parent 7caf0de commit 6bafef3
Showing 2 changed files with 5 additions and 14 deletions.
7 changes: 0 additions & 7 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,14 +7,7 @@ That would be awesome if you want to contribute something to BitBLAS!
- [Asking Questions](contributing.md#asking-questions)
- [Submitting Pull Requests](contributing.md#submitting-pull-requests)
- [Repository Setup](contributing.md#repository-setup)
- [Running Examples](contributing.md#running-examples)
- [Running Tests](contributing.md#running-tests)
- [Testing Input Methods](contributing.md#testing-input-methods)
- [Publishing Releases](contributing.md#publishing-releases)
- [Publishing Normal `@latest` Release](contributing.md#publishing-normal-latest-release)
- [Publishing `@next` Release](contributing.md#publishing-next-release)
- [Publishing `@experimental` Release](contributing.md#publishing-experimental-release)
- [Running Prerelease Script](contributing.md#running-prerelease-script)

## Reporting Bugs

Expand Down
12 changes: 5 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,14 +7,11 @@ Some of the key features of BitBLAS include:
- High Performance (Not only FP16xFP16, INT8xINT8, but also FP16xINT4/2/1, INT8xINT4/2/1).
- With the flexible DSL (TIR Script) to effortlessly craft domain-specific kernels for your situations.
- Support for dynamic symbolic shapes through tvm unity -> generates source code with dynamic shapes.

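As a rough illustration of the FP16xINT4 mixed-precision idea from the feature list above (a hypothetical numpy sketch of the storage scheme, not BitBLAS's actual API or kernel), 4-bit weights can be packed two per byte and dequantized on the fly before the matmul:

```python
import numpy as np

# Hypothetical sketch of FP16 x INT4 mixed precision (not BitBLAS's API):
# two signed 4-bit weights are packed into each uint8 byte, then unpacked
# and dequantized to float16 before a plain matmul.

def pack_int4(w):
    """Pack an even-length array of int values in [-8, 7] into uint8 pairs."""
    w = np.asarray(w, dtype=np.int8).reshape(-1, 2)
    lo = (w[:, 0] & 0x0F).astype(np.uint8)
    hi = (w[:, 1] & 0x0F).astype(np.uint8)
    return lo | (hi << 4)

def unpack_int4(packed):
    """Inverse of pack_int4: recover signed values from packed uint8 bytes."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    # Sign-extend the 4-bit nibbles.
    lo = np.where(lo >= 8, lo - 16, lo).astype(np.int8)
    hi = np.where(hi >= 8, hi - 16, hi).astype(np.int8)
    return np.stack([lo, hi], axis=-1).reshape(-1)

def int4_fp16_gemv(x_fp16, packed_w, w_shape, scale):
    """y = x @ W.T, with W stored as packed int4 plus a float16 scale."""
    w = unpack_int4(packed_w).reshape(w_shape).astype(np.float16) * scale
    return x_fp16 @ w.T
```

A real kernel would keep the weights packed in memory and dequantize inside the GEMM loop; the sketch only shows the numerics of the format.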
Latest News 🔥

- 2023-03-03: BitBLAS first proposed int8xint1 gemv/gemm with 10x/2x speedup over float16xfloat16 on A100; please check out [op_benchmark_a100_int1_scaling](images/figures/op_benchmark_a100_int1_scaling.png) for detailed input scaling benchmark results.
- BitBLAS first proposed int8xint1 gemv/gemm with 10x/2x speedup over float16xfloat16 on A100; please check out [op_benchmark_a100_int1_scaling](images/figures/op_benchmark_a100_int1_scaling.png) for detailed input scaling benchmark results.

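To illustrate why int8xint1 can be so much faster than float16xfloat16 (a hypothetical numpy sketch of the arithmetic, not BitBLAS's kernel), restricting weights to {-1, +1} lets them be stored one bit each, and the dot product reduces to masked adds instead of multiplies:

```python
import numpy as np

# Hypothetical sketch of the int8 x int1 idea (not BitBLAS's kernel):
# weights are restricted to {-1, +1} and stored one bit per weight.

def binarize_weights(w):
    """Encode a {-1, +1} weight vector as a uint8 bitmap, 1 bit per weight."""
    bits = (np.asarray(w) > 0).astype(np.uint8)
    return np.packbits(bits), len(w)

def int8_int1_dot(x_int8, packed_bits, n):
    """Dot product of int8 activations with 1-bit {-1, +1} weights."""
    mask = np.unpackbits(packed_bits)[:n].astype(bool)
    x = x_int8.astype(np.int32)  # widen to avoid int8 overflow
    # Sum over the +1 positions minus sum over the -1 positions.
    return int(x[mask].sum() - x[~mask].sum())
```

On real hardware the same reduction is done with bitwise ops and popcount rather than indexing, which is where the speedup comes from.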

## Benchmark
BitBLAS can achieve optimal performance across various compute pattern:
BitBLAS can achieve optimal performance across various compute patterns:

- GTX 3090
- FLOAT16xFLOAT16 with TensorCore ![3090-gemm-fp16](./images/figures/op_benchmark_3090_fp16_gemm.png)
Expand Down Expand Up @@ -52,5 +49,6 @@ This project may contain trademarks or logos for projects, products, or services
## Acknowledgement

We learned a lot from the following projects.
- [Apache TVM](https://github.com/apache/tvm): We use TensorIR as our DSL currently, and we customized tvm from unity branch to support some features we needed.
- [Microsoft Roller](https://github.com/microsoft/nnfusion/tree/roller): The design and algo inspiration of hardware aware tuning comes from Roller.

- [Apache TVM](https://github.com/apache/tvm): BitBLAS has adopted TensorIR as its DSL. Additionally, we have customized TVM from the unity branch to incorporate specific features required for our project.
- [Microsoft Roller](https://github.com/microsoft/nnfusion/tree/roller): The design and algorithmic inspiration for hardware-aware tuning in BitBLAS comes from Roller.
