[T150624233] Work around GCC 12 regressions
- Initialize `__m256i` variable to silence warnings

- Update FBGEMM README with proper build instructions, along with the
  workaround for the GCC 12 regression (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593)
q10 committed Apr 12, 2023
1 parent 1974152 commit 94668ed
Showing 2 changed files with 80 additions and 47 deletions.
README.md (125 changes: 79 additions & 46 deletions)
@@ -13,40 +13,88 @@
fusion opportunities in order to overcome the unique challenges of matrix
multiplication at lower precision with bandwidth-bound operations.

FBGEMM is used as a backend of Caffe2 and PyTorch quantized operators for x86 machines:
* Caffe2: https://github.com/pytorch/pytorch/tree/master/caffe2/quantization/server
* PyTorch: https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/native/quantized/cpu



## Build Instructions

### Build with CMake

The general instructions for building with CMake are as follows:

```sh
# Clone the repo
git clone --recursive https://github.com/pytorch/FBGEMM.git
cd FBGEMM

# Pull down the submodules
git submodule sync
git submodule update --init --recursive

# Create a build directory
mkdir build
cd build

# Set up the build
cmake -DUSE_SANITIZER=address -DFBGEMM_LIBRARY_TYPE=shared -DPYTHON_EXECUTABLE=/usr/bin/python3 ..

# Run the build
make -j VERBOSE=1

# Run all tests
make test

# Install the package
make install
```
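
Alternatively, the configure and build steps can be driven from the repository
root without changing into the build directory (the flags shown above can be
appended to the `cmake` invocation in the same way):

```sh
# Configure and build out-of-source from the repository root
cmake -B build
make -C build -j VERBOSE=1
```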

#### Build Issues with GCC 12

As of the time of writing, compiling FBGEMM with GCC 12 will fail due to a
[known compiler regression](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593).
To work around the issue, add the following exports prior to running CMake:

```sh
export CFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
export CXXFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
```

Please see GitHub issues [77939](https://github.com/pytorch/pytorch/issues/77939),
[1094](https://github.com/pytorch/FBGEMM/issues/1094), and
[1666](https://github.com/pytorch/FBGEMM/issues/1666) for more details.
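
For example, a complete configure-and-build sequence with the workaround
applied (using the same flags as above) might look like:

```sh
# Apply the GCC 12 workaround, then configure and build as usual (from the build directory)
export CFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
export CXXFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
cmake -DUSE_SANITIZER=address -DFBGEMM_LIBRARY_TYPE=shared -DPYTHON_EXECUTABLE=/usr/bin/python3 ..
make -j VERBOSE=1
```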

### Run Examples

The tests in the `test/` directory and benchmarks in the `bench/` directory are
great examples of how to use FBGEMM. For instance, the `SpMDMTest` test in
`test/PackedRequantizeAcc16Test.cc` shows how to combine row offset calculations
with the packing of A (`PackAWithRowOffset`), how to pack the B matrix
(`PackBMatrix`), and how to construct an output pipeline
`(sparse_matrix*dense_matrix --> requantization --> nop)` fused with the inner
GEMM macro kernel.
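
As a minimal sketch, an individual test can also be run directly after building
with tests enabled (the binary location and name below are assumptions and
depend on your CMake generator and configuration):

```sh
# Run only the SpMDM cases from the packed requantization test
# (adjust the path to wherever your build places test binaries)
./build/PackedRequantizeAcc16Test --gtest_filter='*SpMDM*'
```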

### Dependencies

FBGEMM requires gcc 5+ and a CPU with support for the AVX2 instruction set or
higher. It has been tested on Mac OS X and Linux.
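
A quick way to check both prerequisites on Linux (this is only a convenience
check, not part of the official build flow; macOS users would query `sysctl`
instead):

```sh
# Compiler version and AVX2 support check (Linux)
gcc --version | head -n 1
grep -q avx2 /proc/cpuinfo && echo "AVX2: yes" || echo "AVX2: no"
```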

#### asmjit

For its inner kernels, FBGEMM takes a “one size doesn't fit all” approach, so the
implementation dynamically generates efficient, matrix-shape-specific vectorized
code using a third-party library called [asmjit][1]. **asmjit is required** to
build FBGEMM.
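
As an illustration of the kind of runtime code generation asmjit provides, here
is a minimal standalone sketch (it assumes a recent asmjit API and is not
FBGEMM's actual kernel generator):

```cpp
#include <asmjit/asmjit.h>

int main() {
  asmjit::JitRuntime rt;               // owns the executable memory
  asmjit::CodeHolder code;
  code.init(rt.environment());         // target the host environment
  asmjit::x86::Assembler a(&code);     // emit x86-64 instructions

  a.mov(asmjit::x86::eax, 42);         // the generated function just returns 42
  a.ret();

  int (*fn)() = nullptr;
  if (rt.add(&fn, &code) != asmjit::kErrorOk)
    return 1;
  int result = fn();                   // call the JIT-compiled code
  rt.release(fn);
  return result == 42 ? 0 : 1;
}
```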

#### cpuinfo

FBGEMM detects CPU instruction set support at runtime using the cpuinfo library
and dispatches optimized kernels for the detected instruction set. Therefore,
**cpuinfo is required** to detect the CPU type.
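
cpuinfo exposes this detection through simple query functions. The sketch below
shows the dispatch pattern; the kernel names are stand-ins, not FBGEMM symbols:

```cpp
#include <cpuinfo.h>
#include <cstdio>

// Stand-in kernels; FBGEMM generates and selects its real kernels internally.
static void gemm_avx512() { std::puts("AVX-512 kernel"); }
static void gemm_avx2()   { std::puts("AVX2 kernel"); }
static void gemm_scalar() { std::puts("scalar fallback"); }

int main() {
  if (!cpuinfo_initialize()) {   // must succeed before any cpuinfo_has_* query
    gemm_scalar();
    return 0;
  }
  if (cpuinfo_has_x86_avx512f())
    gemm_avx512();
  else if (cpuinfo_has_x86_avx2())
    gemm_avx2();
  else
    gemm_scalar();
  return 0;
}
```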

#### googletest

googletest is required to build and run FBGEMM's tests. **googletest is not
required** if you don't want to run FBGEMM tests. By default, building of tests
is **on**. Turn it off by setting FBGEMM\_BUILD\_TESTS to off.
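
For example, assuming the standard CMake cache-variable syntax, tests can be
disabled at configure time:

```sh
# Configure without building the googletest-based tests
cmake -DFBGEMM_BUILD_TESTS=OFF ..
```
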
@@ -62,45 +110,28 @@
MKL path is provided with INTEL\_MKL\_DIR benchmarks are built with MKL and
performance numbers are reported for MKL functions as well. However, if MKL is
not found, the benchmarks are not built.
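
A sketch of pointing the benchmarks at an MKL installation (the path below is a
placeholder; use your actual MKL location):

```sh
# Build benchmarks against MKL; INTEL_MKL_DIR must point at the MKL root
cmake -DINTEL_MKL_DIR=/opt/intel/mkl ..
```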

## Documentation

For a high-level overview, design philosophy, and brief descriptions of various
parts of FBGEMM, please see [our blog post][4].

### What's New?

* [New Features and Recent Improvements](https://github.com/pytorch/FBGEMM/wiki/Recent-feature-additions-and-improvements-in-FBGEMM) (January, 2020)

### API Docs

We have extensively used comments in our source files. The best and most
up-to-date documentation is available in the source files.

You can also turn on the option to generate the documentation (using [Doxygen][5]
and [Sphinx][6]) by setting the `-DFBGEMM_BUILD_DOCS=ON` flag when invoking CMake.
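
For example, assuming Doxygen and Sphinx are installed, the documentation build
is enabled at configure time and then produced by the normal build:

```sh
# Enable documentation generation, then build as usual
cmake -DFBGEMM_BUILD_DOCS=ON ..
make -j
```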

### Citation

For those looking for the appropriate article to cite regarding FBGEMM, we
recommend citing our [paper](https://arxiv.org/pdf/2101.05615.pdf):

```
@article{fbgemm,
@@ -112,6 +143,7 @@
```

## Join the FBGEMM community

For questions or feature requests, please file a ticket over on
[GitHub Issues](https://github.com/pytorch/FBGEMM/issues) or reach out to us on
the `#fbgemm` channel in [PyTorch Slack](https://bit.ly/ptslack).
@@ -120,6 +152,7 @@
For contributions, please see the [`CONTRIBUTING`](../CONTRIBUTING.md) file for
ways to help out.

## License

FBGEMM is BSD licensed, as found in the [`LICENSE`](LICENSE) file.


src/FbgemmSparseDenseInt8Avx2.cc (2 changes: 1 addition & 1 deletion)
@@ -165,7 +165,7 @@ void SparseDenseInt8MMAvx2(
C_i32 + i * ldc + j + idx1 * 8));
}
int rem_int32 = rem - idx1 * VLEN_INT32;
- __m256i mask_int32_v;
+ __m256i mask_int32_v = _mm256_setzero_si256();
if (rem_int32 > 0) {
mask_int32_v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(
&avx2_ps_or_epi32_combined_mask[VLEN_INT32 - rem_int32]));
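
For context, the following standalone sketch shows the idiom this change
applies: the mask register is zero-initialized so that every control path reads
a defined value, which quiets GCC 12's `-Wmaybe-uninitialized` diagnostic. It
uses a hand-built mask rather than FBGEMM's `avx2_ps_or_epi32_combined_mask`
table:

```cpp
// Compile with -mavx2
#include <immintrin.h>
#include <cstdio>

int main() {
  alignas(32) int src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  const int VLEN_INT32 = 8;
  int rem = 5;                                    // remaining tail elements

  __m256i mask = _mm256_setzero_si256();          // defined on every path
  if (rem > 0) {
    // Enable the first `rem` lanes (a set sign bit selects the lane).
    alignas(32) int m[8];
    for (int i = 0; i < VLEN_INT32; ++i)
      m[i] = (i < rem) ? -1 : 0;
    mask = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(m));
  }

  __m256i v = _mm256_maskload_epi32(src, mask);   // lanes past `rem` load as 0
  alignas(32) int out[8];
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), v);
  for (int i = 0; i < VLEN_INT32; ++i)
    std::printf("%d ", out[i]);                   // prints: 1 2 3 4 5 0 0 0
  std::printf("\n");
  return 0;
}
```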
