forked from OpenMathLib/OpenBLAS
Merge pull request OpenMathLib#4562 from honno/mkdocs-wiki
Fold wiki contents into formal documentation, build-able with `mkdocs`
Showing 13 changed files with 2,106 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
name: Publish docs via GitHub Pages | ||
on: | ||
  push: | ||
    branches: | ||
      - develop | ||
jobs: | ||
  build: | ||
    name: Deploy docs | ||
    runs-on: ubuntu-latest | ||
    steps: | ||
      - uses: actions/checkout@v2 | ||
      - uses: actions/setup-python@v2 | ||
        with: | ||
          python-version: "3.10" | ||
      - run: pip install mkdocs mkdocs-material | ||
      # mkdocs gh-deploy command only builds to the top-level, hence building then deploying ourselves | ||
      - run: mkdocs build | ||
      - name: Deploy docs | ||
        uses: peaceiris/actions-gh-pages@v3 | ||
        if: ${{ github.ref == 'refs/heads/develop' }} | ||
        with: | ||
          github_token: ${{ secrets.GITHUB_TOKEN }} | ||
          publish_dir: ./site | ||
          destination_dir: docs/ |
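The two build steps of the workflow above can be reproduced locally before pushing. The following is a minimal sketch, assuming Python and `pip` are available and that the function is called from the repository root where `mkdocs.yml` lives; the function name `build_docs` is hypothetical and not part of the commit.

```shell
# Sketch of the workflow's install and build steps, wrapped in a function
# so that nothing runs until it is called explicitly.
# Assumption: the current directory is the repository root (next to mkdocs.yml).
build_docs() {
    pip install mkdocs mkdocs-material   # same packages the workflow installs
    mkdocs build                         # writes the static site to ./site by default
}

# Call `build_docs` to run the steps; deployment is left to the CI action.
```

The generated `./site` directory is the same one that the workflow hands to the deploy action through `publish_dir`.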
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
## Mailing list | ||
|
||
We have a [GitHub discussions](https://github.com/OpenMathLib/OpenBLAS/discussions/) forum to discuss usage and development of OpenBLAS. We also have Google groups for [*users*](https://groups.google.com/forum/#!forum/openblas-users) and [*developers*](https://groups.google.com/forum/#!forum/openblas-dev) of OpenBLAS. | ||
|
||
## Donations | ||
|
||
You can read the OpenBLAS statement of receipts, disbursements, and cash balance in a [Google doc](https://docs.google.com/spreadsheet/ccc?key=0AghkTjXe2lDndE1UZml0dGpaUzJmZGhvenBZd1F2R1E&usp=sharing). A backer list is available [on GitHub](https://github.com/OpenMathLib/OpenBLAS/blob/develop/BACKERS.md). | ||
|
||
We welcome hardware donations, including the latest CPUs and boards. | ||
|
||
## Acknowledgements | ||
|
||
This work is partially supported by: | ||
* Research and Development of Compiler System and Toolchain for Domestic CPU, National S&T Major Projects: Core Electronic Devices, High-end General Chips and Fundamental Software (No.2009ZX01036-001-002) | ||
* National High-tech R&D Program of China (Grant No.2012AA010903) | ||
|
||
## Users of OpenBLAS | ||
|
||
* <a href='http://julialang.org/'>Julia - a high-level, high-performance dynamic programming language for technical computing</a><br /> | ||
* Ceemple v1.0.3 (C++ technical computing environment), including OpenBLAS, Qt, Boost, OpenCV and others. The only solution with immediate-recompilation of C++ code. Available from <a href='http://www.ceemple.com'>Ceemple C++ Technical Computing</a>. | ||
* [netlib-java](https://github.com/fommil/netlib-java) and various upstream libraries, allowing OpenBLAS to be used from languages on the Java Virtual Machine. | ||
|
||
<!-- TODO: academia users, industry users, hpc centers deployed openblas, etc. --> | ||
|
||
## Publications | ||
|
||
### 2013 | ||
|
||
* Wang Qian, Zhang Xianyi, Zhang Yunquan, Qing Yi, **AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs**, In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), Denver CO, November 2013. [[pdf](http://xianyi.github.io/paper/augem_SC13.pdf)] | ||
|
||
### 2012 | ||
|
||
* Zhang Xianyi, Wang Qian, Zhang Yunquan, **Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor**, 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), 17-19 Dec. 2012. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
!!! warning | ||
    This page was written by someone who is not an OpenBLAS developer and should not be considered official documentation of the build system. To get the full picture, it is best to read the Makefiles themselves. | ||
|
||
## Makefile dep graph | ||
|
||
``` | ||
Makefile | ||
| | ||
|----- Makefile.system # !!! this is included by many of the Makefiles in the subdirectories !!! | ||
|      | | ||
|      |===== Makefile.prebuild # This is triggered (not included) once by Makefile.system | ||
|      |      |                 # and runs before any of the actual library code is built. | ||
|      |      |                 # (builds and runs the "getarch" tool for cpu identification, | ||
|      |      |                 # runs the compiler detection scripts c_check and f_check) | ||
|      |      | | ||
|      |      ----- (Makefile.conf) [ either this or Makefile_kernel.conf is generated ] | ||
|      |      |                     { Makefile.system#L243 } | ||
|      |      ----- (Makefile_kernel.conf) [ temporary Makefile.conf during DYNAMIC_ARCH builds ] | ||
|      | | ||
|      |----- Makefile.rule # defaults for build options that can be given on the make command line | ||
|      | | ||
|      |----- Makefile.$(ARCH) # architecture-specific compiler options and OpenBLAS buffer size values | ||
| | ||
|~~~~~ exports/ | ||
| | ||
|~~~~~ test/ | ||
| | ||
|~~~~~ utest/ | ||
| | ||
|~~~~~ ctest/ | ||
| | ||
|~~~~~ cpp_thread_test/ | ||
| | ||
|~~~~~ kernel/ | ||
| | ||
|~~~~~ ${SUBDIRS} | ||
| | ||
|~~~~~ ${BLASDIRS} | ||
| | ||
|~~~~~ ${NETLIB_LAPACK_DIR}{,/timing,/testing/{EIG,LIN}} | ||
| | ||
|~~~~~ relapack/ | ||
``` | ||
|
||
## Important Variables | ||
|
||
Most of the tunable variables are found in [Makefile.rule](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.rule), along with their detailed descriptions.<br/> | ||
Most of the variables are detected automatically in [Makefile.prebuild](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.prebuild), if they are not set in the environment. | ||
|
||
### CPU related | ||
``` | ||
ARCH - Target architecture (e.g. x86_64) | ||
TARGET - Target CPU architecture; with DYNAMIC_ARCH=1, the library will not be usable on CPUs less capable than this target | ||
TARGET_CORE - Overrides TARGET internally during each CPU-specific cycle of a DYNAMIC_ARCH build | ||
DYNAMIC_ARCH - Build the library for multiple TARGETs (does not lose any optimizations, but increases library size) | ||
DYNAMIC_LIST - Optional user-provided subset of the DYNAMIC_CORE list in Makefile.system | ||
``` | ||
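As a rough illustration of how the CPU-related variables are passed on the `make` command line, the sketch below assembles two invocations and prints them instead of running them. The core names `HASWELL` and `SKYLAKEX` are only example values, and the shell variable names (`single_cmd`, `dyn_cmd`) are hypothetical.

```shell
# Dry run: the commands are printed, not executed.
# To actually build, run the printed command from the top of an OpenBLAS source tree.

# Build for one specific CPU (example value).
single_cmd="make TARGET=HASWELL"
echo "$single_cmd"

# Build kernels for several CPUs and let the library pick one at run time.
# DYNAMIC_LIST narrows the set of cores that are built (example values).
DYNAMIC_LIST="HASWELL SKYLAKEX"
dyn_cmd="make DYNAMIC_ARCH=1 DYNAMIC_LIST=\"$DYNAMIC_LIST\""
echo "$dyn_cmd"
```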
|
||
### Toolchain related | ||
``` | ||
CC - TARGET C compiler used for compilation (can be cross-toolchains) | ||
FC - TARGET Fortran compiler used for compilation (can be cross-toolchains; set NOFORTRAN=1 if the cross-toolchain in use has no Fortran compiler) | ||
AR, AS, LD, RANLIB - TARGET toolchain helpers used for compilation (can be cross-toolchains) | ||
HOSTCC - compiler of build machine, needed to create proper config files for target architecture | ||
HOST_CFLAGS - flags for build machine compiler | ||
``` | ||
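A cross-compilation command line built from the toolchain variables might look like the following sketch. It assumes an AArch64 GNU cross-toolchain whose tools carry the prefix `aarch64-linux-gnu-`; that prefix, the `ARMV8` target, and the shell variable names are illustrative assumptions. The commands are only printed.

```shell
# Dry run: prints the make command lines instead of executing them.

# Assumed cross-toolchain prefix; adjust to the toolchain actually installed.
CROSS="aarch64-linux-gnu-"

# CC/FC point at the cross compilers, HOSTCC at the build machine's native compiler.
cross_cmd="make TARGET=ARMV8 CC=${CROSS}gcc FC=${CROSS}gfortran HOSTCC=gcc"
echo "$cross_cmd"

# Variant for a cross-toolchain that ships no Fortran compiler.
cross_nofortran_cmd="make TARGET=ARMV8 CC=${CROSS}gcc HOSTCC=gcc NOFORTRAN=1"
echo "$cross_nofortran_cmd"
```

TARGET is given explicitly here on the assumption that automatic CPU detection on the build machine would not describe the cross-compilation target.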
|
||
### Library related | ||
``` | ||
BINARY - 32/64 bit library | ||
BUILD_SHARED - Create shared library | ||
BUILD_STATIC - Create static library | ||
QUAD_PRECISION - enable support for IEEE quad precision [ largely unimplemented leftover from GotoBLAS, do not use ] | ||
EXPRECISION - Obsolete option to use float80 of SSE on BSD-like systems | ||
INTERFACE64 - Build with 64bit integer representations to support large array index values [ incompatible with standard API ] | ||
BUILD_SINGLE - build the single-precision real functions of BLAS [and optionally LAPACK] | ||
BUILD_DOUBLE - build the double-precision real functions | ||
BUILD_COMPLEX - build the single-precision complex functions | ||
BUILD_COMPLEX16 - build the double-precision complex functions | ||
(all four types are included in the build by default when none was specifically selected) | ||
BUILD_BFLOAT16 - build the "half precision brainfloat" real functions | ||
USE_THREAD - Use a multithreading backend (default to pthread) | ||
USE_LOCKING - implement locking for thread safety even when USE_THREAD is not set (so that the single-threaded library can | ||
safely be called from multithreaded programs) | ||
USE_OPENMP - Use OpenMP as multithreading backend | ||
NUM_THREADS - define this to the maximum number of parallel threads you expect to need (defaults to the number of cores in the build cpu) | ||
NUM_PARALLEL - define this to the number of OpenMP instances that your code may use for parallel calls into OpenBLAS (default 1, see below) | ||
``` | ||
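To show how some of the library-related variables combine, here is a small sketch that prints two possible command lines. The particular combinations are illustrative choices, and the shell variable names are hypothetical.

```shell
# Dry run: prints the make command lines instead of executing them.

# Select only the real-valued single- and double-precision functions.
# (When no BUILD_* type is selected, all four standard types are built.)
real_only_cmd="make BUILD_SINGLE=1 BUILD_DOUBLE=1"
echo "$real_only_cmd"

# 64-bit library with 64-bit integer indices; callers must then also use
# 64-bit integers, since this is incompatible with the standard API.
ilp64_cmd="make BINARY=64 INTERFACE64=1"
echo "$ilp64_cmd"
```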
|
||
|
||
OpenBLAS internally uses a fixed set of memory buffers for communicating and collecting partial results from individual threads. | ||
For efficiency, the management array structure for these buffers is sized at build time, which makes it necessary to know in advance how | ||
many threads need to be supported on the target system(s). | ||
With OpenMP, there is an additional level of complexity, as there may be calls originating from a parallel region in the calling program. If OpenBLAS gets called from a single parallel region, it runs single-threaded automatically to avoid overloading the system by fanning out its own set of threads. | ||
If an OpenMP program makes multiple calls from independent regions or instances in parallel, this default serialization is not | ||
sufficient, as the additional caller(s) would compete for the original set of buffers already in use by the first call. | ||
So if multiple OpenMP runtimes call into OpenBLAS at the same time, only one of them will be able to make progress while all the others spin-wait for the one available buffer. Setting NUM_PARALLEL to the upper bound on the number of OpenMP runtimes that you can have in a process ensures that a sufficient number of buffer sets is available. |
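The build-time sizing described above could be expressed on the command line roughly as follows. This is a sketch under assumed values: 64 threads and 2 concurrent OpenMP instances are placeholders, not recommendations, and the command is printed rather than run.

```shell
# Dry run: prints the make command line instead of executing it.

# Placeholder values; choose them from the largest machine and the number of
# independent OpenMP runtimes the process is expected to host.
NUM_THREADS=64
NUM_PARALLEL=2

omp_cmd="make USE_OPENMP=1 NUM_THREADS=$NUM_THREADS NUM_PARALLEL=$NUM_PARALLEL"
echo "$omp_cmd"
```

Because both values size structures at build time, raising them later requires rebuilding the library.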
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
# CI jobs | ||
|
||
| Arch|Target CPU|OS|Build system|Cross-compile to|C Compiler|Fortran Compiler|Threading|DYNAMIC_ARCH|INTERFACE64|Libraries| CI Provider| CPU count| | ||
| ------------|---|---|-----------|-------------|----------|----------------|------|------------|----------|-----------|----------|-------| | ||
| x86_64 |Intel 32bit|Windows|CMAKE/VS2015| -|mingw6.3| - | pthreads | - | - | static | Appveyor| | | ||
| x86_64 |Intel |Windows|CMAKE/VS2015| -|mingw5.3| - | pthreads | - | - | static | Appveyor| | | ||
| x86_64 |Intel |Centos5|gmake | -|gcc 4.8 |gfortran| pthreads | + | - | both | Azure | | | ||
| x86_64 |SDE (SkylakeX)|Ubuntu| CMAKE| - | gcc | gfortran | pthreads | - | - | both | Azure | | | ||
| x86_64 |Haswell/ SkylakeX|Windows|CMAKE/VS2017| - | VS2017| - | | - | - | static | Azure | | | ||
| x86_64 | " | Windows|mingw32-make| - |gcc | gfortran | | list | - | both | Azure | | | ||
| x86_64 | " |Windows|CMAKE/Ninja| - |LLVM | - | | - | - | static | Azure | | | ||
| x86_64 | " |Windows|CMAKE/Ninja| - |LLVM | flang | | - | - | static | Azure | | | ||
| x86_64 | " |Windows|CMAKE/Ninja| - |VS2022| flang* | | - | - | static | Azure | | | ||
| x86_64 | " |macOS11|gmake | - | gcc-10|gfortran| OpenMP | + | - | both | Azure | | | ||
| x86_64 | " |macOS11|gmake | - | gcc-10|gfortran| none | - | - | both | Azure | | | ||
| x86_64 | " |macOS12|gmake | - | gcc-12|gfortran|pthreads| - | - | both | Azure | | | ||
| x86_64 | " |macOS11|gmake | - | llvm | - | OpenMP | + | - | both | Azure | | | ||
| x86_64 | " |macOS11|CMAKE | - | llvm | - | OpenMP | no_avx512 | - | static | Azure | | | ||
| x86_64 | " |macOS11|CMAKE | - | gcc-10| gfortran| pthreads | list | - | shared | Azure | | | ||
| x86_64 | " |macOS11|gmake | - | llvm | ifort | pthreads | - | - | both | Azure | | | ||
| x86_64 | " |macOS11|gmake |arm| AndroidNDK-llvm | - | | - | - | both | Azure | | | ||
| x86_64 | " |macOS11|gmake |arm64| XCode 12.4 | - | | + | - | both | Azure | | | ||
| x86_64 | " |macOS11|gmake |arm | XCode 12.4 | - | | + | - | both | Azure | | | ||
| x86_64 | " |Alpine Linux(musl)|gmake| - | gcc | gfortran | pthreads | + | - | both | Azure | | | ||
| arm64 |Apple M1 |OSX |CMAKE/XCode| - | LLVM | - | OpenMP | - | - | static | Cirrus | | | ||
| arm64 |Apple M1 |OSX |CMAKE/Xcode| - | LLVM | - | OpenMP | - | + | static | Cirrus | | | ||
| arm64 |Apple M1 |OSX |CMAKE/XCode|x86_64| LLVM| - | - | + | - | static | Cirrus | | | ||
| arm64 |Neoverse N1|Linux |gmake | - |gcc10.2| -| pthreads| - | - | both | Cirrus | | | ||
| arm64 |Neoverse N1|Linux |gmake | - |gcc10.2| -| pthreads| - | + | both | Cirrus | | | ||
| arm64 |Neoverse N1|Linux |gmake |- |gcc10.2| -| OpenMP | - | - | both |Cirrus | 8 | | ||
| x86_64 | Ryzen| FreeBSD |gmake | - | gcc12.2|gfortran| pthreads| - | - | both | Cirrus | | | ||
| x86_64 | Ryzen| FreeBSD |gmake | - | gcc12.2|gfortran| pthreads| - | + | both | Cirrus | | | ||
| x86_64 |GENERIC |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | | | ||
| x86_64 |SICORTEX |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | | | ||
| x86_64 |I6400 |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | | | ||
| x86_64 |P6600 |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | | | ||
| x86_64 |I6500 |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | | | ||
| x86_64 |Intel |Ubuntu |CMAKE| - | gcc-11.3 | gfortran | pthreads | + | - | static | Github | | | ||
| x86_64 |Intel |Ubuntu |gmake| - | gcc-11.3 | gfortran | pthreads | + | - | both | Github | | | ||
| x86_64 |Intel |Ubuntu |CMAKE| - | gcc-11.3 | flang-classic | pthreads | + | - | static | Github | | | ||
| x86_64 |Intel |Ubuntu |gmake| - | gcc-11.3 | flang-classic | pthreads | + | - | both | Github | | | ||
| x86_64 |Intel |macOS12 | CMAKE| - | AppleClang 14 | gfortran | pthreads | + | - | static | Github | | | ||
| x86_64 |Intel |macOS12 | gmake| - | AppleClang 14 | gfortran | pthreads | + | - | both | Github | | | ||
| x86_64 |Intel |Windows2022 | CMAKE/Ninja| - | mingw gcc 13 | gfortran | | + | - | static | Github | | | ||
| x86_64 |Intel |Windows2022 | CMAKE/Ninja| - | mingw gcc 13 | gfortran | | + | + | static | Github | | | ||
| x86_64 |Intel 32bit|Windows2022 | CMAKE/Ninja| - | mingw gcc 13 | gfortran | | + | - | static | Github | | | ||
| x86_64 |Intel |Windows2022 | CMAKE/Ninja| - | LLVM 16 | - | | + | - | static | Github | | | ||
| x86_64 |Intel | Windows2022 |CMAKE/Ninja| - | LLVM 16 | - | | + | + | static | Github | | | ||
| x86_64 |Intel | Windows2022 |CMAKE/Ninja| - | gcc 13| - | | + | - | static | Github | | | ||
| x86_64 |Intel| Ubuntu |gmake |mips64|gcc|gfortran|pthreads|+|-|both|Github| | | ||
| x86_64 |generic|Ubuntu |gmake |riscv64|gcc|gfortran|pthreads|-|-|both|Github| | | ||
| x86_64 |Intel|Ubuntu |gmake |mips32|gcc|gfortran|pthreads|-|-|both|Github | | | ||
| x86_64 |Intel|Ubuntu |gmake |ia64|gcc|gfortran|pthreads|-|-|both|Github| | | ||
| x86_64 |C910V|QEmu |gmake |riscv64|gcc|gfortran|pthreads|-|-|both|Github| | | ||
|power |pwr9| Ubuntu |gmake | - |gcc|gfortran|OpenMP|-|-|both|OSUOSL| | | ||
|zarch |z14 | Ubuntu |gmake | - |gcc|gfortran|OpenMP|-|-|both|OSUOSL| | |