Releases: intel/qpl
Intel QPL v1.7.0
Functionality
- Enhanced the Benchmarks Framework to incorporate the new QPL device selection mechanism introduced in the previous release.
- Saved intermediate job states in the dynamic Deflate job to prevent duplicate work when executing with the asynchronous API on the Hardware Path and encountering the QPL_STS_QUEUES_ARE_BUSY_ERR error. In such cases, the job is resubmitted without repeating the already completed work. In the v1.6.0 release, this functionality was enabled with the synchronous API.
- [experimental feature] Added a mechanism to measure Intel IAA execution time in a single-threaded application with the synchronous API.
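From the application side, a busy-queue status is usually handled with a retry loop around job submission. The following is a minimal sketch of such a loop, not the library's internal save-and-resume mechanism. So that it compiles without QPL, the status enum, the callback type, and `mock_submit` are hypothetical stand-ins; in a real application the callback would wrap `qpl_submit_job` and the busy status would be `QPL_STS_QUEUES_ARE_BUSY_ERR`.

```c
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without QPL. In a real
   application the status type is qpl_status, the busy code is
   QPL_STS_QUEUES_ARE_BUSY_ERR, and the callback wraps qpl_submit_job(). */
typedef enum {
    SKETCH_STS_OK = 0,
    SKETCH_STS_QUEUES_BUSY,
    SKETCH_STS_OTHER_ERR
} sketch_status;

typedef sketch_status (*sketch_submit_fn)(void *job);

/* Resubmit the same job while the queues are busy, up to max_attempts times.
   Returns the last status seen; *attempts_out (if non-NULL) receives the
   number of submissions actually made. */
static sketch_status submit_with_retry(sketch_submit_fn submit, void *job,
                                       unsigned max_attempts,
                                       unsigned *attempts_out)
{
    sketch_status status = SKETCH_STS_QUEUES_BUSY;
    unsigned attempts = 0;

    while (attempts < max_attempts) {
        status = submit(job);
        ++attempts;
        if (status != SKETCH_STS_QUEUES_BUSY) {
            break; /* success, or an error that a retry will not fix */
        }
    }
    if (attempts_out != NULL) {
        *attempts_out = attempts;
    }
    return status;
}

/* Mock submit for illustration: reports busy queues until the counter that
   `job` points to reaches zero, then reports success. */
static sketch_status mock_submit(void *job)
{
    int *busy_left = (int *)job;
    if (*busy_left > 0) {
        --*busy_left;
        return SKETCH_STS_QUEUES_BUSY;
    }
    return SKETCH_STS_OK;
}
```

Bounding the number of attempts keeps a persistently saturated queue from turning into an endless loop; a production version would likely also back off between attempts.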
Usability and Documentation
- Introduced a clang-format configuration file and formatted the entire codebase using clang-format 17.
- Improved C++ compatibility by fixing field order mismatches when creating structures, initializing char* strings with literals, and removing unnecessary conversions between integers and enums.
- Added a documentation note clarifying that QPL testing with datasets provided under tools/testdata requires a maximum transfer size of 2GB to avoid the QPL_STS_TRANSFER_SIZE_INVALID error code.
- Updated documentation on the -DEFFICIENT_WAIT build option.
- Enhanced the Introduction section of the QPL documentation, including adding useful links for the Intel® In-Memory Analytics Accelerator.
- Extended testing to generate stored block insertion on the last job.
- Made multiple updates to documentation and examples on qpl_get_safe_deflate_compression_buffer size usage for multi-chunk compression.
- Improved distance code computation logic on the Software Path.
Deprecated Functionality
- The Force Array Output Modification Feature has been deprecated on the Auto Path due to the lack of host fallback support. Use the Hardware Path instead.
Bug Fixes
- Resolved build issues with Clang-17 caused by a missing header.
- Corrected logic in qpl_check_job to prevent unintended host fallback instead of accelerator execution.
- Fixed the compression verification step on the asynchronous path when a stored block occurs.
- Implemented multiple fixes for the stored block insertion feature on both asynchronous and synchronous paths.
- Prevented reprocessing when qpl_check_job or qpl_wait_job is called after submission.
- Implemented multiple fixes for issues with index compression/decompression.
- Fixed intermediate buffer incrementing for the select operation.
- Initialized the intermediate Huffman table structure correctly to avoid garbage in the Huffman table.
- Implemented creation of the mapping CAM Huffman decompression table.
- Resolved the issue of never setting the accelerator context on the Auto Path.
- Introduced immediate fallback to host execution for the specific case of Huffman-only BE16 decompression on the Auto Path.
Known Limitations
- Intel(R) QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, because these frameworks require submodules that are not included in the archives by GitHub* during release creation.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
  - Functional tests:
    - (software_path, auto_path only on platforms without IAA) ta_c_api_deflate_stateful.{dynamic/fixed/static}_default_verify
    - (software_path, auto_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_high_verify
- Compression verification on the qpl_path_software works only with indexing mode and data of size smaller than 32KB in other modes.
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
- The implementation of QPL_FLAG_CRC32C is in progress.
- When using qpl_path_hardware, the compression and decompression with indexing mode on IAA 2.0 are limited to data sizes smaller than 4KB.
- The Force Array Output Modification feature is enabled only for qpl_path_hardware and Intel IAA 2.0 (and later). In the case of qpl_path_auto, an error code QPL_STS_NOT_SUPPORTED is returned as no fallback is available currently.
Thanks to the Contributors
The release includes contributions from the project team and @fwph, @Permanence-AI-Coder.
Intel QPL v1.6.0
Functionality
- Introduced a new internal submission mechanism for platforms based on Linux* OS kernel versions where MMAP is no longer permitted. For more details, refer to the Intel Security Advisory. When MMAP is unavailable, the write system call is used instead. This may introduce additional overhead for small data sizes (4KB and smaller) in the Inflate functionality, but no performance implications are expected for larger data sizes or Deflate.
- Updated the QPL device search mechanism to a new default behavior. Platforms with Sub-NUMA clustering configured such that not all NUMA nodes have an accelerator instance can now use any IAA instance from the same socket for execution, unless specified otherwise by the user. You can still restrict device selection to the NUMA node of the current thread by specifying QPL_DEVICE_NUMA_ID_CURRENT, or to a specific NUMA node by setting job->numa_id = <numa_node_id>. Additionally, you can extend device selection to the entire system by setting QPL_DEVICE_NUMA_ID_ANY.
- Added support for host fallback in the asynchronous API when using the Auto Path feature.
- Implemented an internal mechanism to save intermediate job states in the dynamic Deflate job. This feature prevents duplicate work when executing with the synchronous API on the Hardware Path and encountering the QPL_STS_QUEUES_ARE_BUSY_ERR error. In such cases, the job is resubmitted without repeating the already completed work.
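The device selection options above all come down to what the application writes into the job's numa_id field before submission. The sketch below illustrates that choice under stated assumptions: `sketch_job`, `sketch_policy`, and the `SKETCH_NUMA_ID_*` constants are hypothetical stand-ins so the code compiles without QPL, and their numeric values are placeholders rather than the library's. Real code would use `qpl_job` together with `QPL_DEVICE_NUMA_ID_CURRENT` and `QPL_DEVICE_NUMA_ID_ANY` from the library headers.

```c
#include <stdint.h>

/* Hypothetical stand-ins so the sketch compiles without QPL installed.
   Real code uses qpl_job and the library's QPL_DEVICE_NUMA_ID_CURRENT and
   QPL_DEVICE_NUMA_ID_ANY macros; the values here are placeholders only. */
enum { SKETCH_NUMA_ID_ANY = -2, SKETCH_NUMA_ID_CURRENT = -1 };

typedef struct {
    int32_t numa_id; /* mirrors the job->numa_id field named in the notes */
} sketch_job;

typedef enum {
    SKETCH_KEEP_DEFAULT,  /* leave numa_id as initialized (new default behavior) */
    SKETCH_CURRENT_NODE,  /* restrict to the NUMA node of the current thread */
    SKETCH_SPECIFIC_NODE, /* restrict to an explicit NUMA node id */
    SKETCH_WHOLE_SYSTEM   /* allow any device in the system */
} sketch_policy;

/* Writes the device selection scope into the job.
   Returns 0 on success, -1 for an unknown policy or a negative node id. */
static int select_device_scope(sketch_job *job, sketch_policy policy,
                               int32_t node)
{
    switch (policy) {
    case SKETCH_KEEP_DEFAULT:
        return 0;
    case SKETCH_CURRENT_NODE:
        job->numa_id = SKETCH_NUMA_ID_CURRENT;
        return 0;
    case SKETCH_SPECIFIC_NODE:
        if (node < 0) {
            return -1; /* negative ids are reserved for the special values */
        }
        job->numa_id = node;
        return 0;
    case SKETCH_WHOLE_SYSTEM:
        job->numa_id = SKETCH_NUMA_ID_ANY;
        return 0;
    }
    return -1;
}
```

Rejecting negative explicit node ids keeps a caller-supplied value from being mistaken for one of the special selectors.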
Usability and Documentation
- Added support for Canned mode in QPL Benchmarks Frameworks.
- Optimized memory usage and reduced startup time for benchmarks when utilizing an exact filter.
- Introduced a new build option -DQPL_USE_CLANG_TIDY={ON,OFF} to enable QPL to build with clang-tidy checks. Clang-tidy support is limited to Linux* OS only and requires building QPL with the Clang* compiler. Additionally, introduced a configuration file for clang-tidy and refactored QPL to comply with the introduced clang-tidy configuration file.
- Added a new example demonstrating the utilization of dictionary compression with the Hardware Path for compression and the Software Path for decompression.
- Added new test cases for Select, Scan, and Extract operations to validate the functionality of Force Array Output Modification.
- Expanded the bad argument scenarios for the Force Array Output Modification tests to include additional cases for the Software Path.
- Added new tests to validate error handling for bad arguments when submitting jobs on the Hardware Path and Auto Path.
Deprecated Functionality
- Deprecated support for canned mode with indexing on the Software Path to align with the Hardware Path.
Bug Fixes
- Resolved the issue with compression verification when utilizing IAA 2.0.
- Corrected the test setup for auto_path in tb_c_api_deflate_with_dictionary.level_none, tb_c_api_deflate_with_dictionary.hw_multi_chunk, and tn_c_api_deflate.{dynamic/fixed/static}_default_stored_block_overflow.
- Added an execution path check to ensure proper handling of unsupported paths in the Force Array Output Modification.
- Resolved potential undefined behavior by fixing uninitialized pointers in the canned_one_chuck_hw_vs_sw.cpp test.
- Removed tests related to the unsupported Software Path for the canned mode with indexing.
- Fixed invalid parquet generation for tn_c_api_expand.tn_rle_input_error_handling.
Known Limitations
- Intel(R) QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, because these frameworks require submodules that are not included in the archives by GitHub* during release creation.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
  - Functional tests:
    - (software_path, auto_path only on platforms without IAA) ta_c_api_deflate_stateful.{dynamic/fixed/static}_default_verify
    - (software_path, auto_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_high_verify
    - (hardware_path, auto_path on IAA 2.0) ta_c_api_deflate_index_extended.PerformOperation
    - (auto_path) ta_c_api_huffman_only{_verify./.}{dynamic/static}_be
    - (auto_path) ta_c_api_inflate_huffman_only.generated_data
    - (auto_path) ta_c_api_deflate_index.{dynamic/static}_blocks_default_level_verify
    - (auto_path) tb_c_api_expand.source_errors
    - (auto_path) ta_c_api_deflate_inflate_canned_in_loops.default_level
- Compression verification on the qpl_path_software works only with indexing mode and data of size smaller than 32KB in other modes.
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
- The implementation of QPL_FLAG_CRC32C is in progress.
- When using qpl_path_hardware, the compression and decompression with indexing mode on IAA 2.0 are limited to data sizes smaller than 4KB.
Thanks to the Contributors
The release includes contributions from the project team and @aekoroglu, @fwph, and @Permanence-AI-Coder.
Intel QPL v1.5.0
Functionality
- Introduced the new QPL_FLAG_FORCE_ARRAY_OUTPUT flag to enable the Force Array Output Modification feature. This feature is supported on Intel® In-Memory Analytics Accelerator 2.0 and allows filter operation outputs to be received as an array with a defined bit width when the output bit width is 1.
- Enabled host fallback for synchronous API when Auto Path is used. Note that Auto Path with asynchronous execution is not yet supported.
- Enabled the building of QPL as a shared library using the -DQPL_LIBRARY_TYPE=SHARED build flag.
- Added pkg-config support (see the <install_dir>/lib/pkgconfig/qpl.pc file) for the shared library built with dynamic loading of libaccel-config.
Usability and Documentation
- Extended examples with a recipe for using idxd-config APIs to query accelerator configuration information relevant to QPL usage.
- Revised and improved the examples for scan functionality. Refer to examples/low-level-api/scan_for_specific_value_example.cpp and examples/low-level-api/scan_for_elements_in_range_example.cpp.
- Added a new example, expand_with_force_array_output_mod_example, to demonstrate the usage of the Force Array Output Modification feature.
- Updated documentation to describe the new Force Array Output Modification feature and its interaction with Output Bit Width Modification.
- Updated System Requirements documentation section for using IAA 2.0 (Linux kernel version 6.3 or later is required).
- Extended testing suite to cover Filter operations for the case when Block on Fault is set to OFF.
- Extended functional test to cover dictionary compression and decompression that reuses the job structure.
- Initialization tests were removed as outdated.
Bug Fixes
- Fixed possible symbol conflict when QPL is used in the same application with ISA-L.
- Fixed a possible "_FORTIFY_SOURCE redefined" build warning/error. Some GCC* builds could internally set _FORTIFY_SOURCE, which could have resulted in a QPL build error.
- Fixed an issue in qpl_gather_deflate_statistics that resulted in a lower compression ratio on qpl_path_auto.
- Fixed low compression ratio issues on qpl_path_hardware when compressing with a user-provided dictionary smaller than IAA's reserved dictionary size.
- Fixed an issue where the incorrect error code QPL_STS_INIT_WORK_QUEUES_NOT_AVAILABLE was returned when using IAA 2.0 with older Linux kernel versions.
- Fixed an issue with Fixed mode dictionary decompression on IAA when the job is re-used. Previously, error code 222 was returned.
- Fixed possible nullptr dereference when using Canned mode compression on qpl_path_hardware.
- Fixed possible data corruption in dictionary decompression on qpl_path_hardware.
Known Limitations
- Intel(R) QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, because these frameworks require submodules that are not included in the archives by GitHub* during release creation.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
  - Functional tests:
    - (software_path, auto_path only on platforms without IAA) ta_c_api_deflate_stateful.{dynamic/fixed/static}_default_verify
    - (software_path, auto_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_high_verify
    - (hardware_path, auto_path) ta_c_api_deflate_canned_indexing.default_level
    - (hardware_path, auto_path on IAA 2.0) ta_c_api_deflate_index_extended.PerformOperation
    - (hardware_path, auto_path) tn_c_api_expand.tn_rle_input_error_handling
    - (auto_path) ta_c_api_deflate_canned_indexing.high_level
    - (auto_path) ta_c_api_huffman_only{_verify./.}{dynamic/static}_be
    - (auto_path) ta_c_api_inflate_huffman_only.generated_data
    - (auto_path) ta_c_api_deflate_index.{dynamic/static}_blocks_default_level_verify
    - (auto_path) tb_c_api_expand.source_errors
    - (auto_path) tb_c_api_deflate_with_dictionary.level_none
    - (auto_path) tb_c_api_deflate_with_dictionary.hw_multi_chunk
    - (auto_path) tn_c_api_deflate.{dynamic/fixed/static}_default_stored_block_overflow
    - (auto_path) ta_c_api_deflate_inflate_canned_in_loops.default_level
- Compression verification on the qpl_path_software works only with indexing mode and data of size smaller than 32KB in other modes.
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
- The implementation of QPL_FLAG_CRC32C is in progress.
Thanks to the Contributors
Release includes contributions from the project team as well as @aepanchi and @miguelinux.
Intel QPL v1.4.0
Functionality
- Enabled the canned mode compression with the dictionary for hardware_path supported on IAA 2.0. Note that canned mode decompression with a dictionary on hardware_path is not supported, and software_path can be used as an alternative.
Usability and Documentation
- Added example for canned compression with a dictionary.
- Clarified the documentation about output modification for the expand operation.
- Extended functionality testing for the cases when the IAA Block on Fault feature is set to OFF.
Bug Fixes
- Fixed issue in high-level fixed mode compression on software_path. Previously, the job could complete with QPL_STS_OK when only partial source data was processed.
- Fixed issue in high-level dictionary mode compression. Previously, loss of data would occur at the end of compression.
- Fixed block header decompression for indexing mode on asynchronous path.
- Fixed performance regression that could appear on IAA 2.0 due to changes for the OPCFG feature.
- Fixed build options incorrectly propagated when building with Clang and resolved resulting warnings.
- Resolved undefined references to crc16_* functions.
- Fixed accelerator NUMA node setting via the --node parameter for Benchmarks framework. Previously, Benchmarks initialization and validation steps were always mapped to the NUMA node of the calling process, which could potentially result in the QPL_STS_INIT_WORK_QUEUES_NOT_AVAILABLE error.
- Fixed Huffman Only verification on software_path when the source size is greater than 4KB.
- Removed temporary buffer used on Huffman Only decompression code path for BE16 that could lead to potential seg. fault.
- Fixed the error code for invalid distance symbol in software_path decompression.
- Fixed generation of AECS Format-2 in tests that caused failure of ta_c_api_inflate_huffman_only.generated_data on hardware_path.
- Updated initialization of Huffman table from another to fix failures in ll_huffman.compress_sw_decompress_hw_{high/default}_level.
- Fixed issues flagged by the static code analysis tool.
Known Limitations
- Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, because these frameworks require submodules that are not included in the archives by GitHub* during release creation.
- During accelerator initialization on hardware_path and IAA 2.0, there is a small memory leak that will be resolved in a future release.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
  - Functional tests:
    - (software_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_{high/default}_verify
    - (hardware_path) ta_c_api_deflate_canned_indexing.default_level
    - (hardware_path on IAA 2.0) ta_c_api_deflate_index_extended.PerformOperation
    - (hardware_path) tn_c_api_expand.tn_rle_input_error_handling
- Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
- Dynamic Deflate on IAA 2.0 may show performance regression in throughput benchmarks using the asynchronous path.
- The implementation of QPL_FLAG_CRC32C is in progress.
Intel QPL v1.3.1
This is a patch release containing the following changes to v1.3.0:
Usability and Documentation
- Testing coverage and documentation improvements for dictionary compression functionality.
Bug Fixes
- Fixed job structure update for continuation on "Decompression Output Overflow" error when software_path is used.
- Fixed multi chunk compression when destination buffer is insufficient and stored block is written into the output stream instead.
- Fixed incorrect error code returned in the case when no devices are available on the NUMA node specified/detected.
- Fixed incorrectly set offsets in examples/low-level-api/compression_multi_chunk_example.cpp and examples/low-level-api/compression_static_multi_chunk_example.cpp.
- Fixed a few more issues flagged by the static code analysis tool.
Known Limitations
- Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, because these frameworks require submodules that are not included in the archives by GitHub* during release creation.
- Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
  - Functional tests:
    - ta_c_api_dictionary.dynamic_high_{stateless, stateful_decompression}
    - (software_path) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
    - (hardware_path) ta_c_api_deflate_canned_indexing.default_level
    - (hardware_path with asynchronous execution mode) ta_c_api_deflate_index_extended.PerformOperation
    - (hardware_path with specific test seeds) tn_c_api_expand.tn_rle_input_error_handling
    - (hardware_path if Generation 2 Minimum Capabilities are present) ta_c_api_inflate_huffman_only.generated_data
Intel QPL v1.3.0
Functionality
- Enabled support of IAA 2.0 for Huffman Only Decompression when verification is used.
- Enabled Compression with Dictionary for hardware_path supported on IAA 2.0.
- Enabled support of IAA 2.0 WQ OPCFG Support feature for disabling/enabling operations at a work queue granularity.
- Added Page Fault handling mechanism for the case when IAA Block on Fault is off.
- Introduced zlib support for hardware_path.
Usability and Documentation
- Added documentation section on zlib and GZIP compatibility support for DEFLATE.
- Extended documentation section on Decompression Output Overflow error.
- Introduced documentation section on how Intel(R) QPL handles Page Faults.
- Created a Contributing Guide and Pull Request template.
- Added an example to produce intentional Decompression Output Overflow to demonstrate how it should be resolved on the user side.
- Refactored examples to print out specific error codes instead of throwing exceptions with generic messages.
- Expanded functional dictionary tests to cover all compression-level combinations.
- Enabled testing for dictionary utility functions.
- Added thread stress testing to test for compress/decompress with heavy multithreaded usage.
- Updated requirements.txt to the latest compatible tools required for building Intel(R) QPL documentation locally.
- Updated Google* Benchmark dependency to 1.8.3.
Bug Fixes
- Fixed a segmentation fault in high-level DEFLATE compression on the AVX-512 code path.
- Fixed job structure update for continuation on Decompression Overflow case when synchronous execution with hardware_path is used.
- Fixed various issues flagged by the static code analysis tool.
Known Limitations
- Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, because these frameworks require submodules that are not included in the archives by GitHub* during release creation.
- Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
  - Functional tests:
    - (sw) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
    - (hw) ta_c_api_deflate_canned_indexing.default_level
    - (hw) (async) ta_c_api_deflate_index_extended.PerformOperation
    - (hw) tn_c_api_expand.tn_rle_input_error_handling
- Dynamic Deflate on IAA 2.0 may show performance regression in throughput benchmarks using the asynchronous path.
- When getting a decompression "Output Overflow" error on software_path, resubmit the job from the beginning. Currently, it is an unrecoverable error on software_path, and continuation is not supported (unlike on hardware_path).
Intel QPL v1.2.0
Functionality
- Partially enabled support of IAA 2.0 for existing functionality; some failures are expected for Huffman Only mode when hardware compression with verification is used.
- Enabled 1-pass and 2-pass Header Generations on IAA 2.0 for the Dynamic Deflate compression, which reduces latency for smaller sizes.
- Added Cyclic Redundancy Check (CRC) operation support to the Benchmarks Framework.
- Fixed a workaround for IAA 1.0 limitation with Big Endian 16 format in Huffman Only.
Usability and Documentation
- Added Library Architecture Overview diagram to the Introduction page.
- Extended returned status codes for Completion Record for more accessible issues reporting.
- Clarified the NUMA* support in the Benchmarks Framework documentation.
- Updated provided configuration files to always set Block on Fault and removed the Max Batch Size parameter not used on IAA.
- Updated the GoogleTest* submodule to v.1.13 release. The current QPL test framework is not compatible with previous GoogleTest* versions.
- Added new examples for multi-chunk compression, including fixed and static blocks.
- Updated the Installed package structure to comply with the Linux* OS file-system hierarchy.
- Added a link to the project with Java* bindings for QPL Low-Level C APIs.
- Clarified in the Documentation that the minimally tested platform is x86-64 CPU with Intel® Streaming SIMD Extensions 4.2 (instead of Intel® Advanced Vector Extensions 2).
Breaking Changes
- Updated the accel-config/libaccel-config dependency requirement to v4.0.
Bug Fixes
- Fixed the Compression Ratio calculation in the Benchmarks Framework to eliminate the rounding error.
- Fixed incorrect CTest* integration when QPL is used as a dependency in another project.
- Fixed the incorrect Linux* OS identification macro that could lead to build failures on some systems.
- Fixed build warnings and failures with GCC* 11, 12, 13.
- Fixed the incorrect Benchmarks Framework reporting of the NUMA* nodes when several nodes are available on a socket.
- Refactored the host part of the Hardware Path to rely on the platform identification and kernels dispatcher instead of directly calling AVX-512-optimized code.
- Fixed compatibility with the GZIP* format. Previously, the stream produced by QPL was correct but could trigger a warning due to the incorrect trailer information.
Known Limitations
- Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, because these frameworks require submodules that are not included in the archives by GitHub* during release creation.
- Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
  - Functional tests:
    - (sw) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
    - (hw) ta_c_api_deflate_canned_indexing.default_level
    - (hw) (async) ta_c_api_deflate_index_extended.PerformOperation
    - (hw) tn_c_api_expand.tn_rle_input_error_handling
- Dynamic Deflate on IAA 2.0 may show performance regression in throughput benchmarks using the asynchronous path.
Thanks to the Contributors
Release includes contributions from the project team as well as @alexandraepan, @aekoroglu, @yaqi-zhao, @miguelinux.
Intel QPL v1.1.0
Usability and Documentation
- Improved examples by setting the execution path based on a command-line argument instead of hardcoding to use the Software Path.
- Changed the CMake version requirement to v3.23 or higher when thread sanitizing is enabled (with -DSANITIZE_THREADS=ON) for the Intel QPL build, to avoid undefined pthread references.
- Changed the job structure allocation model so that it depends on the provided execution path. The user may see significant reduction in job structure size when using Hardware Path.
- Fixed CMakeLists.txt so that starting from this release the QPL project could be easily integrated into other CMake-based projects using find_package.
- Introduced the -DQPL_BUILD_{TESTS, EXAMPLES} option (set to ON by default). -DQPL_BUILD_TESTS=OFF enables the user to build the library (without testing) from directly downloadable files (.tar, .tgz).
- Fixed build warnings with -DLOG_HW_INIT=ON.
- Removed -DBLOCK_ON_FAULT=[OFF|ON] from the documentation since the Block on Fault feature cannot be enabled/disabled through this build option. Users must use accel-config to enable/disable Block on Fault for each work queue.
Breaking Changes
- Changed the loading of the accel-config library from static loading to dynamic loading by default. Added a build option -DDYNAMIC_LOADING_LIBACCEL_CONFIG=[OFF|ON] to switch between dynamic loading and static loading. This build option is set to ON by default for dynamic loading. To compile a QPL application, users must add -ldl with default dynamic loading (or use -laccel-config if Intel QPL is built with -DDYNAMIC_LOADING_LIBACCEL_CONFIG=OFF).
Bug Fixes
- Fixed gcc 11 build failures caused by missing headers.
- Fixed a race condition that might occur during hardware initialization. Users with heavy-threaded workloads might have experienced seg. fault or hang starting with QPL v0.3.0; the issue is addressed in this release.
Known issues/limitations
- Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, because these frameworks require submodules that are not included in the archives by GitHub during release creation.
- Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
  - Functional tests:
    - (sw) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
    - (hw) (async) ta_c_api_inflate_huffman_only.generated_data
    - (hw) ta_c_api_deflate_canned_indexing.default_level
    - (hw) (async) ta_c_api_deflate_index_extended.PerformOperation
    - (hw) (async) ta_c_api_huffman_only{_verify}.{dynamic/static}_be
    - (hw) tn_c_api_expand.tn_rle_input_error_handling
  - Cross tests:
    - ll_huffman.compress_hw_decompress_sw
    - ll_huffman.compress_sw_decompress_hw_{high/default}_level
Intel QPL v1.0.0
Functionality
- Added Benchmark Framework with limited support; refer to the Benchmark Framework Guide in the documentation for details regarding what is supported and how it can be used
Usability and Documentation
- Fixed build warnings with GCC
- Added a new error status code QPL_STS_JOB_NOT_SUBMITTED, which will be returned if the job being checked/waited has not been submitted
- Added --qpl-tests-help option for functional tests executable (located at <install_dir>/bin/tests); --qpl-tests-help lists all available test options specific to the library (e.g., execution path, synchronous or asynchronous mode)
Deprecated Functionality
- Removed support of High-Level C++ API from the library
- Removed support of experimental DWQ feature
Breaking Changes
- Removed QPL_FLAG_NO_BUFFERING and all references to it
- Flags for using indexing mode changed from using QPL_FLAG_NO_BUFFERING to QPL_FLG_RND_ACCESS
Bug Fixes
- Fixed issue with extract on software path ending in the middle of a literal octa-group for some bit-widths
- Fixed issue that resulted in the wrong total_out value in the qpl_job when an asynchronous canned mode compression was submitted with a qpl_job object reused from a previous unrelated job
Known issues/limitations
- Intel QPL cannot be built from directly downloadable files (.tar, .tgz) since it has submodules that are not included in the archives by GitHub during release creation
- Compression verification on the software path only works with indexing mode and data of size smaller than 32KB in other modes
- Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer
- Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses
  - Functional tests:
    - (sw) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
    - (hw) (fails with certain test seeds only) tn_c_api_inflate.no_literal_lengths_code
    - (hw) (async) ta_c_api_inflate_huffman_only.generated_data
    - (hw) ta_c_api_deflate_canned_indexing.default_level
    - (hw) (async) ta_c_api_deflate_index_extended.PerformOperation
    - (hw) (async) ta_c_api_huffman_only{_verify}.{dynamic/static}_be
    - (hw) tn_c_api_expand.tn_rle_input_error_handling
  - Cross tests:
    - ll_deflate.compress_hw_decompress_sw
    - ll_huffman.compress_hw_decompress_sw
    - ll_huffman.compress_sw_decompress_hw_{high/default}_level
Intel QPL v0.3.0
Usability and Documentation
- Changed libaccel-config.so to a build time dependency on Linux instead of loading it for Hardware Path execution at runtime. It is now required to add -laccel-config when building the application with QPL
- Fixed duplications in status codes documentation page in Developer Reference
Deprecated Functionality
- Dropped support of qpl_op_set_membership, qpl_op_find_unique and qpl_op_rle_burst analytic operations
- Dropped support of zero compression: qpl_op_z_compress{16, 32} and qpl_op_z_decompress{16, 32}
Bug Fixes
- Changed accelerator dispatching to use lazy initialization with locks to ensure thread safety. Previous behavior might result in crashes on user's side when they fork a child process to submit job to the accelerator
- Fixed non-optimal Huffman Only compression when executing on Software Path
- Fixed incorrect mapping of accelerator status codes to library status codes which previously resulted in returning undocumented error to the user
Known issues/limitations
- Intel QPL cannot be built from directly downloadable files (.tar, .tgz) since it has submodules that are not included in the archives by GitHub during release creation
- Internal error code QPL_STS_INTERNAL_ERROR has been moved from error code 222 to error code 6. This could affect users having a hardcoded check for error 222
- On Linux, libaccel-config.so must be placed in /usr/lib64/ to build and run Intel QPL, even if only the Software Path execution is used