Releases: intel/qpl

Intel QPL v1.7.0

13 Dec 00:54
97cc01c

Functionality

  • Enhanced the Benchmarks Framework to incorporate the new QPL device selection mechanism introduced in the previous release.
  • Saved intermediate job states in the dynamic Deflate job to prevent duplicate work when executing with the asynchronous API on the Hardware Path and encountering the QPL_STS_QUEUES_ARE_BUSY_ERR error. In such cases, the job is resubmitted without repeating the already completed work. In the v1.6.0 release, this functionality was enabled for the synchronous API.
  • [experimental feature] Added a mechanism to measure Intel IAA execution time in a single-threaded application with the synchronous API.
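
The resubmission pattern mentioned above can be sketched as a small retry helper. This is a minimal illustration, not QPL code: `submit_with_retry`, `kStatusOk`, and `kStatusQueueBusy` are hypothetical names, and in a real application the callable would wrap the library's job submission call while the busy value would be QPL_STS_QUEUES_ARE_BUSY_ERR.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-ins for library status values; a real application would
// use the status codes from the QPL headers instead.
constexpr int kStatusOk        = 0;
constexpr int kStatusQueueBusy = 1;  // plays the role of QPL_STS_QUEUES_ARE_BUSY_ERR

// Calls `submit` until it returns something other than the busy status or the
// attempt budget is exhausted. Returns the last status observed.
inline int submit_with_retry(const std::function<int()>& submit,
                             int max_attempts,
                             std::chrono::microseconds backoff) {
    assert(max_attempts > 0);
    int status = kStatusQueueBusy;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        status = submit();
        if (status != kStatusQueueBusy) {
            break;  // success or a non-retryable error: hand it to the caller
        }
        std::this_thread::sleep_for(backoff);  // give the queues time to drain
    }
    return status;
}
```

With intermediate job states saved by the library, each retry in such a loop is expected to resume rather than redo completed work; the attempt limit and backoff shown here are arbitrary choices for the sketch.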

Usability and Documentation

  • Introduced a clang-format configuration file and formatted the entire codebase using clang-format 17.
  • Improved C++ compatibility by fixing field order mismatches when creating structures, initializing char* strings with literals, and removing unnecessary conversions between integers and enums.
  • Added a documentation note clarifying that QPL testing with datasets provided under tools/testdata requires a maximum transfer size of 2GB to avoid the QPL_STS_TRANSFER_SIZE_INVALID error code.
  • Updated documentation on the -DEFFICIENT_WAIT build option.
  • Enhanced the Introduction section of the QPL documentation, including adding useful links for the Intel® In-Memory Analytics Accelerator.
  • Extended testing to generate stored block insertion on the last job.
  • Made multiple updates to documentation and examples on qpl_get_safe_deflate_compression_buffer_size usage for multi-chunk compression.
  • Improved distance code computation logic on the Software Path.

Deprecated Functionality

  • The Force Array Output Modification Feature has been deprecated on the Auto Path due to the lack of host fallback support. Use the Hardware Path instead.

Bug Fixes

  • Resolved build issues with Clang-17 caused by a missing header.
  • Corrected logic in qpl_check_job to prevent unintended host fallback instead of accelerator execution.
  • Fixed the compression verification step on the asynchronous path when a stored block occurs.
  • Implemented multiple fixes for the stored block insertion feature on both asynchronous and synchronous paths.
  • Prevented reprocessing when qpl_check_job or qpl_wait_job is called after submission.
  • Implemented multiple fixes for issues with index compression/decompression.
  • Fixed intermediate buffer incrementing for the select operation.
  • Initialized the intermediate Huffman table structure correctly to avoid garbage in the Huffman table.
  • Implemented creation of the mapping CAM Huffman decompression table.
  • Resolved the issue of never setting the accelerator context on the Auto Path.
  • Introduced immediate fallback to host execution for the specific case of Huffman-only BE16 decompression on the Auto Path.

Known Limitations

  • Intel(R) QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option. This is because the tests and benchmarks require submodules that GitHub* does not include in the archives during release creation.

  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.

    • Functional tests:
      • (software_path, auto_path only on platforms without IAA) ta_c_api_deflate_stateful.{dynamic/fixed/static}_default_verify
      • (software_path, auto_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_high_verify
  • Compression verification on the qpl_path_software works only with indexing mode and data of size smaller than 32KB in other modes.

  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.

  • The implementation of QPL_FLAG_CRC32C is in progress.

  • When using qpl_path_hardware, the compression and decompression with indexing mode on IAA 2.0 are limited to data sizes smaller than 4KB.

  • The Force Array Output Modification feature is enabled only for qpl_path_hardware and Intel IAA 2.0 (and later). In the case of qpl_path_auto, an error code QPL_STS_NOT_SUPPORTED is returned as no fallback is available currently.

Thanks to the Contributors

The release includes contributions from the project team and @fwph, @Permanence-AI-Coder.

Intel QPL v1.6.0

17 Jul 22:31
c26fb69

Functionality

  • Introduced a new internal submission mechanism for platforms based on Linux* OS kernel versions where MMAP is no longer permitted. For more details, refer to the Intel Security Advisory. When MMAP is unavailable, the write system call is used instead. This may introduce additional overhead for small data sizes (4KB and smaller) in the Inflate functionality, but no performance implications are expected for larger data sizes or Deflate.
  • Updated the QPL device search mechanism to a new default behavior. Now, platforms with Sub-NUMA clustering configured such that not all NUMA nodes have an accelerator instance can use any IAA instance from the same socket for execution unless the user specifies otherwise. You can still restrict device selection to the NUMA node of the current thread by specifying QPL_DEVICE_NUMA_ID_CURRENT, or to a specific NUMA node by setting job->numa_id = <numa_node_id>. Additionally, you can extend device selection to the entire system by setting QPL_DEVICE_NUMA_ID_ANY.
  • Added support for host fallback in the asynchronous API when using the Auto Path feature.
  • Implemented an internal mechanism to save intermediate job states in the dynamic Deflate job. This feature prevents duplicate work when executing with the synchronous API on the Hardware Path and encountering the QPL_STS_QUEUES_ARE_BUSY_ERR error. In such cases, the job is resubmitted without repeating the already completed work.
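
The device selection rules described above can be pictured with a toy model. This is only a sketch of the policy as stated in these notes, not the library's implementation: the `Device` struct, the `Policy` enum, and `eligible_devices` are hypothetical names, with `SocketDefault` standing for the new default, `CurrentNode` for QPL_DEVICE_NUMA_ID_CURRENT, `SpecificNode` for an explicit job->numa_id, and `AnyNode` for QPL_DEVICE_NUMA_ID_ANY.

```cpp
#include <cassert>
#include <vector>

// Toy model only: these types and names are hypothetical and not part of QPL.
struct Device {
    int id;
    int socket;
    int numa_node;
};

enum class Policy { SocketDefault, CurrentNode, SpecificNode, AnyNode };

// Returns the ids of devices that a thread running on (thread_socket,
// thread_node) may use under the given policy. `requested_node` is consulted
// only for Policy::SpecificNode.
inline std::vector<int> eligible_devices(const std::vector<Device>& devices,
                                         Policy policy,
                                         int thread_socket,
                                         int thread_node,
                                         int requested_node = -1) {
    assert(policy != Policy::SpecificNode || requested_node >= 0);
    std::vector<int> ids;
    for (const Device& d : devices) {
        bool ok = false;
        switch (policy) {
            case Policy::SocketDefault: ok = (d.socket == thread_socket); break;
            case Policy::CurrentNode:   ok = (d.numa_node == thread_node); break;
            case Policy::SpecificNode:  ok = (d.numa_node == requested_node); break;
            case Policy::AnyNode:       ok = true; break;
        }
        if (ok) {
            ids.push_back(d.id);
        }
    }
    return ids;
}
```

In this model, a thread on a NUMA node without its own accelerator still finds a device under the socket-wide default, while the current-node policy yields an empty set for the same thread.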

Usability and Documentation

  • Added support for Canned mode in QPL Benchmarks Frameworks.
  • Optimized memory usage and reduced startup time for benchmarks when utilizing an exact filter.
  • Introduced a new build option -DQPL_USE_CLANG_TIDY={ON,OFF} to enable QPL to build with clang-tidy checks. Clang-tidy support is limited to Linux* OS only and requires building QPL with the Clang* compiler. Additionally, introduced a configuration file for clang-tidy and refactored QPL to comply with the introduced clang-tidy configuration file.
  • Added a new example demonstrating the utilization of dictionary compression with the Hardware Path for compression and the Software Path for decompression.
  • Added new test cases for Select, Scan, and Extract operations to validate the functionality of Force Array Output Modification.
  • Expanded the bad argument scenarios for the Force Array Output Modification tests to include additional cases for the Software Path.
  • Added new tests to validate error handling for bad arguments when submitting jobs on the Hardware Path and Auto Path.

Deprecated Functionality

  • Deprecated support for canned mode with indexing on the Software Path to align with the Hardware Path.

Bug Fixes

  • Resolved the issue with compression verification when utilizing IAA 2.0.
  • Corrected the test setup for auto_path in tb_c_api_deflate_with_dictionary.level_none, tb_c_api_deflate_with_dictionary.hw_multi_chunk, and tn_c_api_deflate.{dynamic/fixed/static}_default_stored_block_overflow.
  • Added an execution path check to ensure proper handling of unsupported paths in the Force Array Output Modification.
  • Resolved potential undefined behavior by fixing uninitialized pointers in the canned_one_chuck_hw_vs_sw.cpp test.
  • Removed tests related to the unsupported Software Path for the canned mode with indexing.
  • Fixed invalid parquet generation for tn_c_api_expand.tn_rle_input_error_handling.

Known Limitations

  • Intel(R) QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option. This is because the tests and benchmarks require submodules that GitHub* does not include in the archives during release creation.

  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.

    • Functional tests:
      • (software_path, auto_path only on platforms without IAA) ta_c_api_deflate_stateful.{dynamic/fixed/static}_default_verify
      • (software_path, auto_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_high_verify
      • (hardware_path, auto_path on IAA 2.0) ta_c_api_deflate_index_extended.PerformOperation
      • (auto_path) ta_c_api_huffman_only{_verify./.}{dynamic/static}_be
      • (auto_path) ta_c_api_inflate_huffman_only.generated_data
      • (auto_path) ta_c_api_deflate_index.{dynamic/static}_blocks_default_level_verify
      • (auto_path) tb_c_api_expand.source_errors
      • (auto_path) ta_c_api_deflate_inflate_canned_in_loops.default_level
  • Compression verification on the qpl_path_software works only with indexing mode and data of size smaller than 32KB in other modes.

  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.

  • The implementation of QPL_FLAG_CRC32C is in progress.

  • When using qpl_path_hardware, the compression and decompression with indexing mode on IAA 2.0 are limited to data sizes smaller than 4KB.

Thanks to the Contributors

The release includes contributions from the project team and @aekoroglu, @fwph, and @Permanence-AI-Coder.

Intel QPL v1.5.0

30 Apr 20:39
28df3bf

Functionality

  • Introduced the new QPL_FLAG_FORCE_ARRAY_OUTPUT flag to enable the Force Array Output Modification feature. This feature is supported on Intel® In-Memory Analytics Accelerator 2.0 and allows filter operation outputs to be received as an array with a defined bit width when the output bit width is 1.
  • Enabled host fallback for synchronous API when Auto Path is used. Note that Auto Path with asynchronous execution is not yet supported.
  • Enabled the building of QPL as a shared library using the -DQPL_LIBRARY_TYPE=SHARED build flag.
  • Added pkg-config support (see the <install_dir>/lib/pkgconfig/qpl.pc file) for the shared library built with dynamic loading of libaccel-config.
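
To make the Force Array Output Modification item above more concrete, the sketch below contrasts two ways a 1-bit-per-element filter result could be delivered at a wider output bit width. It is plain C++ for illustration and does not call QPL; `as_indices` and `as_array` are hypothetical helper names, and the assumption that the non-forced form is a list of set-bit indices should be checked against the QPL documentation on output modification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Illustration only; not QPL code. `mask` is a nominal 1-bit-per-element result.

// Index form (assumed behavior without the flag): positions of the set bits.
inline std::vector<std::uint32_t> as_indices(const std::vector<bool>& mask) {
    assert(mask.size() <= std::numeric_limits<std::uint32_t>::max());
    std::vector<std::uint32_t> indices;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) {
            indices.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return indices;
}

// Array form (what the forced output is described as): one 0/1 element per
// input element, stored at the width of T (for example 8, 16, or 32 bits).
template <typename T>
std::vector<T> as_array(const std::vector<bool>& mask) {
    std::vector<T> values;
    values.reserve(mask.size());
    for (bool bit : mask) {
        values.push_back(bit ? T{1} : T{0});
    }
    return values;
}
```

For the mask 0,1,1,0,1 the index form holds the positions 1, 2, 4, whereas the array form keeps all five elements as 0/1 values at the chosen width.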

Usability and Documentation

  • Extended examples with a recipe for using idxd-config APIs to query accelerator configuration information relevant to QPL usage.
  • Revised and improved the examples for scan functionality. Refer to examples/low-level-api/scan_for_specific_value_example.cpp and examples/low-level-api/scan_for_elements_in_range_example.cpp.
  • Added a new example, expand_with_force_array_output_mod_example, to demonstrate the usage of the Force Array Output Modification feature.
  • Updated documentation to describe the new Force Array Output Modification feature and its interaction with Output Bit Width Modification.
  • Updated System Requirements documentation section for using IAA 2.0 (Linux kernel version 6.3 or later is required).
  • Extended testing suite to cover Filter operations for the case when Block on Fault is set to OFF.
  • Extended functional test to cover dictionary compression and decompression that reuses the job structure.
  • Removed outdated initialization tests.

Bug Fixes

  • Fixed possible symbol conflict when QPL is used in the same application with ISA-L.
  • Fixed a possible "_FORTIFY_SOURCE redefined" build warning/error. Some GCC* builds could internally set _FORTIFY_SOURCE, which could have resulted in a QPL build error.
  • Fixed an issue in qpl_gather_deflate_statistics that resulted in a lower compression ratio on qpl_path_auto.
  • Fixed low compression ratio issues on qpl_path_hardware when compressing with a user-provided dictionary smaller than IAA's reserved dictionary size.
  • Fixed an issue where the incorrect error code QPL_STS_INIT_WORK_QUEUES_NOT_AVAILABLE was returned when using IAA 2.0 with older Linux kernel versions.
  • Fixed an issue with Fixed mode dictionary decompression on IAA when the job is re-used. Previously, error code 222 was returned.
  • Fixed possible nullptr dereference when using Canned mode compression on a qpl_path_hardware.
  • Fixed possible data corruption in dictionary decompression on a qpl_path_hardware.

Known Limitations

  • Intel(R) QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option. This is because the tests and benchmarks require submodules that GitHub* does not include in the archives during release creation.
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
    • Functional tests:
      • (software_path, auto_path only on platforms without IAA) ta_c_api_deflate_stateful.{dynamic/fixed/static}_default_verify
      • (software_path, auto_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_high_verify
      • (hardware_path, auto_path) ta_c_api_deflate_canned_indexing.default_level
      • (hardware_path, auto_path on IAA 2.0) ta_c_api_deflate_index_extended.PerformOperation
      • (hardware_path, auto_path) tn_c_api_expand.tn_rle_input_error_handling
      • (auto_path) ta_c_api_deflate_canned_indexing.high_level
      • (auto_path) ta_c_api_huffman_only{_verify./.}{dynamic/static}_be
      • (auto_path) ta_c_api_inflate_huffman_only.generated_data
      • (auto_path) ta_c_api_deflate_index.{dynamic/static}_blocks_default_level_verify
      • (auto_path) tb_c_api_expand.source_errors
      • (auto_path) tb_c_api_deflate_with_dictionary.level_none
      • (auto_path) tb_c_api_deflate_with_dictionary.hw_multi_chunk
      • (auto_path) tn_c_api_deflate.{dynamic/fixed/static}_default_stored_block_overflow
      • (auto_path) ta_c_api_deflate_inflate_canned_in_loops.default_level
  • Compression verification on the qpl_path_software works only with indexing mode and data of size smaller than 32KB in other modes.
  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
  • The implementation of QPL_FLAG_CRC32C is in progress.

Thanks to the Contributors

Release includes contributions from the project team as well as @aepanchi and @miguelinux.

Intel QPL v1.4.0

31 Jan 22:12
d4715e0

Functionality

  • Enabled canned mode compression with a dictionary for hardware_path, supported on IAA 2.0. Note that canned mode decompression with a dictionary on hardware_path is not supported; software_path can be used as an alternative.

Usability and Documentation

  • Added example for canned compression with a dictionary.
  • Clarified the documentation about output modification for the expand operation.
  • Extended functionality testing for the cases when the IAA Block on Fault feature is set to OFF.

Bug Fixes

  • Fixed an issue in high-level fixed mode compression on software_path. Previously, the job could complete with QPL_STS_OK when only part of the source data had been processed.
  • Fixed an issue in high-level dictionary mode compression. Previously, data was lost at the end of compression.
  • Fixed block header decompression for indexing mode on asynchronous path.
  • Fixed performance regression that could appear on IAA 2.0 due to changes for the OPCFG feature.
  • Fixed build options incorrectly propagated when building with Clang and resolved resulting warnings.
  • Resolved undefined references to crc16_* functions.
  • Fixed accelerator NUMA node setting via the --node parameter for the Benchmarks framework. Previously, Benchmarks initialization and validation steps were always mapped to the NUMA node of the calling process, which could result in the QPL_STS_INIT_WORK_QUEUES_NOT_AVAILABLE error.
  • Fixed Huffman Only verification on software_path when the source size is greater than 4KB.
  • Removed a temporary buffer used on the Huffman Only decompression code path for BE16 that could lead to a segmentation fault.
  • Fixed the error code for invalid distance symbol in software_path decompression.
  • Fixed generation of AECS Format-2 in tests that caused failure of ta_c_api_inflate_huffman_only.generated_data on hardware_path.
  • Updated initialization of Huffman table from another to fix failures in ll_huffman.compress_sw_decompress_hw_{high/default}_level.
  • Fixed issues flagged by the static code analysis tool.

Known Limitations

  • Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, since they require submodules that GitHub* does not include in the archives during release creation.
  • During accelerator initialization on hardware_path and IAA 2.0, there is a small memory leak that will be resolved in a future release.
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
    • Functional tests:
      • (software_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_{high/default}_verify
      • (hardware_path) ta_c_api_deflate_canned_indexing.default_level
      • (hardware_path on IAA 2.0) ta_c_api_deflate_index_extended.PerformOperation
      • (hardware_path) tn_c_api_expand.tn_rle_input_error_handling
  • Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
  • Dynamic Deflate on IAA 2.0 may show performance regression in throughput benchmarks using the asynchronous path.
  • The implementation of QPL_FLAG_CRC32C is in progress.

Intel QPL v1.3.1

28 Oct 22:16
a61bdd8

This is a patch release containing the following changes to v1.3.0:

Usability and Documentation

  • Testing coverage and documentation improvements for dictionary compression functionality.

Bug Fixes

  • Fixed job structure update for continuation on "Decompression Output Overflow" error when software_path is used.
  • Fixed multi-chunk compression when the destination buffer is insufficient and a stored block is written into the output stream instead.
  • Fixed incorrect error code returned in the case when no devices are available on the NUMA node specified/detected.
  • Fixed incorrectly set offsets in examples/low-level-api/compression_multi_chunk_example.cpp and examples/low-level-api/compression_static_multi_chunk_example.cpp.
  • Fixed a few more issues flagged by the static code analysis tool.

Known Limitations

  • Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, since they require submodules that GitHub* does not include in the archives during release creation.
  • Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
    • Functional tests:
      • ta_c_api_dictionary.dynamic_high_{stateless, stateful_decompression}
      • (software_path) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
      • (hardware_path) ta_c_api_deflate_canned_indexing.default_level
      • (hardware_path with asynchronous execution mode) ta_c_api_deflate_index_extended.PerformOperation
      • (hardware_path with specific test seeds) tn_c_api_expand.tn_rle_input_error_handling
      • (hardware_path if Generation 2 Minimum Capabilities are present) ta_c_api_inflate_huffman_only.generated_data

Intel QPL v1.3.0

09 Oct 15:44
b935277

Functionality

  • Enabled support of IAA 2.0 for Huffman Only Decompression when verification is used.
  • Enabled Compression with Dictionary for hardware_path supported on IAA 2.0.
  • Enabled support of IAA 2.0 WQ OPCFG Support feature for disabling/enabling operations at a work queue granularity.
  • Added Page Fault handling mechanism for the case when IAA Block on Fault is off.
  • Introduced zlib support for hardware_path.

Usability and Documentation

  • Added documentation section on zlib and GZIP compatibility support for DEFLATE.
  • Extended documentation section on Decompression Output Overflow error.
  • Introduced documentation section on how Intel(R) QPL handles Page Faults.
  • Created a Contributing Guide and Pull Request template.
  • Added an example to produce intentional Decompression Output Overflow to demonstrate how it should be resolved on the user side.
  • Refactored examples to print out specific error codes instead of throwing exceptions with generic messages.
  • Expanded functional dictionary tests to cover all compression-level combinations.
  • Enabled testing for dictionary utility functions.
  • Added thread stress testing to test for compress/decompress with heavy multithreaded usage.
  • Updated requirements.txt to the latest compatible tools required for building Intel(R) QPL documentation locally.
  • Updated Google* Benchmark dependency to 1.8.3.

Bug Fixes

  • Fixed a segmentation fault in high-level DEFLATE compression on the AVX-512 code path.
  • Fixed job structure update for continuation on Decompression Overflow case when synchronous execution with hardware_path is used.
  • Fixed various issues flagged by the static code analysis tool.

Known Limitations

  • Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, since they require submodules that GitHub* does not include in the archives during release creation.
  • Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
    • Functional tests:
      • (sw) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
      • (hw) ta_c_api_deflate_canned_indexing.default_level
      • (hw) (async) ta_c_api_deflate_index_extended.PerformOperation
      • (hw) tn_c_api_expand.tn_rle_input_error_handling
  • Dynamic Deflate on IAA 2.0 may show performance regression in throughput benchmarks using the asynchronous path.
  • When a decompression "Output Overflow" error occurs on software_path, resubmit the job from the beginning. Currently, it is an unrecoverable error on software_path, and continuation is not supported (unlike on hardware_path).

Intel QPL v1.2.0

10 Jul 23:40
faaf193

Functionality

  • Partially enabled support of IAA 2.0 for existing functionality. Some failures are expected for Huffman Only mode when hardware compression with verification is used.
  • Enabled 1-pass and 2-pass Header Generation on IAA 2.0 for Dynamic Deflate compression, which reduces latency for smaller sizes.
  • Added Cyclic Redundancy Check (CRC) operation support to the Benchmarks Framework.
  • Fixed a workaround for IAA 1.0 limitation with Big Endian 16 format in Huffman Only.

Usability and Documentation

  • Added Library Architecture Overview diagram to the Introduction page.
  • Extended the status codes returned for the Completion Record to make issue reporting more accessible.
  • Clarified the NUMA* support in the Benchmarks Framework documentation.
  • Updated provided configuration files to always set Block on Fault and removed the Max Batch Size parameter not used on IAA.
  • Updated the GoogleTest* submodule to v.1.13 release. The current QPL test framework is not compatible with previous GoogleTest* versions.
  • Added new examples for multi-chunk compression, including fixed and static blocks.
  • Updated the Installed package structure to comply with the Linux* OS file-system hierarchy.
  • Added a link to the project with Java* bindings for QPL Low-Level C APIs.
  • Clarified in the Documentation that the minimally tested platform is x86-64 CPU with Intel® Streaming SIMD Extensions 4.2 (instead of Intel® Advanced Vector Extensions 2).

Breaking Changes

  • Updated the accel-config/libaccel-config dependency requirement to v4.0.

Bug Fixes

  • Fixed the Compression Ratio calculation in the Benchmarks Framework to eliminate the rounding error.
  • Fixed incorrect CTest* integration when QPL is used as a dependency in another project.
  • Fixed the incorrect Linux* OS identification macro that could lead to build failures on some systems.
  • Fixed build warnings and failures with GCC* 11, 12, 13.
  • Fixed the incorrect Benchmarks Framework reporting of the NUMA* nodes when several nodes are available on a socket.
  • Refactored the host part of the Hardware Path to rely on the platform identification and kernels dispatcher instead of directly calling AVX-512-optimized code.
  • Fixed compatibility with the GZIP* format. Previously, the stream produced by QPL was correct but could trigger a warning due to the incorrect trailer information.
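
For context on the GZIP* trailer mentioned in the last item: per RFC 1952, a GZIP member ends with an 8-byte trailer holding the CRC-32 of the uncompressed data followed by its size modulo 2^32 (ISIZE), both little-endian. The sketch below builds that trailer in plain C++; it does not use QPL and does not describe what the library's earlier trailer contained, and `crc32_ieee` and `gzip_trailer` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320, as used by GZIP.
inline std::uint32_t crc32_ieee(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // XOR in the polynomial only when the lowest bit is set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Builds the 8-byte GZIP trailer: CRC-32, then ISIZE, both little-endian.
inline std::array<unsigned char, 8> gzip_trailer(const unsigned char* data,
                                                 std::size_t size) {
    const std::uint32_t crc   = crc32_ieee(data, size);
    const std::uint32_t isize = static_cast<std::uint32_t>(size & 0xFFFFFFFFu);
    std::array<unsigned char, 8> out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i]     = static_cast<unsigned char>((crc >> (8 * i)) & 0xFFu);
        out[4 + i] = static_cast<unsigned char>((isize >> (8 * i)) & 0xFFu);
    }
    return out;
}
```

A decompressor such as gzip compares both fields against what it actually produced, which is why a stream whose Deflate payload is valid can still trigger a warning when the trailer fields are wrong.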

Known Limitations

  • Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, since they require submodules that GitHub* does not include in the archives during release creation.
  • Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
    • Functional tests:
      • (sw) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
      • (hw) ta_c_api_deflate_canned_indexing.default_level
      • (hw) (async) ta_c_api_deflate_index_extended.PerformOperation
      • (hw) tn_c_api_expand.tn_rle_input_error_handling
  • Dynamic Deflate on IAA 2.0 may show performance regression in throughput benchmarks using the asynchronous path.

Thanks to the Contributors

Release includes contributions from the project team as well as @alexandraepan, @aekoroglu, @yaqi-zhao, @miguelinux.

Intel QPL v1.1.0

22 Feb 17:02
9f17564

Usability and Documentation

  • Improved examples by setting the execution path based on a command-line argument instead of hardcoding to use the Software Path.
  • Changed the CMake version requirement to v3.23 or higher when thread sanitizing is enabled (with -DSANITIZE_THREADS=ON) for the Intel QPL build, to avoid undefined pthread references.
  • Changed the job structure allocation model so that it depends on the provided execution path. The user may see significant reduction in job structure size when using Hardware Path.
  • Fixed CMakeLists.txt so that starting from this release the QPL project could be easily integrated into other CMake-based projects using find_package.
  • Introduced the -DQPL_BUILD_{TESTS, EXAMPLES} option (set to ON by default). -DQPL_BUILD_TESTS=OFF enables the user to build the library (without testing) from directly downloadable files (.tar, .tgz).
  • Fixed build warnings with -DLOG_HW_INIT=ON.
  • Removed -DBLOCK_ON_FAULT=[OFF|ON] from the documentation since the Block on Fault feature cannot be enabled/disabled through this build option. Users must use accel-config to enable/disable Block on Fault for each work queue.
     

Breaking Changes

  • Changed the loading of the accel-config library from static loading to dynamic loading by default. Added a build option -DDYNAMIC_LOADING_LIBACCEL_CONFIG=[OFF|ON] to switch between dynamic loading and static loading. This build option is set to ON by default for dynamic loading. To compile a QPL application, users must add -ldl with default dynamic loading (or use -laccel-config if Intel QPL is built with -DDYNAMIC_LOADING_LIBACCEL_CONFIG=OFF).

Bug Fixes

  • Fixed gcc 11 build failures caused by missing headers.
  • Fixed a race condition that might occur during hardware initialization. Users with heavily threaded workloads might have experienced a segmentation fault or hang starting with QPL v0.3.0; the issue is addressed in this release.

Known issues/limitations

  • Intel QPL can be built from directly downloadable files (.tar, .tgz) only without the tests and benchmark frameworks, using the -DQPL_BUILD_TESTS=OFF build option, since they require submodules that GitHub does not include in the archives during release creation.
  • Compression verification on the software path works only with indexing mode and data of size smaller than 32KB in other modes.
  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.
    • Functional tests:

      • (sw) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
      • (hw) (async) ta_c_api_inflate_huffman_only.generated_data
      • (hw) ta_c_api_deflate_canned_indexing.default_level
      • (hw) (async) ta_c_api_deflate_index_extended.PerformOperation
      • (hw) (async) ta_c_api_huffman_only{_verify}.{dynamic/static}_be
      • (hw) tn_c_api_expand.tn_rle_input_error_handling
    • Cross tests:

      • ll_huffman.compress_hw_decompress_sw
      • ll_huffman.compress_sw_decompress_hw_{high/default}_level

Intel QPL v1.0.0

19 Dec 21:03
d75a29d

Functionality

  • Added Benchmark Framework with limited support; refer to the Benchmark Framework Guide in the documentation for details regarding what is supported and how it can be used

Usability and Documentation

  • Fixed build warnings with GCC
  • Added a new error status code QPL_STS_JOB_NOT_SUBMITTED, which will be returned if the job being checked/waited has not been submitted
  • Added --qpl-tests-help option for functional tests executable (located at <install_dir>/bin/tests); --qpl-tests-help lists all available test options specific to the library (e.g., execution path, synchronous or asynchronous mode)

Deprecated Functionality

  • Removed support of High-Level C++ API from the library
  • Removed support of experimental DWQ feature

Breaking Changes

  • Removed QPL_FLAG_NO_BUFFERING and all references to it
  • Flags for using indexing mode changed from using QPL_FLAG_NO_BUFFERING to QPL_FLAG_RND_ACCESS

Bug Fixes

  • Fixed issue with extract on software path ending in the middle of a literal octa-group for some bit-widths
  • Fixed issue that resulted in the wrong total_out value in the qpl_job when an asynchronous canned mode compression was submitted with a qpl_job object reused from a previous unrelated job

Known issues/limitations

  • Intel QPL cannot be built from directly downloadable files (.tar, .tgz) since it has submodules that are not included in the archives by GitHub during release creation
  • Compression verification on the software path only works with indexing mode and data of size smaller than 32KB in other modes
  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when header is too big to fit in the input buffer
  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses
    • Functional tests:

      • (sw) ta_c_api_deflat.{dynamic/fixed/static}_{high/default}_verify_stateful_compression
      • (hw) (fails with certain test seeds only) tn_c_api_inflate.no_literal_lengths_code
      • (hw) (async) ta_c_api_inflate_huffman_only.generated_data
      • (hw) ta_c_api_deflate_canned_indexing.default_level
      • (hw) (async) ta_c_api_deflate_index_extended.PerformOperation
      • (hw) (async) ta_c_api_huffman_only{_verify}.{dynamic/static}_be
      • (hw) tn_c_api_expand.tn_rle_input_error_handling
    • Cross tests:

      • ll_deflate.compress_hw_decompress_sw
      • ll_huffman.compress_hw_decompress_sw
      • ll_huffman.compress_sw_decompress_hw_{high/default}_level

Intel QPL v0.3.0

16 Nov 22:44
becb7a1

Usability and Documentation

  • Changed libaccel-config.so to a build time dependency on Linux instead of loading it for Hardware Path execution at runtime; it is now required to add -laccel-config when building the application with QPL
  • Fixed duplications in status codes documentation page in Developer Reference

Deprecated Functionality

  • Dropped support of qpl_op_set_membership, qpl_op_find_unique and qpl_op_rle_burst analytic operations
  • Dropped support of zero compression: qpl_op_z_compress{16, 32} and qpl_op_z_decompress{16, 32}

Bug Fixes

  • Changed accelerator dispatching to use lazy initialization with locks to ensure thread safety. The previous behavior might result in crashes on the user's side when a child process is forked to submit a job to the accelerator
  • Fixed non-optimal Huffman Only compression when executing on Software Path
  • Fixed incorrect mapping of accelerator status codes to library status codes, which previously resulted in returning an undocumented error to the user

Known issues/limitations

  • Intel QPL cannot be built from directly downloadable files (.tar, .tgz) since it has submodules that are not included in the archives by GitHub during release creation
  • Internal error code QPL_STS_INTERNAL_ERROR has been moved from error code 222 to error code 6. This could affect users having a hardcoded check for error 222
  • On Linux, libaccel-config.so must be placed in /usr/lib64/ to build and run Intel QPL, even if only the Software Path execution is used