Intel QPL v1.7.0

@mcao59 mcao59 released this 13 Dec 00:54
97cc01c

Functionality

  • Enhanced the Benchmarks Framework to incorporate the new QPL device selection mechanism introduced in the previous release.
  • Saved intermediate job states in the dynamic Deflate job to prevent duplicate work when executing with the asynchronous API on the Hardware Path and encountering the QPL_STS_QUEUES_ARE_BUSY_ERR error. In such cases, the job is resubmitted without repeating the already completed work. In the v1.6.0 release, this functionality was enabled only for the synchronous API.
  • [experimental feature] Added a mechanism to measure Intel IAA execution time in a single-threaded application with the synchronous API.
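
The resubmission behavior described in the list above can be pictured with a toy model. The sketch below is not QPL code: `Status`, `Job`, `submit`, and `run_with_resubmission` are hypothetical stand-ins, and the busy condition is simulated with a fixed per-submission budget. It only illustrates how saved progress lets a resubmitted job continue where it stopped instead of starting over.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status codes; they only echo the idea of
// QPL_STS_QUEUES_ARE_BUSY_ERR and are not the real qpl_status values.
enum class Status { Ok, QueuesBusy };

// Hypothetical job that remembers how much work is already done.
struct Job {
    std::uint32_t total_steps     = 0;
    std::uint32_t completed_steps = 0; // saved intermediate state
    std::uint32_t work_units      = 0; // every processed step, to expose duplicate work
};

// Hypothetical submission: continues from the saved position and reports
// "queues busy" once the simulated budget for this submission is used up.
Status submit(Job& job, std::uint32_t steps_before_busy) {
    std::uint32_t budget = steps_before_busy;
    while (job.completed_steps < job.total_steps) {
        if (budget == 0) { return Status::QueuesBusy; }
        ++job.completed_steps;
        ++job.work_units;
        --budget;
    }
    return Status::Ok;
}

// Resubmits the same job until it completes; returns the number of submissions.
std::uint32_t run_with_resubmission(Job& job, std::uint32_t steps_before_busy) {
    assert(steps_before_busy > 0); // a zero budget would never make progress
    std::uint32_t submissions = 1;
    while (submit(job, steps_before_busy) == Status::QueuesBusy) {
        ++submissions;
    }
    return submissions;
}
```

In this model, a job with five steps and a budget of two steps per submission finishes on the third submission, and `work_units` ends at five because no step is processed twice.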

Usability and Documentation

  • Introduced a clang-format configuration file and formatted the entire codebase using clang-format 17.
  • Improved C++ compatibility by fixing field order mismatches when creating structures, fixing the initialization of char* strings with literals, and removing unnecessary conversions between integers and enums.
  • Added a documentation note clarifying that QPL testing with datasets provided under tools/testdata requires a maximum transfer size of 2GB to avoid the QPL_STS_TRANSFER_SIZE_INVALID error code.
  • Updated documentation on the -DEFFICIENT_WAIT build option.
  • Enhanced the Introduction section of the QPL documentation, including useful links for the Intel® In-Memory Analytics Accelerator.
  • Extended testing to generate stored block insertion on the last job.
  • Made multiple updates to documentation and examples on qpl_get_safe_deflate_compression_buffer_size usage for multi-chunk compression.
  • Improved distance code computation logic on the Software Path.
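
The C++ compatibility item in the list above names three patterns that compile as C but are rejected or flagged by C++ compilers. The sketch below shows the C++-friendly form of each. `Mode`, `Options`, and `make_options` are hypothetical names chosen for illustration and are not taken from the QPL sources. Designated initializers are standard only from C++20 on; earlier language modes accept them as a compiler extension in GCC and Clang.

```cpp
// Hypothetical enum and structure, for illustration only.
enum Mode { kModeCompress = 0, kModeDecompress = 1 };

struct Options {
    int         level;
    Mode        mode;
    const char* name;
};

Options make_options(int raw_mode) {
    // String literals are const in C++, so they bind to const char*, not char*.
    const char* name = "example";

    // C++ does not convert integers to enums implicitly; the cast is explicit.
    Mode mode = static_cast<Mode>(raw_mode);

    // C++ requires designated initializers to follow the declaration order
    // of the fields (level, mode, name); C allows any order.
    Options opts = {.level = 1, .mode = mode, .name = name};
    return opts;
}
```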

Deprecated Functionality

  • The Force Array Output Modification Feature has been deprecated on the Auto Path due to the lack of host fallback support. Use the Hardware Path instead.

Bug Fixes

  • Resolved build issues with Clang-17 caused by a missing header.
  • Corrected logic in qpl_check_job so that jobs no longer fall back to host execution unintentionally instead of running on the accelerator.
  • Fixed the compression verification step on the asynchronous path when a stored block occurs.
  • Implemented multiple fixes for the stored block insertion feature on both asynchronous and synchronous paths.
  • Prevented reprocessing when qpl_check_job or qpl_wait_job is called after submission.
  • Implemented multiple fixes for issues with index compression/decompression.
  • Fixed intermediate buffer incrementing for the select operation.
  • Initialized the intermediate Huffman table structure correctly to avoid garbage in the Huffman table.
  • Implemented creation of the mapping CAM Huffman decompression table.
  • Resolved an issue where the accelerator context was never set on the Auto Path.
  • Introduced immediate fallback to host execution for the specific case of Huffman-only BE16 decompression on the Auto Path.

Known Limitations

  • When built from the directly downloadable archives (.tar, .tgz), Intel(R) QPL can be built only without the tests and benchmarks frameworks, using the -DQPL_BUILD_TESTS=OFF build option. These frameworks require submodules that GitHub* does not include in the archives during release creation.

  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.

    • Functional tests:
      • (software_path, auto_path only on platforms without IAA) ta_c_api_deflate_stateful.{dynamic/fixed/static}_default_verify
      • (software_path, auto_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_high_verify
  • On qpl_path_software, compression verification works with indexing mode; in other modes, it works only for data smaller than 32KB.

  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.

  • The implementation of QPL_FLAG_CRC32C is in progress.

  • When using qpl_path_hardware, the compression and decompression with indexing mode on IAA 2.0 are limited to data sizes smaller than 4KB.

  • The Force Array Output Modification feature is enabled only for qpl_path_hardware and Intel IAA 2.0 (and later). With qpl_path_auto, the error code QPL_STS_NOT_SUPPORTED is returned because no host fallback is currently available.

Thanks to the Contributors

The release includes contributions from the project team and @fwph, @Permanence-AI-Coder.