[RFC] Add internal API for converting ZSTD_Sequence into seqStore #2715

Merged 1 commit on Jun 24, 2021


senhuang42
Contributor

It turns out this functionality already exists, so this PR just adds a small wrapper around it to provide a clean interface.
Currently, on silesia.tar with sequences generated by ZSTD_generateSequences() at compression level 3, ZSTD_convertBlockSequencesToSeqStore() runs at around 2100 MB/s, averaged across all blocks. Most of this time seems to be spent in ZSTD_finalizeOffCode().

More importantly, though, I'll lay out a general set of guidelines for how hardware-accelerated matchfinders should be integrated into the library at a per-block level. Suggestions are welcome here.

Generally, a hardware-accelerated matchfinder must adhere to the below function signature, storing its result in an array of ZSTD_Sequence.

// Generic function signature for hardware matchfinders.
// Accepts a void* pointer for a "bag" of parameters that the matchfinder may use,
// possibly derived from the ZSTD_CCtx parameters.
//
// As an example, one could define a function
// size_t ZSTD_accelerated_findMatches(ZSTD_Sequence* sequences, size_t sequencesCapacity,
//                               void* params, const void* src, size_t srcSize);
//
// Returns number of sequences generated, storing the result in `sequences`, or a zstd error.
typedef size_t (*ZSTD_hardwareMatchFinder) 
     (ZSTD_Sequence* sequences, size_t sequencesCapacity, void* params,
      const void* src, size_t srcSize);
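
As a rough illustration of what a function with this signature might look like, here is a software-only toy that stands in for a hardware matchfinder. It only detects runs of a repeated byte and reports each run as a match at offset 1. `Toy_Sequence` is a local stand-in mirroring the fields of the public `ZSTD_Sequence`; `toy_findMatches`, `TOY_ERROR`, and `TOY_MINMATCH` are hypothetical names used only for this sketch. The trailing sequence with `offset == 0` and `matchLength == 0` follows the block-delimiter convention documented for ZSTD_generateSequences(); whether a real accelerated matchfinder should emit one is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring the fields of the public ZSTD_Sequence (zstd.h). */
typedef struct {
    unsigned int offset;       /* distance back to the match source; 0 marks a delimiter */
    unsigned int litLength;    /* literal bytes preceding the match */
    unsigned int matchLength;  /* length of the match; 0 marks a delimiter */
    unsigned int rep;          /* repcode info, unused in this toy */
} Toy_Sequence;

#define TOY_ERROR    ((size_t)-1)  /* hypothetical stand-in for a zstd error code */
#define TOY_MINMATCH 4             /* assumed minimum match length */

/* Toy matchfinder with the same shape as the proposed signature.
 * Emits one sequence per run of a repeated byte (match at offset 1),
 * then a final delimiter sequence carrying the trailing literals.
 * Returns the number of sequences written, or TOY_ERROR if they do not fit. */
size_t toy_findMatches(Toy_Sequence* sequences, size_t sequencesCapacity,
                       void* params, const void* src, size_t srcSize)
{
    const unsigned char* const base = (const unsigned char*)src;
    size_t nbSeqs = 0;
    size_t anchor = 0;  /* first byte not yet covered by any sequence */
    size_t pos = 1;     /* offset 1 needs at least one preceding byte */
    (void)params;       /* this toy takes no tuning parameters */
    assert(src != NULL || srcSize == 0);

    while (pos < srcSize) {
        size_t len = 0;
        while (pos + len < srcSize && base[pos + len] == base[pos + len - 1]) len++;
        if (len >= TOY_MINMATCH) {
            if (nbSeqs == sequencesCapacity) return TOY_ERROR;
            sequences[nbSeqs].offset = 1;
            sequences[nbSeqs].litLength = (unsigned int)(pos - anchor);
            sequences[nbSeqs].matchLength = (unsigned int)len;
            sequences[nbSeqs].rep = 0;
            nbSeqs++;
            pos += len;
            anchor = pos;
        } else {
            pos += len + 1;  /* run too short: leave it as literals */
        }
    }

    /* Final delimiter: remaining bytes are literals with no match. */
    if (nbSeqs == sequencesCapacity) return TOY_ERROR;
    sequences[nbSeqs].offset = 0;
    sequences[nbSeqs].litLength = (unsigned int)(srcSize - anchor);
    sequences[nbSeqs].matchLength = 0;
    sequences[nbSeqs].rep = 0;
    return nbSeqs + 1;
}
```

The point of the sketch is only the contract: the caller owns the output array, the callee reports how many entries it filled, and a capacity overflow surfaces as an error value rather than a partial result.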

The reasoning is that, down the line, we can define the following function, which could select between multiple accelerated matchfinders:

// This function selects the final hardware matchfinder used, depending on the
// parameters in the ZSTD_CCtx.
//
// For the example above, ZSTD_selectHardwareMatchFinder() would return
// ZSTD_accelerated_findMatches.
ZSTD_hardwareMatchFinder ZSTD_selectHardwareMatchFinder(const ZSTD_CCtx* zc);
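
A minimal sketch of how such a selector could be written, assuming the choice is stored as an enum in the context during parameter finalization. Everything prefixed with `Toy_` is hypothetical: the real `ZSTD_CCtx` has no `accelerator` field, and the two device matchfinders are empty stubs that exist only so the function pointers have something to point at.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring the fields of the public ZSTD_Sequence (zstd.h). */
typedef struct {
    unsigned int offset;
    unsigned int litLength;
    unsigned int matchLength;
    unsigned int rep;
} Toy_Sequence;

/* Same shape as the proposed ZSTD_hardwareMatchFinder typedef. */
typedef size_t (*Toy_hardwareMatchFinder)
     (Toy_Sequence* sequences, size_t sequencesCapacity, void* params,
      const void* src, size_t srcSize);

/* Hypothetical: which accelerator was chosen when parameters were finalized. */
typedef enum { Toy_accel_none = 0, Toy_accel_deviceA, Toy_accel_deviceB } Toy_accelerator_e;

/* Hypothetical stand-in for the relevant part of ZSTD_CCtx. */
typedef struct { Toy_accelerator_e accelerator; } Toy_CCtx;

/* Stub matchfinders: a real one would drive the device; these report no sequences. */
static size_t Toy_deviceA_findMatches(Toy_Sequence* sequences, size_t sequencesCapacity,
                                      void* params, const void* src, size_t srcSize)
{
    (void)sequences; (void)sequencesCapacity; (void)params; (void)src; (void)srcSize;
    return 0;
}

static size_t Toy_deviceB_findMatches(Toy_Sequence* sequences, size_t sequencesCapacity,
                                      void* params, const void* src, size_t srcSize)
{
    (void)sequences; (void)sequencesCapacity; (void)params; (void)src; (void)srcSize;
    return 0;
}

/* Returns the matchfinder for the configured accelerator,
 * or NULL when the software path should be used. */
Toy_hardwareMatchFinder Toy_selectHardwareMatchFinder(const Toy_CCtx* zc)
{
    assert(zc != NULL);
    switch (zc->accelerator) {
    case Toy_accel_deviceA: return Toy_deviceA_findMatches;
    case Toy_accel_deviceB: return Toy_deviceB_findMatches;
    case Toy_accel_none:
    default:                return NULL;
    }
}
```

Returning NULL for the "no accelerator" case is one possible design; it would let a single call answer both "should we use hardware?" and "which matchfinder?", at the cost of a NULL check at the call site.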

Finally, the code could be integrated into ZSTD_compressBlock_internal() along these lines (of course, a first implementation can hard-code many of these dynamic decisions for testing purposes).

static size_t ZSTD_compressBlock_internal(ZSTD_CCtx* zc,
                                        void* dst, size_t dstCapacity,
                                        const void* src, size_t srcSize, U32 frame)
{
    /* This is the upper bound for the length of an rle block.
     * This isn't the actual upper bound. Finding the real threshold
     * needs further investigation.
     */
    const U32 rleMaxLength = 25;
    size_t cSize;
    const BYTE* ip = (const BYTE*)src;
    BYTE* op = (BYTE*)dst;
    DEBUGLOG(5, "ZSTD_compressBlock_internal (dstCapacity=%u, dictLimit=%u, nextToUpdate=%u)",
                (unsigned)dstCapacity, (unsigned)zc->blockState.matchState.window.dictLimit,
                (unsigned)zc->blockState.matchState.nextToUpdate);
                
    // HARDWARE ACCELERATED MATCHFINDING PATH HERE
    // ZSTD_useHardwareAccelerator() is a hypothetical function that decides
    // whether to use a hardware-accelerated matchfinder, based on factors such
    // as the compression parameters. The decision could be finalized during
    // parameter initialization and stored as a variable in the cctx.
    if (ZSTD_useHardwareAccelerator(zc)) {
        // Now, select a hardware matchfinder, based on parameters in ZSTD_CCtx
        ZSTD_hardwareMatchFinder matchFinder = ZSTD_selectHardwareMatchFinder(zc);
        
        // Reset the existing seqStore
        ZSTD_resetSeqStore(&zc->seqStore);

        // Function pointer that delegates to the accelerated matchfinder to generate sequences.
        // `params` can be a custom struct of all required parameters for the particular matchfinder.
        // `zc->hardwareSequences` is presumed already allocated and `zc->hardwareSequencesCapacity`
        //  already determined, likely when the decision to use hardware-accelerated
        //  matchfinding was made during parameter finalization.
        size_t const nbSeqs = matchFinder(zc->hardwareSequences, zc->hardwareSequencesCapacity, &params, src, srcSize);
        // The matchfinder returns either a sequence count or a zstd error, so check before use.
        FORWARD_IF_ERROR(nbSeqs, "hardware matchfinder failed");
        
        // Generated sequences passed to new API, which gives us our final `zc->seqStore`
        FORWARD_IF_ERROR(ZSTD_convertBlockSequencesToSeqStore(...), "");
    } else {
        const size_t bss = ZSTD_buildSeqStore(zc, src, srcSize);
        FORWARD_IF_ERROR(bss, "ZSTD_buildSeqStore failed");
        if (bss == ZSTDbss_noCompress) { cSize = 0; goto out; }
    }
    ...
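
Since the sequences in this flow come from outside the library, it may be worth sanity-checking them before conversion, at least in debug builds. The following is a sketch of one possible check, not something this PR implements: it replays the sequences against the source block and confirms that they cover it exactly and that every match really repeats earlier bytes. `Toy_Sequence` is a local stand-in mirroring the public `ZSTD_Sequence` fields, `toy_sequencesAreValid` is a hypothetical helper, and the check assumes a single block with no history, so offsets may not reach before the start of `src`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in mirroring the fields of the public ZSTD_Sequence (zstd.h). */
typedef struct {
    unsigned int offset;
    unsigned int litLength;
    unsigned int matchLength;
    unsigned int rep;
} Toy_Sequence;

/* Returns 1 if the sequences describe `src` exactly, 0 otherwise.
 * Assumes one block with no prior window: offsets must stay inside `src`. */
int toy_sequencesAreValid(const Toy_Sequence* sequences, size_t nbSeqs,
                          const void* src, size_t srcSize)
{
    const unsigned char* const base = (const unsigned char*)src;
    size_t pos = 0;
    size_t i;
    assert(src != NULL || srcSize == 0);

    for (i = 0; i < nbSeqs; i++) {
        size_t const litLength   = sequences[i].litLength;
        size_t const matchLength = sequences[i].matchLength;
        size_t const offset      = sequences[i].offset;

        /* Literals are taken verbatim from src, so only bounds can be wrong. */
        if (litLength > srcSize - pos) return 0;
        pos += litLength;

        if (matchLength == 0) continue;  /* delimiter: literals only */

        /* The match must start inside the block and fit in what remains. */
        if (offset == 0 || offset > pos) return 0;
        if (matchLength > srcSize - pos) return 0;

        /* Both ranges lie in the finished source, so overlap is harmless here. */
        if (memcmp(base + pos, base + pos - offset, matchLength) != 0) return 0;
        pos += matchLength;
    }
    return pos == srcSize;  /* every byte must be accounted for */
}
```

A check like this costs a full pass over the block, so it would likely sit behind a debug or assert level rather than on the hot path.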

@senhuang42 senhuang42 merged commit 45d707e into facebook:dev Jun 24, 2021