What is the main function of nvcomp LZ4 codec and what scene is it used for? [QST] #5382

Answered by revans2
chenrui17 asked this question in General

This is all about shuffle. By default, Spark compresses all shuffle data before it is written out to disk, though this is configurable.

When using the default shuffle implementation we still use the CPU compression algorithm. The nvcomp LZ4 codec is there to match that functionality in the UCX-based shuffle plugin, where we want to avoid going back to the CPU if possible.
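As a hedged sketch of the configuration involved: `spark.shuffle.compress` and `spark.io.compression.codec` are standard Spark properties; the `spark.rapids.shuffle.*` keys are assumed from the spark-rapids plugin docs, and the exact shuffle-manager class name is version-specific (the one below is a placeholder).

```python
# Minimal sketch of enabling compressed shuffle with the RAPIDS plugin.
# The spark.rapids.* keys and the manager class name are assumptions here;
# check the spark-rapids docs for your Spark version.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Standard Spark shuffle compression settings (these defaults are on/lz4).
    .config("spark.shuffle.compress", "true")
    .config("spark.io.compression.codec", "lz4")
    # RAPIDS UCX-based shuffle; class name varies by Spark version (placeholder).
    .config("spark.rapids.shuffle.manager",
            "com.nvidia.spark.rapids.sparkXYZ.RapidsShuffleManager")
    # Assumed plugin knob selecting the GPU-side (nvcomp) codec.
    .config("spark.rapids.shuffle.compression.codec", "lz4")
    .getOrCreate()
)
```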

There are two places where this can help us.

  1. Reduces GPU memory usage. We compress the data before we cache it on the GPU. If the compression ratio is good then we can store more data in GPU memory and not have to spill it to host memory or disk.
  2. Uses more computation to offset limited bandwidth. The amount…
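The bandwidth trade-off in point 2 can be shown with a back-of-envelope calculation; all numbers below are illustrative assumptions, not measured figures.

```python
# Back-of-envelope sketch: compressing shuffle data trades compute time for
# fewer bytes on the wire. All numbers are illustrative assumptions.
def transfer_seconds(raw_bytes, bandwidth_bps, ratio=1.0, compress_bps=float("inf")):
    """Time to move raw_bytes over a link, optionally compressing first.

    ratio        -- compression ratio (raw size / compressed size)
    compress_bps -- compression throughput; infinite means "no compression step"
    """
    wire_bytes = raw_bytes / ratio
    return raw_bytes / compress_bps + wire_bytes / bandwidth_bps

GB = 1 << 30
# 10 GiB of shuffle data over a ~10 Gb/s (1.25 GiB/s) link:
uncompressed = transfer_seconds(10 * GB, bandwidth_bps=1.25 * GB)
# Same data with an assumed 3x ratio and a fast GPU codec at 50 GiB/s:
compressed = transfer_seconds(10 * GB, bandwidth_bps=1.25 * GB,
                              ratio=3.0, compress_bps=50 * GB)
print(f"{uncompressed:.2f}s uncompressed vs {compressed:.2f}s compressed")
```

Under these assumed numbers the compression step costs far less time than the bytes it removes from the wire, which is the point of the trade.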

Answer selected by sameerz
Labels: question (Further information is requested)
3 participants
This discussion was converted from issue #877 on April 28, 2022 23:09.