-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[contrib] Support seek table-only API #2113
Conversation
Memory constrained use cases that manage multiple archives benefit from retaining multiple archive seek tables without retaining a ZSTD_seekable instance for each. * New opaque type for seek table: ZSTD_seekTable. * ZSTD_seekable_copySeekTable() supports copying seek table out of a ZSTD_seekable. * ZSTD_seekTable_[eachSeekTableOp]() defines seek table API that mirrors existing seek table operations. * Existing ZSTD_seekable_[eachSeekTableOp]() retained; they delegate to ZSTD_seekTable the variant. These changes allow the above-mentioned use cases to initialize a ZSTD_seekable, extract its ZSTD_seekTable, then throw the ZSTD_seekable away to save memory. Standard ZSTD operations can then be used to decompress frames based on seek table offsets. The copy and delegate patterns are intended to minimize impact on existing code and clients. Using copy instead of move for the infrequent operation extracting a seek table ensures that the extraction does not render the ZSTD_seekable useless. Delegating to *new* seek table-oriented APIs ensures that this is not a breaking change for existing clients while supporting all meaningful operations that depend only on seek table data.
Hi @mdittmer! Thank you for your pull request and welcome to our community.We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks! |
CC @Cyan4973 |
size_t entriesSize = sizeof(seekEntry_t) * (zs->seekTable.tableLen + 1); | ||
seekEntry_t* entries = (seekEntry_t*)malloc(entriesSize); | ||
if (!entries) { | ||
free(entries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you meant free(st);
?
Otherwise, there is leak on following return
.
{ | ||
ZSTD_seekTable* st = malloc(sizeof(ZSTD_seekTable)); | ||
if (!st) { | ||
free(st); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor :
if st == NULL
, there is no need to free()
it.
@@ -29,6 +29,7 @@ extern "C" { | |||
|
|||
typedef struct ZSTD_seekable_CStream_s ZSTD_seekable_CStream; | |||
typedef struct ZSTD_seekable_s ZSTD_seekable; | |||
typedef struct ZSTD_seekTable_s ZSTD_seekTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note :
the name ZSTD_seekTable
feels correct,
however I can't help but find it a bit too close to ZSTD_seekable
,
which could become a source of confusion.
I don't have a better name in mind though.
@@ -154,21 +155,33 @@ ZSTDLIB_API size_t ZSTD_seekable_writeSeekTable(ZSTD_frameLog* fl, ZSTD_outBuffe | |||
ZSTDLIB_API ZSTD_seekable* ZSTD_seekable_create(void); | |||
ZSTDLIB_API size_t ZSTD_seekable_free(ZSTD_seekable* zs); | |||
|
|||
/*===== Independent seek table management =====*/ | |||
ZSTDLIB_API size_t ZSTD_seekable_copySeekTable(ZSTD_seekable* zs, ZSTD_seekTable** out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor : naming.
I expect copy
to represent an operation which would copy a content into an already allocated buffer.
But here this is different.
This operation effectively allocates the required memory buffer(s), and then forwards ownership to the caller.
There is also a risk of failure, that the caller must check.
Naming is hard.
It looks to me that this is actually a creation operation,
which also requires a specific input object to proceed.
In order to respect the established naming convention of the zstd
project, I would recommend the following prototype :
ZSTD_seekTable* ZSTD_seekTable_create_fromSeekable(const ZSTD_seekable* zs);
Note that the change of signature is non trivial :
- The function returns a pointer, instead of an error code
create
is pretty clear in signifying that it allocates an object, that the caller must then manage.- like any allocation operation, it may fail, and the failure will be signaled by a
NULL
return value, which is a commonC
idiom - the argument is
const
, meaning that its content is guaranteed to remain intact during the operation. This is a very strong guarantee, which scales greatly in multi-threaded environments, such an opportunity should never be missed. - the naming scheme leaves room for other forms of creators, such as a potential
_fromInput()
for example
/*===== Seekable decompression functions =====*/ | ||
ZSTDLIB_API size_t ZSTD_seekable_initBuff(ZSTD_seekable* zs, const void* src, size_t srcSize); | ||
ZSTDLIB_API size_t ZSTD_seekable_initFile(ZSTD_seekable* zs, FILE* src); | ||
ZSTDLIB_API size_t ZSTD_seekable_decompress(ZSTD_seekable* zs, void* dst, size_t dstSize, unsigned long long offset); | ||
ZSTDLIB_API size_t ZSTD_seekable_decompressFrame(ZSTD_seekable* zs, void* dst, size_t dstSize, unsigned frameIndex); | ||
|
||
#define ZSTD_SEEKABLE_FRAMEINDEX_TOOLARGE (0ULL-2) | ||
/*===== Seek Table access functions =====*/ | ||
/*===== Seekable seek table access functions =====*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only realizes that now,
but it seems that the const
statement is incorrectly positioned.
ZSTD_seekable* const zs
just means that the pointer zs
will not be modified, which says nothing about its content, and provides absolutely no value at prototype level.
In contrast, const ZSTD_seekable* zs
guarantees that the content of zs
will not be modified, which is very important.
All functions below seem to be read-only. So it would make sense that they all accept a const ZSTD_seekable* zs
argument. I presume this was simply a mistake, that should be fixed.
It impacts your work. Your new API is essentially identical, just applied to ZSTD_seekTable*
object. Therefore, it inherited the same issue in all the new prototypes.
I believe they should also be updated to accept a const ZSTD_seekTable* st
argument.
The strategy followed by this patch is good. Nevertheless, your patch uncovered a few flaws of existing design, that should better be fixed in this PR (see code comments). It is also preferable that the new API capability be delivered with a test set. This new test set can either be a modification of an existing one (maybe in |
@mdittmer , is it still a PR you would like to progress ? |
I'll merge this PR into its own feature branch. I believe there are a nb of points (listed above) that need to be taken care of before merging into mainline. |
Memory constrained use cases that manage multiple archives benefit from
retaining multiple archive seek tables without retaining a
ZSTD_seekable
instance for each.
ZSTD_seekTable
.ZSTD_seekable_copySeekTable()
supports copying seek table out of aZSTD_seekable
.ZSTD_seekTable_[eachSeekTableOp]()
defines seek table API that mirrorsexisting seek table operations.
ZSTD_seekable_[eachSeekTableOp]()
retained; they delegate toZSTD_seekTable
the variant.These changes allow the above-mentioned use cases to initialize a
ZSTD_seekable
, extract itsZSTD_seekTable
, then throw theZSTD_seekable
away to save memory. Standard ZSTD operations can then be used to
decompress frames based on seek table offsets.
The copy and delegate patterns are intended to minimize impact on
existing code and clients. Using copy instead of move for the infrequent
operation extracting a seek table ensures that the extraction does not
render the
ZSTD_seekable
useless. Delegating to new seektable-oriented APIs ensures that this is not a breaking change for
existing clients while supporting all meaningful operations that depend
only on seek table data.