Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[contrib] Support seek table-only API #2113

Merged
merged 1 commit into from
Mar 2, 2021

Conversation

mdittmer
Copy link
Contributor

@mdittmer mdittmer commented May 7, 2020

Memory constrained use cases that manage multiple archives benefit from
retaining multiple archive seek tables without retaining a ZSTD_seekable
instance for each.

  • New opaque type for seek table: ZSTD_seekTable.
  • ZSTD_seekable_copySeekTable() supports copying seek table out of a
    ZSTD_seekable.
  • ZSTD_seekTable_[eachSeekTableOp]() defines seek table API that mirrors
    existing seek table operations.
  • Existing ZSTD_seekable_[eachSeekTableOp]() retained; they delegate to
    ZSTD_seekTable the variant.

These changes allow the above-mentioned use cases to initialize a
ZSTD_seekable, extract its ZSTD_seekTable, then throw the ZSTD_seekable
away to save memory. Standard ZSTD operations can then be used to
decompress frames based on seek table offsets.

The copy and delegate patterns are intended to minimize impact on
existing code and clients. Using copy instead of move for the infrequent
operation extracting a seek table ensures that the extraction does not
render the ZSTD_seekable useless. Delegating to new seek
table-oriented APIs ensures that this is not a breaking change for
existing clients while supporting all meaningful operations that depend
only on seek table data.

Memory constrained use cases that manage multiple archives benefit from
retaining multiple archive seek tables without retaining a ZSTD_seekable
instance for each.

* New opaque type for seek table: ZSTD_seekTable.
* ZSTD_seekable_copySeekTable() supports copying seek table out of a
  ZSTD_seekable.
* ZSTD_seekTable_[eachSeekTableOp]() defines seek table API that mirrors
  existing seek table operations.
* Existing ZSTD_seekable_[eachSeekTableOp]() retained; they delegate to
  ZSTD_seekTable the variant.

These changes allow the above-mentioned use cases to initialize a
ZSTD_seekable, extract its ZSTD_seekTable, then throw the ZSTD_seekable
away to save memory. Standard ZSTD operations can then be used to
decompress frames based on seek table offsets.

The copy and delegate patterns are intended to minimize impact on
existing code and clients. Using copy instead of move for the infrequent
operation extracting a seek table ensures that the extraction does not
render the ZSTD_seekable useless. Delegating to *new* seek
table-oriented APIs ensures that this is not a breaking change for
existing clients while supporting all meaningful operations that depend
only on seek table data.
@facebook-github-bot
Copy link

Hi @mdittmer!

Thank you for your pull request and welcome to our community.We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@mdittmer
Copy link
Contributor Author

mdittmer commented May 7, 2020

CC @Cyan4973

@Cyan4973 Cyan4973 self-assigned this May 7, 2020
size_t entriesSize = sizeof(seekEntry_t) * (zs->seekTable.tableLen + 1);
seekEntry_t* entries = (seekEntry_t*)malloc(entriesSize);
if (!entries) {
free(entries);
Copy link
Contributor

@Cyan4973 Cyan4973 May 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you meant free(st); ?
Otherwise, there is leak on following return.

{
ZSTD_seekTable* st = malloc(sizeof(ZSTD_seekTable));
if (!st) {
free(st);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor :
if st == NULL, there is no need to free() it.

@@ -29,6 +29,7 @@ extern "C" {

typedef struct ZSTD_seekable_CStream_s ZSTD_seekable_CStream;
typedef struct ZSTD_seekable_s ZSTD_seekable;
typedef struct ZSTD_seekTable_s ZSTD_seekTable;
Copy link
Contributor

@Cyan4973 Cyan4973 May 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a note :
the name ZSTD_seekTable feels correct,
however I can't help but find it a bit too close to ZSTD_seekable,
which could become a source of confusion.

I don't have a better name in mind though.

@@ -154,21 +155,33 @@ ZSTDLIB_API size_t ZSTD_seekable_writeSeekTable(ZSTD_frameLog* fl, ZSTD_outBuffe
ZSTDLIB_API ZSTD_seekable* ZSTD_seekable_create(void);
ZSTDLIB_API size_t ZSTD_seekable_free(ZSTD_seekable* zs);

/*===== Independent seek table management =====*/
ZSTDLIB_API size_t ZSTD_seekable_copySeekTable(ZSTD_seekable* zs, ZSTD_seekTable** out);
Copy link
Contributor

@Cyan4973 Cyan4973 May 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor : naming.

I expect copy to represent an operation which would copy a content into an already allocated buffer.
But here this is different.
This operation effectively allocates the required memory buffer(s), and then forwards ownership to the caller.
There is also a risk of failure, that the caller must check.

Naming is hard.
It looks to me that this is actually a creation operation,
which also requires a specific input object to proceed.
In order to respect the established naming convention of the zstd project, I would recommend the following prototype :

ZSTD_seekTable* ZSTD_seekTable_create_fromSeekable(const ZSTD_seekable* zs);

Note that the change of signature is non trivial :

  • The function returns a pointer, instead of an error code
  • create is pretty clear in signifying that it allocates an object, that the caller must then manage.
  • like any allocation operation, it may fail, and the failure will be signaled by a NULL return value, which is a common C idiom
  • the argument is const, meaning that its content is guaranteed to remain intact during the operation. This is a very strong guarantee, which scales greatly in multi-threaded environments, such an opportunity should never be missed.
  • the naming scheme leaves room for other forms of creators, such as a potential _fromInput() for example

/*===== Seekable decompression functions =====*/
ZSTDLIB_API size_t ZSTD_seekable_initBuff(ZSTD_seekable* zs, const void* src, size_t srcSize);
ZSTDLIB_API size_t ZSTD_seekable_initFile(ZSTD_seekable* zs, FILE* src);
ZSTDLIB_API size_t ZSTD_seekable_decompress(ZSTD_seekable* zs, void* dst, size_t dstSize, unsigned long long offset);
ZSTDLIB_API size_t ZSTD_seekable_decompressFrame(ZSTD_seekable* zs, void* dst, size_t dstSize, unsigned frameIndex);

#define ZSTD_SEEKABLE_FRAMEINDEX_TOOLARGE (0ULL-2)
/*===== Seek Table access functions =====*/
/*===== Seekable seek table access functions =====*/
Copy link
Contributor

@Cyan4973 Cyan4973 May 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only realizes that now,
but it seems that the const statement is incorrectly positioned.

ZSTD_seekable* const zs just means that the pointer zs will not be modified, which says nothing about its content, and provides absolutely no value at prototype level.
In contrast, const ZSTD_seekable* zs guarantees that the content of zs will not be modified, which is very important.

All functions below seem to be read-only. So it would make sense that they all accept a const ZSTD_seekable* zs argument. I presume this was simply a mistake, that should be fixed.

It impacts your work. Your new API is essentially identical, just applied to ZSTD_seekTable* object. Therefore, it inherited the same issue in all the new prototypes.
I believe they should also be updated to accept a const ZSTD_seekTable* st argument.

@Cyan4973
Copy link
Contributor

Cyan4973 commented May 9, 2020

The strategy followed by this patch is good.
Essentially, seekTable_t already existed, though only as an internal object.
You expose it as a ZSTD_seekTable public incomplete type, and make all the relevant functions public, as opposed to private. This minimizes implementation risks.

Nevertheless, your patch uncovered a few flaws of existing design, that should better be fixed in this PR (see code comments).

It is also preferable that the new API capability be delivered with a test set. This new test set can either be a modification of an existing one (maybe in examples/seekable_decompress.c ?), or be an entirely new test created for the new prototypes. The choice is entirely up to you.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Oct 9, 2020

@mdittmer , is it still a PR you would like to progress ?

@Cyan4973
Copy link
Contributor

Cyan4973 commented Mar 2, 2021

I'll merge this PR into its own feature branch.

I believe there are a nb of points (listed above) that need to be taken care of before merging into mainline.
I'll likely take over the work from the feature branch.

@Cyan4973 Cyan4973 changed the base branch from dev to seekTable March 2, 2021 18:50
@Cyan4973 Cyan4973 merged commit ce6d1b9 into facebook:seekTable Mar 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants