Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

refactor(experimental): add offsetCodec helper to @solana/codecs-core #2294

Merged
merged 1 commit into from
Mar 14, 2024

Conversation

lorisleiva
Copy link
Contributor

@lorisleiva lorisleiva commented Mar 12, 2024

This PR adds a new offsetCodec helper function. See copy/pasted documentation below:


Offsetting codecs

The offsetCodec function is a powerful codec primitive that allows you to move the offset of a given codec forward or backwards. It accepts one or two functions that takes the current offset and returns a new offset.

To understand how this works, let's take our previous biggerU32Codec example which encodes a u32 number inside an 8-byte buffer.

const biggerU32Codec = resizeCodec(getU32Codec(), size => size + 4);
biggerU32Codec.encode(0xffffffff);
// 0xffffffff00000000
//   |       └-- Empty buffer space caused by the resizeCodec function.
//   └-- Our encoded u32 number.

Now, let's say we want to move the offset of that codec 2 bytes forward so that the encoded number sits in the middle of the buffer. To achieve, this we can use the offsetCodec helper and provide a preOffset function that moves the "pre-offset" of the codec 2 bytes forward.

const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
    preOffset: ({ preOffset }) => preOffset + 2,
});
u32InTheMiddleCodec.encode(0xffffffff);
// 0x0000ffffffff0000
//       └-- Our encoded u32 number is now in the middle of the buffer.

We refer to this offset as the "pre-offset" because, once the inner codec is encoded or decoded, an additional offset will be returned which we refer to as the "post-offset". That "post-offset" is important as, unless we are reaching the end of our codec, it will be used by any further codecs to continue encoding or decoding data.

By default, that "post-offset" is simply the addition of the "pre-offset" and the size of the encoded or decoded inner data.

const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
    preOffset: ({ preOffset }) => preOffset + 2,
});
u32InTheMiddleCodec.encode(0xffffffff);
// 0x0000ffffffff0000
//   |   |       └-- Post-offset.
//   |   └-- New pre-offset: The original pre-offset + 2.
//   └-- Pre-offset: The original pre-offset before we adjusted it.

However, you may also provide a postOffset function to adjust the "post-offset". For instance, let's push the "post-offset" 2 bytes forward as well such that any further codecs will start doing their job at the end of our 8-byte u32 number.

const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
    preOffset: ({ preOffset }) => preOffset + 2,
    postOffset: ({ postOffset }) => postOffset + 2,
});
u32InTheMiddleCodec.encode(0xffffffff);
// 0x0000ffffffff0000
//   |   |       |   └-- New post-offset: The original post-offset + 2.
//   |   |       └-- Post-offset: The original post-offset before we adjusted it.
//   |   └-- New pre-offset: The original pre-offset + 2.
//   └-- Pre-offset: The original pre-offset before we adjusted it.

Both the preOffset and postOffset functions offer the following attributes:

  • bytes: The entire byte array being encoded or decoded.
  • preOffset: The original and unaltered pre-offset.
  • wrapBytes: A helper function that wraps the given offset around the byte array length. E.g. wrapBytes(-1) will refer to the last byte of the byte array.

Additionally, the post-offset function also provides the following attributes:

  • newPreOffset: The new pre-offset after the pre-offset function has been applied.
  • postOffset: The original and unaltered post-offset.

Note that you may also decide to ignore these attributes to achieve absolute offsets. However, relative offsets are usually recommended as they won't break your codecs when composed with other codecs.

const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
    preOffset: () => 2,
    postOffset: () => 8,
});
u32InTheMiddleCodec.encode(0xffffffff);
// 0x0000ffffffff0000

Also note that any negative offset or offset that exceeds the size of the byte array will throw a SolanaError of code SOLANA_ERROR__CODECS__OFFSET_OUT_OF_RANGE.

const u32InTheEndCodec = offsetCodec(biggerU32Codec, { preOffset: () => -4 });
u32InTheEndCodec.encode(0xffffffff);
// throws new SolanaError(SOLANA_ERROR__CODECS__OFFSET_OUT_OF_RANGE)

To avoid this, you may use the wrapBytes function to wrap the offset around the byte array length. For instance, here's how we can use the wrapBytes function to move the pre-offset 4 bytes from the end of the byte array.

const u32InTheEndCodec = offsetCodec(biggerU32Codec, {
    preOffset: ({ wrapBytes }) => wrapBytes(-4),
});
u32InTheEndCodec.encode(0xffffffff);
// 0x00000000ffffffff

As you can see, the offsetCodec helper allows you to jump all over the place with your codecs. This non-linear approach to encoding and decoding data allows you to achieve complex serialization strategies that would otherwise be impossible.

As usual, the offsetEncoder and offsetDecoder functions can also be used to split your codec logic into tree-shakeable functions.

const getU32InTheMiddleEncoder = () => offsetEncoder(biggerU32Encoder, { preOffset: ({ preOffset }) => preOffset + 2 });
const getU32InTheMiddleDecoder = () => offsetDecoder(biggerU32Decoder, { preOffset: ({ preOffset }) => preOffset + 2 });
const getU32InTheMiddleCodec = () => combineCodec(getU32InTheMiddleEncoder(), getU32InTheMiddleDecoder());

Copy link

changeset-bot bot commented Mar 12, 2024

⚠️ No Changeset found

Latest commit: be1301f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@lithdew
Copy link

lithdew commented Mar 12, 2024

My initial thoughts are that it might be good to make the wrapping behavior of the pre/post offset optional, such that the codec will throw an error if the offset specified is greater or less than the size of the input.

To support this, instead of having the preOffsetFunction and postOffsetFunction specified as the 2nd and 3rd function parameter, a configuration object may be passed instead.

Otherwise, everything else looks good.

This also gave me an idea of having a conditional codec, where a child codec provided would only be invoked if some conditions are true of the input, or of previously-decoded/encoded bytes.

@lithdew
Copy link

lithdew commented Mar 12, 2024

One other comment is, what do you think about calling this codec seekCodec rather than offsetCodec?

@lorisleiva
Copy link
Contributor Author

@lithdew Thanks for the feedback! Tbh I went back and forth multiple times on whether to throw or modulo byte overflows.

The reason I went for the later is that it enables negative offsets so using () => -4 will take you 4 bytes before the end of the buffer which can be super helpful.

An option could be nice though. So the API would look something like:

const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
    preOffset: ({ preOffset }) => preOffset + 2,
    postOffset: ({ postOffset }) => postOffset + 2,
    wrapBytes: true, // false by default?
});

The only annoyance is it makes the simple () => -4 example more verbose but maybe that's fine.

// Before.
offsetCodec(innerCodec, () => -4);

// After.
offsetCodec(innerCodec, {
    preOffset: () => -4,
    wrapBytes: true,
});

A conditional codec could be fun to implement as well yeah. I guess its logic would be fairly close to the getOptionCodec function.

I'm not sure I like seekCodec tbh because, for me, it sounds more like findCodec which doesn't really describe what's happening.

Copy link
Contributor

@steveluscher steveluscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added via Giphy

One day I'll convince you to write one expectation per test. And to proclaim your undying love for Jest. OK maybe just the first.

mockCodec.write.mockImplementation((_value, _bytes, offset) => offset + innerCodecSize);
mockCodec.read.mockImplementation((bytes, offset) => [bytes, offset + innerCodecSize]);

const codec = codecFactory(mockCodec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlining all of these test cases (as separate, hand-hewn artisanal its()) would reduce the debugging burden on the person who broke them. As it is, they're going to have to jump in and out of this code to figure out what the factory is, and what the expected values are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks of the feedback. I've updated all the tests. Lmk what you think.

packages/codecs-core/src/__tests__/offset-codec-test.ts Outdated Show resolved Hide resolved
@lorisleiva
Copy link
Contributor Author

lorisleiva commented Mar 12, 2024

@steveluscher Following the conversation on this PR and talking to @joncinque about it, I'm gonna refactor the offsetCodec API slightly so it accepts a config object (see code in my previous message).

Instead of the wrapBytes boolean though I'm thinking of passing a wrapBytes function within the scope of both preOffset and postOffset functions.

So the API would look like this. What do you all think?

offsetCodec(innerCoder, {
    preOffset: ({ wrapBytes }) => wrapBytes(-4),
    postOffset: ({ preOffset }) => preOffset,
});

@lorisleiva lorisleiva force-pushed the loris/offset-codec branch 4 times, most recently from acb9554 to bf99665 Compare March 13, 2024 11:29
Comment on lines +4 to +9
rules: {
'jest/expect-expect': [
'error',
{ assertFunctionNames: ['expect', 'expectNewPreOffset', 'expectNewPostOffset'] },
],
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveluscher Is this the best place to put this?

Copy link
Contributor

@buffalojoec buffalojoec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! How does this bad boy work inside of an ArrayDecoder? Can I use an OffsetDecoder for each element in my array? In other words:

createArrayDecoder(
  createOffsetDecoder(/* .. */)
)

If so, not sure if you want to add some test coverage on arrays.

packages/codecs-core/src/offset-codec.ts Show resolved Hide resolved
@lorisleiva
Copy link
Contributor Author

@buffalojoec Yeah offsets will apply to each array items which can create some crazy use-cases haha. I'll add some array tests in sub-sequent PRs.

Copy link
Contributor Author

lorisleiva commented Mar 14, 2024

Merge activity

  • Mar 14, 6:08 AM EDT: @lorisleiva started a stack merge that includes this pull request via Graphite.
  • Mar 14, 6:09 AM EDT: Graphite rebased this pull request as part of a merge.
  • Mar 14, 6:10 AM EDT: @lorisleiva merged this pull request with Graphite.

Base automatically changed from loris/resize-codec to master March 14, 2024 10:08
@lorisleiva lorisleiva merged commit 09d8cc8 into master Mar 14, 2024
7 checks passed
@lorisleiva lorisleiva deleted the loris/offset-codec branch March 14, 2024 10:10
lorisleiva added a commit that referenced this pull request Mar 18, 2024
This PR adds a couple of tests for offsetted arrays as [suggested here](#2294 (review)).
Copy link
Contributor

🎉 This PR is included in version 1.91.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

Copy link
Contributor

Because there has been no activity on this PR for 14 days since it was merged, it has been automatically locked. Please open a new issue if it requires a follow up.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants