Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization Rewrite #189

Draft
wants to merge 5 commits into
base: dev
Choose a base branch
from
Draft

Conversation

NycroV
Copy link
Collaborator

@NycroV NycroV commented Oct 10, 2024

This PR aims to improve upon the serialization of LavalinkTrack objects.

While the original system worked fine and did not have any explicit issues, there were a number of limitations imposed by forcing encoding through the use of ToString() and Parse(). As mentioned by @SKProCH in #166, a certain amount of overhead is imposed by requiring the encoding of "raw bytes => Utf8/Base64 => Utf16", and even more overhead when the user may want to store or utilize track serialization in formats apart from Utf16.

The new system implements two core methods - the static LavalinkTrack.Deserialize(), and the instanced LavalinkTrack.Serialize(). These methods work directly to serialize/deserialize byte arrays, to allow users to manipulate these arrays in more ways with less overhead. This also comes with a slight performance benefit as a result of the change to some underlying code structure.

LavalinkTrack still implements IFormattable and IParseable to standardize serialization directly to strings, in situations where that is more convenient. No code changes should be necessary for end users, and any tracks stored to databases should remain compatible with the new system.

In addition to the above overhead concerns, the switch to the new system provides the following benefits:

  • Easier to expand
  • Standardized (uses natively integrated C# BinaryReader and BinaryWriter)
  • Requires less extensions - creates a more compact codebase
  • Optimized (no longer requires repeated re-allocation of buffers and multiple encode attempts)
  • Simpler, cleaner code formatting

@NycroV NycroV added enhancement New feature or request help wanted Extra attention is needed labels Oct 10, 2024
@NycroV NycroV self-assigned this Oct 10, 2024
@NycroV NycroV linked an issue Oct 10, 2024 that may be closed by this pull request
@NycroV
Copy link
Collaborator Author

NycroV commented Oct 10, 2024

Marked this PR as a draft as it is not ready for merge yet (as indicated by the failing build status).

I have been unable to thoroughly debug this code myself as a result of the recent hurricane hitting my city, and me not having a consistent access to WiFi. Thankfully there are some well written tests that are providing useful debug information while I try to find a way to debug this.

I've added the help wanted tag in the event someone might be willing to write a test or two for the new serialization code. While not necessary, they would definitely prove useful!

@NycroV
Copy link
Collaborator Author

NycroV commented Oct 13, 2024

Requesting a quick check from @angelobreuer for this last test:
TestEncodeKnownTrackWithSpecialCharacters contains encoded track data in its comparison to verify that special characters are properly serialized. However, the new encoding format changes the way this data is encoded.

There are two options for resolving this:

  • Change the encoded data for the test to be that of a track encoded with the new format
  • Change the logic to essentially a second version of the round-trip encoding, but containing special characters in the name

I would personally lean towards the latter, as using round trip encoding here means that no matter how the encoding is formatted, it still performs the function of "verify that the track data is the same before/after encoding even when the track has special characters". However, I can also see the benefits of embedding some raw encoded data into the test.

Let me know which option you'd prefer (or if you'd prefer something entirely different) and then I can get this PR stress tested and merged. All other tests are now passing!

@angelobreuer
Copy link
Owner

Requesting a quick check from @angelobreuer for this last test: TestEncodeKnownTrackWithSpecialCharacters contains encoded track data in its comparison to verify that special characters are properly serialized. However, the new encoding format changes the way this data is encoded.

There are two options for resolving this:

  • Change the encoded data for the test to be that of a track encoded with the new format
  • Change the logic to essentially a second version of the round-trip encoding, but containing special characters in the name

I would personally lean towards the latter, as using round trip encoding here means that no matter how the encoding is formatted, it still performs the function of "verify that the track data is the same before/after encoding even when the track has special characters". However, I can also see the benefits of embedding some raw encoded data into the test.

Let me know which option you'd prefer (or if you'd prefer something entirely different) and then I can get this PR stress tested and merged. All other tests are now passing!

Sorry for the delay of the review. I just had a look at the changes.

The serialization and deserialization is indeed quite slow and weirdly structured but it is intentional since we need to be compatible with track identifiers from Lavalink. The Lavalink server also uses an own format to store and most importantly restore tracks from the encoded payload.

Without changes to Lavalink itself we can not update those - but it makes sense to actually have an improved format of track identifiers but we need to make sure every time we send them to Lavalink to use a version or format that can be interpreted by the Lavalink server. Some users make heavy use of the track identifiers and store them in database or similar, we just need to make sure then that once we get a new version track identifier we can encode and decode them in the format required for Lavalink

The test you mentioned (TestEncodeKnownTrackWithSpecialCharacters) contained a special UTF-8 sequence that was reported to crash Lavalink or Lavalink4NET (do not remember what it crashed) and the test was made to verify there are no regressions between the required format for Lavalink (written in Java and thus using Java Internals for formatting) and the format implemented in Lavalink4NET

@NycroV
Copy link
Collaborator Author

NycroV commented Oct 19, 2024

Ah, I was unaware Lavalink had their own encoding format that needed to be adhered to. That would make sense. 😅

Does the V3 encoding format adhere to the Lavalink standards? If so, I should be able to pretty easily change the new code to produce an identical output, but still utilize the same binary writing structure by creating a custom writer that uses big endian format instead of little endian.

This would inherit all the benefits of the new code structure, but maintain the product format of the old system. I could also create some sort of intermediate conversion if that ends up being the better option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow serializing/desterilizing LavalinkTrack directly to byte array
2 participants