Serialization Rewrite #189

NycroV · 2024-10-10T02:49:18Z

This PR aims to improve upon the serialization of LavalinkTrack objects.

While the original system worked fine and did not have any explicit issues, there were a number of limitations imposed by forcing encoding through the use of ToString() and Parse(). As mentioned by @SKProCH in #166, a certain amount of overhead is imposed by requiring the encoding of "raw bytes => Utf8/Base64 => Utf16", and even more overhead when the user may want to store or utilize track serialization in formats apart from Utf16.

The new system implements two core methods - the static LavalinkTrack.Deserialize(), and the instanced LavalinkTrack.Serialize(). These methods work directly to serialize/deserialize byte arrays, to allow users to manipulate these arrays in more ways with less overhead. This also comes with a slight performance benefit as a result of the change to some underlying code structure.

LavalinkTrack still implements IFormattable and IParseable to standardize serialization directly to strings, in situations where that is more convenient. No code changes should be necessary for end users, and any tracks stored to databases should remain compatible with the new system.

In addition to the above overhead concerns, the switch to the new system provides the following benefits:

Easier to expand
Standardized (uses natively integrated C# BinaryReader and BinaryWriter)
Requires less extensions - creates a more compact codebase
Optimized (no longer requires repeated re-allocation of buffers and multiple encode attempts)
Simpler, cleaner code formatting

NycroV · 2024-10-10T03:02:55Z

Marked this PR as a draft as it is not ready for merge yet (as indicated by the failing build status).

I have been unable to thoroughly debug this code myself as a result of the recent hurricane hitting my city, and me not having a consistent access to WiFi. Thankfully there are some well written tests that are providing useful debug information while I try to find a way to debug this.

I've added the help wanted tag in the event someone might be willing to write a test or two for the new serialization code. While not necessary, they would definitely prove useful!

NycroV · 2024-10-13T04:25:29Z

Requesting a quick check from @angelobreuer for this last test:
TestEncodeKnownTrackWithSpecialCharacters contains encoded track data in its comparison to verify that special characters are properly serialized. However, the new encoding format changes the way this data is encoded.

There are two options for resolving this:

Change the encoded data for the test to be that of a track encoded with the new format
Change the logic to essentially a second version of the round-trip encoding, but containing special characters in the name

I would personally lean towards the latter, as using round trip encoding here means that no matter how the encoding is formatted, it still performs the function of "verify that the track data is the same before/after encoding even when the track has special characters". However, I can also see the benefits of embedding some raw encoded data into the test.

Let me know which option you'd prefer (or if you'd prefer something entirely different) and then I can get this PR stress tested and merged. All other tests are now passing!

angelobreuer · 2024-10-17T09:21:03Z

Requesting a quick check from @angelobreuer for this last test: TestEncodeKnownTrackWithSpecialCharacters contains encoded track data in its comparison to verify that special characters are properly serialized. However, the new encoding format changes the way this data is encoded.

There are two options for resolving this:

Change the encoded data for the test to be that of a track encoded with the new format

Change the logic to essentially a second version of the round-trip encoding, but containing special characters in the name

I would personally lean towards the latter, as using round trip encoding here means that no matter how the encoding is formatted, it still performs the function of "verify that the track data is the same before/after encoding even when the track has special characters". However, I can also see the benefits of embedding some raw encoded data into the test.

Let me know which option you'd prefer (or if you'd prefer something entirely different) and then I can get this PR stress tested and merged. All other tests are now passing!

Sorry for the delay of the review. I just had a look at the changes.

The serialization and deserialization is indeed quite slow and weirdly structured but it is intentional since we need to be compatible with track identifiers from Lavalink. The Lavalink server also uses an own format to store and most importantly restore tracks from the encoded payload.

Without changes to Lavalink itself we can not update those - but it makes sense to actually have an improved format of track identifiers but we need to make sure every time we send them to Lavalink to use a version or format that can be interpreted by the Lavalink server. Some users make heavy use of the track identifiers and store them in database or similar, we just need to make sure then that once we get a new version track identifier we can encode and decode them in the format required for Lavalink

The test you mentioned (TestEncodeKnownTrackWithSpecialCharacters) contained a special UTF-8 sequence that was reported to crash Lavalink or Lavalink4NET (do not remember what it crashed) and the test was made to verify there are no regressions between the required format for Lavalink (written in Java and thus using Java Internals for formatting) and the format implemented in Lavalink4NET

NycroV · 2024-10-19T06:53:53Z

Ah, I was unaware Lavalink had their own encoding format that needed to be adhered to. That would make sense. 😅

Does the V3 encoding format adhere to the Lavalink standards? If so, I should be able to pretty easily change the new code to produce an identical output, but still utilize the same binary writing structure by creating a custom writer that uses big endian format instead of little endian.

This would inherit all the benefits of the new code structure, but maintain the product format of the old system. I could also create some sort of intermediate conversion if that ends up being the better option.

NycroV added 4 commits October 7, 2024 23:53

Set up new serialization base

412014e

New deserialization

9e60f79

Re-implement legacy serialization

4dd4b70

Documentation

b3e4120

NycroV added enhancement New feature or request help wanted Extra attention is needed labels Oct 10, 2024

NycroV self-assigned this Oct 10, 2024

NycroV linked an issue Oct 10, 2024 that may be closed by this pull request

Allow serializing/desterilizing LavalinkTrack directly to byte array #166

Open

Fix tests*

2f6d97c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Serialization Rewrite #189

Serialization Rewrite #189

NycroV commented Oct 10, 2024

NycroV commented Oct 10, 2024

NycroV commented Oct 13, 2024

angelobreuer commented Oct 17, 2024

NycroV commented Oct 19, 2024

Serialization Rewrite #189

Are you sure you want to change the base?

Serialization Rewrite #189

Conversation

NycroV commented Oct 10, 2024

NycroV commented Oct 10, 2024

NycroV commented Oct 13, 2024

angelobreuer commented Oct 17, 2024

NycroV commented Oct 19, 2024