Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing bounds checks when serializing commands/frames. #1030

Merged
merged 2 commits into from
Mar 23, 2021

Conversation

stebet
Copy link
Contributor

@stebet stebet commented Mar 16, 2021

Proposed Changes

Simplified version of my previous PR. This is the biggest change. It removes bounds-checking on spans by using Unsafe.Write methods in the NetworkOrderSerializer classes.

Also split up WireFormatting into different files to make it easier to work with (Shared, Read, Write).

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

Created extensions methods for most cases to make it easier to work with byte offsets rather than having Span.Slice calls all over the place. We can allow ourselves to do this since we can guarantee that we have the require amount of bytes to read/write the frames since we request a certain amount of bytes when serializing and read the required amount of bytes and check frame-end markers when deserializing.

Most span.slice calls were replace with ref span.GetOffset calls references to the spans themselves are replaced with ref span.GetStart() calls.

This utilizes ref parameters heavily and results in a hefty speed increase in several cases. A handful of cases turn out slower but this still results in an overall gain in performance and there is more room for improvement with simplifying read/write methods for frames as I made an example of in BasicAck and more that can be done in WireFormatting.

.NET Framework 4.8 improvements:

Slower diff/base Base Median (ns) Diff Median (ns) Modality
RabbitMQ.Benchmarks.DataTypeLongStringSerialization.LongstrReadPopulated 1.10 326.49 359.30
RabbitMQ.Benchmarks.PrimitivesBool.BoolWrite2(param1: True, param2: False) 1.09 9.97 10.87
RabbitMQ.Benchmarks.DataTypeFieldSerialization.NullWrite 1.04 18.04 18.74
RabbitMQ.Benchmarks.DataTypeTableSerialization.TableGetSizePopulated 1.03 260.91 268.68
RabbitMQ.Benchmarks.MethodChannelClose.ChannelCloseRead 1.02 27.19 27.77
Faster base/diff Base Median (ns) Diff Median (ns) Modality
RabbitMQ.Benchmarks.MethodBasicAck.BasicAckWrite 1.63 21.22 13.05
RabbitMQ.Benchmarks.DataTypeArraySerialization.ArrayWriteEmpty 1.51 21.55 14.22
RabbitMQ.Benchmarks.DataTypeShortStringSerialization.ShortstrWriteEmpty 1.41 18.47 13.13
RabbitMQ.Benchmarks.MethodBasicAck.BasicAckRead 1.25 22.14 17.69
RabbitMQ.Benchmarks.MethodBasicProperties.BasicPropertiesWrite 1.25 78.22 62.60
RabbitMQ.Benchmarks.DataTypeTableSerialization.TableWritePopulated 1.25 655.88 525.35
RabbitMQ.Benchmarks.MethodBasicDeliver.BasicPublishWrite 1.22 83.87 68.81
RabbitMQ.Benchmarks.MethodChannelClose.ChannelCloseWrite 1.21 33.67 27.76
RabbitMQ.Benchmarks.DataTypeLongStringSerialization.LongstrGetSizeEmpty 1.20 1.97 1.65
RabbitMQ.Benchmarks.DataTypeArraySerialization.ArrayWritePopulated 1.17 146.53 125.56
RabbitMQ.Benchmarks.DataTypeFieldSerialization.DictGetSize 1.12 15.49 13.86
RabbitMQ.Benchmarks.MethodBasicDeliver.BasicPublishMemoryWrite 1.11 46.75 41.95
RabbitMQ.Benchmarks.MethodFramingChannelClose.ChannelCloseWrite 1.10 70.35 63.76
RabbitMQ.Benchmarks.MethodFramingBasicPublish.BasicPublishMemoryWrite 1.09 133.65 122.17
RabbitMQ.Benchmarks.MethodFramingBasicPublish.BasicPublishWrite 1.09 200.85 183.66
RabbitMQ.Benchmarks.DataTypeFieldSerialization.DictWrite 1.08 44.15 40.98
RabbitMQ.Benchmarks.DataTypeTableSerialization.TableGetSizeEmpty 1.06 8.83 8.37
RabbitMQ.Benchmarks.DataTypeShortStringSerialization.ShortstrWritePopulated 1.06 145.20 137.53
RabbitMQ.Benchmarks.PrimitivesLong.LongWrite(value: 12345) 1.05 11.42 10.83
RabbitMQ.Benchmarks.PrimitivesShort.ShortWrite(value: 12345) 1.04 11.56 11.07
RabbitMQ.Benchmarks.PrimitivesTimestamp.TimestampWrite 1.04 12.25 11.75
RabbitMQ.Benchmarks.DataTypeFieldSerialization.BinaryTableValueWrite 1.04 43.31 41.59
RabbitMQ.Benchmarks.DataTypeFieldSerialization.IntWrite 1.03 22.53 21.83
RabbitMQ.Benchmarks.PrimitivesLonglong.LonglongWrite(value: 12345) 1.03 11.96 11.60
RabbitMQ.Benchmarks.DataTypeLongStringSerialization.LongstrWriteEmpty 1.03 12.35 12.02
RabbitMQ.Benchmarks.PrimitivesDecimal.DecimalWrite 1.03 13.04 12.69
RabbitMQ.Benchmarks.DataTypeArraySerialization.ArrayGetSizePopulated 1.03 43.98 42.85
RabbitMQ.Benchmarks.DataTypeTableSerialization.TableWriteEmpty 1.03 22.55 21.98
RabbitMQ.Benchmarks.DataTypeTableSerialization.TableReadPopulated 1.02 881.56 865.89

.NET Core 3.1 improvements:

Slower diff/base Base Median (ns) Diff Median (ns) Modality
RabbitMQ.Benchmarks.DataTypeArraySerialization.ArrayGetSizePopulated 1.04 32.29 33.51
RabbitMQ.Benchmarks.DataTypeArraySerialization.ArrayReadPopulated 1.02 79.94 81.90
Faster base/diff Base Median (ns) Diff Median (ns) Modality
RabbitMQ.Benchmarks.DataTypeLongStringSerialization.LongstrWriteEmpty 4.55 8.04 1.77
RabbitMQ.Benchmarks.MethodBasicAck.BasicAckWrite 1.71 3.29 1.92
RabbitMQ.Benchmarks.MethodBasicAck.BasicAckRead 1.51 4.60 3.05
RabbitMQ.Benchmarks.DataTypeFieldSerialization.NullWrite 1.37 4.19 3.06
RabbitMQ.Benchmarks.MethodBasicDeliver.BasicPublishMemoryWrite 1.36 12.46 9.13
RabbitMQ.Benchmarks.DataTypeFieldSerialization.ArrayWrite 1.33 10.29 7.72
RabbitMQ.Benchmarks.MethodChannelClose.ChannelCloseWrite 1.22 7.09 5.81
RabbitMQ.Benchmarks.DataTypeFieldSerialization.IntWrite 1.15 5.16 4.49
RabbitMQ.Benchmarks.MethodBasicDeliver.BasicPublishWrite 1.12 32.82 29.22
RabbitMQ.Benchmarks.DataTypeShortStringSerialization.ShortstrWriteEmpty 1.12 3.90 3.49
RabbitMQ.Benchmarks.DataTypeFieldSerialization.BinaryTableValueWrite 1.08 19.38 17.90
RabbitMQ.Benchmarks.DataTypeFieldSerialization.DictWrite 1.07 20.33 19.00
RabbitMQ.Benchmarks.MethodFramingChannelClose.ChannelCloseWrite 1.07 22.30 20.88
RabbitMQ.Benchmarks.DataTypeArraySerialization.ArrayWritePopulated 1.06 68.35 64.27
RabbitMQ.Benchmarks.DataTypeTableSerialization.TableWriteEmpty 1.06 11.40 10.76
RabbitMQ.Benchmarks.DataTypeTableSerialization.TableWritePopulated 1.06 328.78 310.60
RabbitMQ.Benchmarks.MethodBasicProperties.BasicPropertiesWrite 1.05 35.93 34.09
RabbitMQ.Benchmarks.MethodFramingBasicPublish.BasicPublishWrite 1.05 96.07 91.71
RabbitMQ.Benchmarks.MethodFramingBasicPublish.BasicPublishMemoryWrite 1.05 53.24 50.85
RabbitMQ.Benchmarks.DataTypeFieldSerialization.DictRead 1.05 20.38 19.47
RabbitMQ.Benchmarks.MethodFramingBasicAck.BasicAckWrite 1.05 18.39 17.58
RabbitMQ.Benchmarks.DataTypeFieldSerialization.BinaryTableValueRead 1.04 23.88 22.88
RabbitMQ.Benchmarks.DataTypeLongStringSerialization.LongstrWritePopulated 1.02 228.19 223.47

@stebet
Copy link
Contributor Author

stebet commented Mar 16, 2021

Might want to take a look at this @bollhals :)

@bollhals
Copy link
Contributor

Thanks! I will.

Short question:

We can allow ourselves to do this since we can guarantee that we have the require amount of bytes to read/write the frames since we request a certain amount of bytes when serializing and read the required amount of bytes and check frame-end markers when deserializing.

While I agree on the writing part (client -> server), I'm not sold on the reading part (server -> client).
If the frame contains erroneous data, we will end up in trouble even if we check the frame end.
E.g.
Frame size and end marker are correct, but the short string size is written as 100 but only contains 10 bytes afterwards. => BOOM

I do fancy performance, but for the receiving side this sounds risky.


[Benchmark]
public int ShortstrWriteEmpty() => WireFormatting.WriteShortstr(_buffer.Span, string.Empty);
public int ShortstrWriteEmpty() => WireFormatting.WriteShortstr(ref _buffer.Span.GetStart(), string.Empty, _buffer.Length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we should change all of the fields to be arguments to prevent inlining benefits. This more accurately measures how it is used then in the actual methods.

(My assumption is here that due to the inlining it sees that the parameter is string.Empty and therefore throwing away most of the WriteShortStr method. Where as e.g. in the BasicPublish.WriteArgumentsTo() method you access the field, which was set from the ctor. So even if the value is string.Empty, it can not throw away parts of the method)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least this should be done for all constants. In newer .net (5.0+) even static readonly can be considered as invariant and enable better code than it should be for this performance test

@stebet
Copy link
Contributor Author

stebet commented Mar 16, 2021

Thanks! I will.

Short question:

We can allow ourselves to do this since we can guarantee that we have the require amount of bytes to read/write the frames since we request a certain amount of bytes when serializing and read the required amount of bytes and check frame-end markers when deserializing.

While I agree on the writing part (client -> server), I'm not sold on the reading part (server -> client).
If the frame contains erroneous data, we will end up in trouble even if we check the frame end.
E.g.
Frame size and end marker are correct, but the short string size is written as 100 but only contains 10 bytes afterwards. => BOOM

I do fancy performance, but for the receiving side this sounds risky.

Currently it doesn't fail, but it might read garbage data (checks I've done have all bytes return 0, haven't been able to see anything else come up).

I can add simple checks when reading to make sure we have enough remaining bytes when trying to read frames in the case where we might receive malformed frames from the server. It should still be faster than all the slicing.

@stebet
Copy link
Contributor Author

stebet commented Mar 16, 2021

After doing some thinking, doing all the checks on the Reads is pretty complicated so I'll change this PR to apply to just writes. That'll simplify the PR a bit and limit it to safer code. I'll revisit the reads later.

@stebet
Copy link
Contributor Author

stebet commented Mar 18, 2021

@bollhals @michaelklishin Reverted the Read* methods. Benchmarks have been rerun and initial comment updated.

Copy link
Contributor

@bollhals bollhals left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it is looking good, left a few remarks.

Would you mind sharing the improvements of a simple method? (Disassembly file)

projects/RabbitMQ.Client/client/framing/BasicAck.cs Outdated Show resolved Hide resolved
projects/RabbitMQ.Client/client/framing/BasicAck.cs Outdated Show resolved Hide resolved
projects/RabbitMQ.Client/client/impl/Frame.cs Outdated Show resolved Hide resolved
projects/RabbitMQ.Client/client/impl/MethodBase.cs Outdated Show resolved Hide resolved
projects/RabbitMQ.Client/client/impl/MethodBase.cs Outdated Show resolved Hide resolved
@stebet
Copy link
Contributor Author

stebet commented Mar 19, 2021

Generally it is looking good, left a few remarks.

Would you mind sharing the improvements of a simple method? (Disassembly file)

Will do.

@stebet
Copy link
Contributor Author

stebet commented Mar 19, 2021

Example method improvements from BasicAck. Note that the new way is much shorter in ASM instructions as well as being branchless (no jumps, which helps the CPU pipelining).

;BasicAck.WriteArgumentsToBefore(System.Span`1<Byte>)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: push ebx
    L0006: lea eax, [ebp+8]
    L0009: mov edx, [eax]
    L000b: mov eax, [eax+4]
    L000e: lea esi, [ecx+4]
    L0011: mov edi, [esi]
    L0013: mov esi, [esi+4]
    L0016: bswap edi
    L0018: xor ebx, ebx
    L001a: bswap esi
    L001c: add esi, 0
    L001f: adc edi, 0
    L0022: cmp eax, 8
    L0025: jb short L0062
    L0027: mov [edx], esi
    L0029: mov [edx+4], edi
    L002c: cmp dword ptr [ebp+0xc], 8
    L0030: jb short L006d
    L0032: mov eax, [ebp+8]
    L0035: mov edx, [ebp+0xc]
    L0038: sub edx, 8
    L003b: add eax, 8
    L003e: movzx ecx, byte ptr [ecx+0xc]
    L0042: cmp edx, 0
    L0045: jbe short L0073
    L0047: test ecx, ecx
    L0049: jne short L004f
    L004b: xor ecx, ecx
    L004d: jmp short L0054
    L004f: mov ecx, 1
    L0054: mov [eax], cl
    L0056: mov eax, 9
    L005b: pop ebx
    L005c: pop esi
    L005d: pop edi
    L005e: pop ebp
    L005f: ret 8
    L0062: mov ecx, 0x28
    L0067: call System.ThrowHelper.ThrowArgumentOutOfRangeException(System.ExceptionArgument)
    L006c: int3
    L006d: call System.ThrowHelper.ThrowArgumentOutOfRangeException()
    L0072: int3
    L0073: call 0x068175d0
    L0078: int3

;BasicAck.WriteArgumentsToAfter(System.Span`1<Byte>)
    L0000: push edi
    L0001: push esi
    L0002: lea eax, [esp+0xc]
    L0006: mov eax, [eax]
    L0008: lea edx, [ecx+4]
    L000b: mov esi, [edx]
    L000d: mov edx, [edx+4]
    L0010: bswap esi
    L0012: xor edi, edi
    L0014: bswap edx
    L0016: add edx, 0
    L0019: adc esi, 0
    L001c: mov [eax], edx
    L001e: mov [eax+4], esi
    L0021: lea eax, [esp+0xc]
    L0025: mov eax, [eax]
    L0027: movzx edx, byte ptr [ecx+0xc]
    L002b: mov [eax+8], dl
    L002e: mov eax, 9
    L0033: pop esi
    L0034: pop edi
    L0035: ret 8

Splitting WireFormatting into separate files for clarity.
@stebet stebet force-pushed the boundscheckremoval branch from eb097d2 to 846e590 Compare March 19, 2021 09:07
Hardcoding redundant buffer-length checks since we precalculate the required buffer sizes before writing.
@michaelklishin michaelklishin merged commit e00b710 into rabbitmq:master Mar 23, 2021
@michaelklishin
Copy link
Member

Thank you!

@stebet stebet changed the title Removing bounds checks when (de)serializing commands/frames. Removing bounds checks when serializing commands/frames. Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants