Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix XmlBinaryReader/XmlBinaryWriter on big-endian systems #74599

Merged
merged 3 commits into from
Sep 9, 2022
Merged

Fix XmlBinaryReader/XmlBinaryWriter on big-endian systems #74599

merged 3 commits into from
Sep 9, 2022

Conversation

uweigand
Copy link
Contributor

Properly byte-swap float/double/decimal types.
Handle array types correctly on big-endian systems.
Added test case for array-of-decimal types.

Fixes #74494

CC @StephenMolloy @Daniel-Svensson @jkotas

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 25, 2022
buffer[offset + 6] = bytes[5];
buffer[offset + 7] = bytes[6];
buffer[offset + 8] = bytes[7];
buffer[offset + 0] = (byte)value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code started writing the value at offset + 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Fixed now.

buffer[offset + 6] = bytes[5];
buffer[offset + 7] = bytes[6];
buffer[offset + 8] = bytes[7];
buffer[offset + 0] = (byte)value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just use BinaryPrimitives.WriteDoubleLittleEndian

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that's much simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, there are more places that can use BinaryPrimitives, for example WriteInt64 method.

Copy link
Contributor Author

@uweigand uweigand Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to make that change as part of this PR? I was hoping that this could get into net7 to fix CI there, so I was trying to keep the changes as small as possible.

Copy link
Contributor

@Daniel-Svensson Daniel-Svensson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments after a quick read through

@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset)
=> (ulong)GetInt64(offset);

public float GetSingle(int offset)
=> ReadRawBytes<float>(offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: You could also have done
BitConverter.Int32BitsToSingle(GetInt32(offset)),

@@ -1437,7 +1442,7 @@ public override unsafe void WriteArray(string? prefix, string localName, string?

public override unsafe void WriteArray(string? prefix, XmlDictionaryString localName, XmlDictionaryString? namespaceUri, decimal[] array, int offset, int count)
{
if (Signing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the code it is very difficult to known if it produces the same result or if the format for arrays have changed.
You should ensure there are tests to cover this similar to those is #71478 , but it might make sense to check more types than I did.

My concern is that currently it only check round-trip support and I think it is important to ensure that the produced file will be readable on LittleEndian and that it can read content produced by little endian machins.

Suggestion:
If more code is needed then maybe you want to consider writing a genereic WriteArray helper.

My own idea (which assumed that it would have been ok to update the writer PR or that it was merged first) to solve the problem was to

  • use a genereic ReadArray/WriteArray helper with a loop for reading/writing elements on big endian.
  • generic "ReverseEndianness" method which need to handle at least short, int, long (other primitives, except deicmal, can always be cast to these using Unsafe or MemoryMarshal.Cast)
    • (Read/WriteRawBytes might have been able to do raw byte swapping as well)

Copy link
Contributor Author

@uweigand uweigand Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review! You're right that the array cases were not correct. I thought we could just fall back to the base class implementation like in the reader, but in the writer the base class actually doesn't create any array records, just a series of "regular" records. This is probably not wrong, but it clearly would be preferable to create the same output on big-endian as on little-endian.

Normally, I agree we should just merge your rewrite first, and then fix the endian problems on top. However, I'm not sure the rewrite will still be accepted into net7 at this stage, and I'd really like to see the endian bugs fixed in net7, so I'd prefer to have the endian fix go first, and the performance rewrite on top of that.

The updated implementation I just pushed replaces the generic UnsafeWriteArray helper with per-type array helpers along the line of the existing WriteDateTimeArray etc. These new helpers are still unsafe (for now) since I left the little-endian implementation unchanged, but with your patch on top it should be straightforward to make them all safe.

As to the ReverseEndianness suggestion, I feel it should be cleaner to eliminate explicit byte-swap operations in favor of type-specific fixed-byte-order operations from BinaryPrimitives - see also the comment by @jkotas above.

}

Copy link
Contributor

@Daniel-Svensson Daniel-Svensson Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a good idea to ensure there are tests that the produced binary format is correct.
I dont know if and if so when #71478 will be merged so the tests added there cannot be relied upon to find any potential issues,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I've added new tests from your PR #71478 and extended them by making sure all data types are covered.

@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset)
=> (ulong)GetInt64(offset);

public float GetSingle(int offset)
=> ReadRawBytes<float>(offset);
=> BinaryPrimitives.ReadSingleLittleEndian(_buffer.AsSpan(offset, 4));

public double GetDouble(int offset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also GetSingle/GetDouble methods without the "int" parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I had missed those. Now added.

@uweigand
Copy link
Contributor Author

uweigand commented Sep 7, 2022

Ping? Any comments on the new approach?


Span<byte> span = buffer.AsSpan(offset, sizeof(decimal));
BinaryPrimitives.WriteInt32LittleEndian(span, bits[3]);
BinaryPrimitives.WriteInt32LittleEndian(span.Slice(4), bits[2]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is correct. A decimal value is a 96 bit integer stored in little endian order with a final 32 bits holding some flags (stores a ^10 exponent divider and a sign bit). The order to be written to the span should be:
[low 32 bits] [mid 32 bits] [high 32 bits] [flags]
This looks like it writes out:
[flags] [high 32 bits] [low 32 bits] [mid 32 bits]
While each of the 32bit numbers should be written in little endian so need their byte orders switched as you've done, the TryGetBits method doesn't write each of the components out to the passed in span in an endian specific way. The order of ints in the span should remain the same.
Or am I missing something here?

Copy link
Member

@StephenMolloy StephenMolloy Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decimal is a struct with two Int32 fields and one Int64 field. The declared order is flags, high32, low64.

So when the little-endian implementation iterates over 4 integer pointers here, it writes out flags, high32, bottom-of-low64, top-of-low64 if I'm not mistaken, because the "low" is stored as Int64 and that's the order it would come out on a little-endian machine? Personally, I think it would make more sense to follow the order of TryGetBits if this was a greenfield implementation... but the goal here is to match on the wire what gets written by the little-endian implementation.

Or that's how I read it. Maybe I'm missing something? This is why I asked for extra eyes. ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with @StephenMolloy - we need to match the existing wire format, which matches the in-memory format of the decimal type on a little-endian machine, which is "[flags] [high 32] [low 32] [mid 32]".

fixed (int* items = &array[offset])
{
base.UnsafeWriteBytes((byte*)items, 4 * count);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to consider doing a similar approach for little endian as we do for bin endian as UnsafeWriteBytes forces a buffer flush whereas the big endian implementation only causes a buffer flush when the buffer doesn't have space.
We could also improve the performance of the big endian implementation by having a faster path when the size of the array is smaller than the buffer length. In that case you can call GetBuffer once and then have the loop wrap the write call, then call Advance once at the end. Could also do this for bigger arrays and just do it in batches which are smaller than the buffer size. The buffer length is hard coded to 512 bytes so these could be done in batches of 128 count.
None of that is required for this PR, just noting improvements that I can see which are possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had considered that. But for a very late-stage PR in 7.0 that we can hopefully bring back to 6.0... I thought it's probably better to stay faithful to the existing little-endian implementation. As you say, it's not required for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We can do performance improvements for big-endian systems for .NET 8. I'd be happy to work on this.


Span<byte> span = GetBuffer(16, out int bufferOffset).AsSpan(bufferOffset, 16);
BinaryPrimitives.WriteInt32LittleEndian(span, bits[3]);
BinaryPrimitives.WriteInt32LittleEndian(span.Slice(4), bits[2]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same ordering issue as WriteDecimalText

@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset)
=> (ulong)GetInt64(offset);

public float GetSingle(int offset)
=> ReadRawBytes<float>(offset);
=> BinaryPrimitives.ReadSingleLittleEndian(_buffer.AsSpan(offset, 4));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we really should be using sizeof everywhere that we are passing how many bytes we want. For example, here it should be sizeof(float). This seems to have crept in to this code base all over so not requiring it to be fixed, but it would be nice if we fixed this everywhere in these classes.

DateTime datetime = new DateTime(2022, 8, 26, 12, 34, 56, DateTimeKind.Utc);
Span<byte> datetimeBytes = stackalloc byte[8];
BinaryPrimitives.WriteInt64LittleEndian(datetimeBytes, datetime.ToBinary());
AssertReadContentFromBinary(datetime, XmlBinaryNodeType.DateTimeText, datetimeBytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To properly test this, you really should test using an inline byte array like the other tests. All you are testing here is that WriteInt64LittleEndian and ReadInt64LittleEndian round trip data.

@uweigand
Copy link
Contributor Author

uweigand commented Sep 9, 2022

I've pushed @StephenMolloy's PR feedback commit from https://github.com/uweigand/runtime/pull/1 to this PR branch to trigger the test suite. Those changes all look good to me. I've successfully re-run the tests with those changes locally on s390x as well.

@akoeplinger
Copy link
Member

@uweigand FYI I've disabled the tests in #75282 to get the runtime-community pipeline green again, please merge main/rebase this PR to get that change and remove the ActiveIssue to reenable the tests.

uweigand and others added 3 commits September 9, 2022 14:40
Properly byte-swap float/double/decimal types.
Handle array types correctly on big-endian systems.
Added more test cases based on PR 71478.

Fixes #74494
@uweigand
Copy link
Contributor Author

uweigand commented Sep 9, 2022

@uweigand FYI I've disabled the tests in #75282 to get the runtime-community pipeline green again, please merge main/rebase this PR to get that change and remove the ActiveIssue to reenable the tests.

Done, thanks!

@StephenMolloy StephenMolloy merged commit c6c7d3c into dotnet:main Sep 9, 2022
@StephenMolloy
Copy link
Member

/backport to release/7.0

@github-actions
Copy link
Contributor

github-actions bot commented Sep 9, 2022

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3024604883

@github-actions
Copy link
Contributor

github-actions bot commented Sep 9, 2022

@StephenMolloy backporting to release/7.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Fix XmlBinaryReader/XmlBinaryWriter on big-endian systems
Applying: PR feedback
Applying: Re-enable binary XML tests on s390x
error: sha1 information is lacking or useless (src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 Re-enable binary XML tests on s390x
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@uweigand
Copy link
Contributor Author

uweigand commented Sep 9, 2022

Hi @akoeplinger it looks like the 7.0 backport is failing because your #75282 isn't in 7.0. Were you planning on backporting this as well? I think it would be great to have the CI green in 7.0 as well.

StephenMolloy added a commit to StephenMolloy/runtime that referenced this pull request Sep 9, 2022
@akoeplinger
Copy link
Member

hmm I wasn't planning on backporting it, but I guess it can't hurt

@StephenMolloy
Copy link
Member

@akoeplinger - I'm manually backporting this PR, so I don't think your change needs to be backported.

@akoeplinger
Copy link
Member

the armv6 and ppc changes are still good to have :)

carlossanlop pushed a commit that referenced this pull request Sep 12, 2022
* Manually backporting #74599 to 7.0 for RC2.

* Fix a couple mis-copied lines of code and a couple nits.
StephenMolloy added a commit to StephenMolloy/runtime that referenced this pull request Sep 14, 2022
* Manually backporting dotnet#74599 to 7.0 for RC2.

* Fix a couple mis-copied lines of code and a couple nits.
@ghost ghost locked as resolved and limited conversation to collaborators Oct 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Serialization community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binary Xml format broken on big-endian machines
6 participants