Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BinaryFormatter PayloadReader API #102014

Closed
adamsitnik opened this issue May 8, 2024 · 20 comments · Fixed by #103232
Closed

BinaryFormatter PayloadReader API #102014

adamsitnik opened this issue May 8, 2024 · 20 comments · Fixed by #103232
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Formats.Nrbf binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it blocking Marks issues that we want to fast track in order to unblock other important work
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented May 8, 2024

Background and Motivation

BinaryFormatter is getting removed in .NET 9, but our customers need to be able to read the payloads that:

  • were serialized with BF (using previous .NET versions) and persisted (to disk/db etc)
  • are being generated by software they have no control over (example: 3rd party clients calling an existing web service with public API that allows for BF).

Our primary goal is to allow the users to read BF payloads in a secure manner from untrusted input. The principles:

  • Treating every input as potentially hostile.
  • No type loading of any kind (to avoid remote code execution).
  • No recursion of any kind (to avoid unbound recursion, stack overflow and denial of service).
  • No buffer pre-allocation based on size provided in payload (to avoid running out of memory and denial of service).
  • Using collision-resistant dictionary to store records referenced by other records.
  • Only primitive types can be instantiated in implicit way. Arrays can be instantiated on demand (with a default max size limit). Other types are never instantiated.

We also want to make the APIs easy to use, to avoid the customers using the OOB package with the copy of BinaryFormatter (and remaining vulnerable to various attacks). That is why currently the public API surface is very narrow. We could expose more information, but we don't want to confuse the users or need them to become familiar with BF specification to get simple tasks done. Example: null can be represented using three different serialization records (ObjectNull, ObjectNullMultiple and ObjectNullMultiple256). The public APIs just return null, rather than a record that represents it.

The new APIs need to be shipped in a new OOB package that supports older monikers, as we have first party customers running on Full Framework that are going to use it.

Proposed API

namespace System.Runtime.Serialization.BinaryFormat;

public static class NrbfReader
{
    /// <summary>
    /// Checks if given buffer starts with <see href="https://learn.microsoft.com/openspecs/windows_protocols/ms-nrbf/a7e578d3-400a-4249-9424-7529d10d1b3c">NRBF payload header</see>.
    /// </summary>
    /// <param name="bytes">The buffer to inspect.</param>
    /// <returns><see langword="true" /> if it starts with NRBF payload header; otherwise, <see langword="false" />.</returns>
    public static bool StartsWithPayloadHeader(byte[] bytes);

    /// <summary>
    /// Checks if given stream starts with <see href="https://learn.microsoft.com/openspecs/windows_protocols/ms-nrbf/a7e578d3-400a-4249-9424-7529d10d1b3c">NRBF payload header</see>.
    /// </summary>
    /// <param name="stream">The stream to inspect. The stream must be both readable and seekable.</param>
    /// <returns><see langword="true" /> if it starts with NRBF payload header; otherwise, <see langword="false" />.</returns>
    /// <exception cref="ArgumentNullException"><paramref name="stream" /> is <see langword="null" />.</exception>
    /// <exception cref="NotSupportedException">The stream does not support reading or seeking.</exception>
    /// <exception cref="ObjectDisposedException">The stream was closed.</exception>
    /// <remarks><para>When this method returns, <paramref name="stream" /> will be restored to its original position.</para></remarks>
    public static bool StartsWithPayloadHeader(Stream stream);

    /// <summary>
    /// Reads the provided NRBF payload.
    /// </summary>
    /// <param name="payload">The NRBF payload.</param>
    /// <param name="options">Options to control behavior during parsing.</param>
    /// <param name="leaveOpen">
    ///   <see langword="true" /> to leave <paramref name="payload"/> payload open
    ///   after the reading is finished; otherwise, <see langword="false" />.
    /// </param>
    /// <returns>A <see cref="SerializationRecord"/> that represents the root object.
    /// It can be either <see cref="PrimitiveTypeRecord{T}"/>,
    /// a <see cref="ClassRecord"/> or an <see cref="ArrayRecord"/>.</returns>
    /// <exception cref="ArgumentNullException"><paramref name="payload"/> is <see langword="null" />.</exception>
    /// <exception cref="ArgumentException"><paramref name="payload"/> does not support reading or is already closed.</exception>
    /// <exception cref="SerializationException">Reading from <paramref name="payload"/> encounters invalid NRBF data.</exception>
    /// <exception cref="DecoderFallbackException">Reading from <paramref name="payload"/>
    /// encounters an invalid UTF8 sequence.</exception>
    public static SerializationRecord Read(Stream payload, PayloadOptions? options = default, bool leaveOpen = false);

    /// <param name="recordMap">
    ///   When this method returns, contains a mapping of <see cref="SerializationRecord.ObjectId" /> to the associated serialization record.
    ///   This parameter is treated as uninitialized.
    /// </param>
    public static SerializationRecord Read(Stream payload, out IReadOnlyDictionary<int, SerializationRecord> recordMap, PayloadOptions? options = default, bool leaveOpen = false);

    /// <summary>
    /// Reads the provided Binary Format payload that is expected to contain an instance of any class (or struct) that is not an <seealso cref="Array"/> or a primitive type.
    /// </summary>
    /// <returns>A <seealso cref="ClassRecord"/> that represents the root object.</returns>
    public static ClassRecord ReadClassRecord(Stream payload, PayloadOptions? options = default, bool leaveOpen = false);
}

public sealed class PayloadOptions
{
    public PayloadOptions() { }

    public TypeNameParseOptions? TypeNameParseOptions { get; set; }

    /// <summary>
    /// Gets or sets a value that indicates whether type name truncation is undone.
    /// </summary>
    /// <value><see langword="true" /> if truncated type names should be reassembled; otherwise, <see langword="false" />.</value>
    /// <remarks>
    /// Example:
    /// TypeName: "Namespace.TypeName`1[[Namespace.GenericArgName"
    /// LibraryName: "AssemblyName]]"
    /// Is combined into "Namespace.TypeName`1[[Namespace.GenericArgName, AssemblyName]]"
    /// </remarks>
    public bool UndoTruncatedTypeNames { get; set; }
}

/// <summary>
/// Abstract class that represents the serialization record.
/// </summary>
/// <remarks>
/// Every instance returned to the end user can be either <seealso cref="PrimitiveTypeRecord{T}"/>,
/// a <seealso cref="ClassRecord"/> or an <seealso cref="ArrayRecord"/>.
/// </remarks>
public abstract class SerializationRecord
{
    internal SerializationRecord(); // others can't derive from this type

    /// <summary>
    /// Gets the type of the record.
    /// </summary>
    /// <value>The type of the record.</value>
    public abstract RecordType RecordType { get; }

    /// <summary>
    /// Gets the ID of the record.
    /// </summary>
    /// <value>The ID of the record.</value>
    public abstract int ObjectId { get; }

    /// <summary>
    /// Compares the type and assembly name read from the payload against the specified type.
    /// </summary>
    /// <remarks>
    /// <para>This method takes type forwarding into account.</para>
    /// <para>This method does NOT take into account member names or their types.</para>
    /// </remarks>
    /// <param name="type">The type to compare against.</param>
    /// <returns><see langword="true" /> if the serialized type and assembly name match provided type; otherwise, <see langword="false" />.</returns>
    public virtual bool IsTypeNameMatching(Type type);
}

/// <summary>
/// Record type.
/// </summary>
/// <remarks>
///  <para>
///   The enumeration does not contain all values supported by the <see href="https://learn.microsoft.com/openspecs/windows_protocols/ms-nrbf/954a0657-b901-4813-9398-4ec732fe8b32">
///   [MS-NRBF] 2.1.2.1</see>, but only those supported by the <see cref="PayloadReader"/>.
///  </para>
/// </remarks>
public enum RecordType : byte
{
    SerializedStreamHeader,
    ClassWithId,
    // SystemClassWithMembers and ClassWithMembers are not supported by design (require type loading) and not included
    SystemClassWithMembersAndTypes = 4,
    ClassWithMembersAndTypes,
    BinaryObjectString,
    BinaryArray,
    MemberPrimitiveTyped,
    MemberReference,
    ObjectNull,
    MessageEnd,
    BinaryLibrary,
    ObjectNullMultiple256,
    ObjectNullMultiple,
    ArraySinglePrimitive,
    ArraySingleObject,
    ArraySingleString
}

/// <summary>
/// Represents a record that itself represents the primitive value of <typeparamref name="T"/> type.
/// </summary>
/// <typeparam name="T">The type of the primitive value.</typeparam>
/// <remarks>
/// <para>
/// The NRBF specification considers the following types to be primitive:
/// <see cref="string"/>, <see cref="bool"/>, <see cref="byte"/>, <see cref="sbyte"/>
/// <see cref="char"/>, <see cref="short"/>, <see cref="ushort"/>,
/// <see cref="int"/>, <see cref="uint"/>, <see cref="long"/>, <see cref="ulong"/>,
/// <see cref="float"/>, <see cref="double"/>, <see cref="decimal"/>,
/// <see cref="DateTime"/> and <see cref="TimeSpan"/>.
/// </para>
/// <para>Other serialization records are represented with <see cref="ClassRecord"/> or <see cref="ArrayRecord"/>.</para>
/// </remarks>
public abstract class PrimitiveTypeRecord<T> : SerializationRecord
{
    private protected PrimitiveTypeRecord(T value);

    public T Value { get; }
}

/// <summary>
/// Defines the core behavior for NRBF class records and provides a base for derived classes.
/// </summary>
public abstract class ClassRecord : SerializationRecord
{
    private protected ClassRecord(ClassInfo classInfo);

    public TypeName TypeName { get; }

    public IEnumerable<string> MemberNames { get; }

    /// <summary>
    /// Checks if member of given name was present in the payload.
    /// </summary>
    /// <param name="memberName">The name of the member.</param>
    /// <returns><see langword="true" /> if it was present, otherwise <see langword="false" />.</returns>
    /// <remarks>
    ///  <para>
    ///   It's recommended to use this method when dealing with payload that may contain
    ///   different versions of the same type.
    ///  </para>
    /// </remarks>
    public bool HasMember(string memberName);

    public string? GetString(string memberName);
    public bool GetBoolean(string memberName);
    public byte GetByte(string memberName);
    public sbyte GetSByte(string memberName);
    public short GetInt16(string memberName);
    public ushort GetUInt16(string memberName);
    public char GetChar(string memberName);
    public int GetInt32(string memberName);
    public uint GetUInt32(string memberName);
    public float GetSingle(string memberName);
    public long GetInt64(string memberName); 
    public ulong GetUInt64(string memberName);
    public double GetDouble(string memberName);
    public decimal GetDecimal(string memberName);
    public TimeSpan GetTimeSpan(string memberName);
    public DateTime GetDateTime(string memberName);

    /// <summary>
    /// Retrieves an array for the provided <paramref name="memberName"/>.
    /// </summary>
    /// <param name="memberName">The name of the field.</param>
    /// <param name="allowNulls">Specifies whether null values are allowed.</param>
    /// <returns>The array itself or null.</returns>
    /// <exception cref="KeyNotFoundException">Member of such name does not exist.</exception>
    /// <exception cref="InvalidOperationException">Member of such name has value of a different type.</exception>
    public T?[]? GetArrayOfPrimitiveType<T>(string memberName, bool allowNulls = true);

    /// <summary>
    /// Retrieves the <see cref="SerializationRecord" /> of the provided <paramref name="memberName"/>.
    /// </summary>
    /// <param name="memberName">The name of the field.</param>
    /// <returns>The serialization record, which can be any of <see cref="PrimitiveTypeRecord{T}"/>,
    /// <see cref="ClassRecord"/>, <see cref="ArrayRecord"/> or <see langword="null" />.
    /// </returns>
    /// <exception cref="KeyNotFoundException"><paramref name="memberName" /> does not refer to a known member. You can use <see cref="HasMember(string)"/> to check if given member exists.</exception>
    /// <exception cref="InvalidOperationException">The specified member is not a <see cref="SerializationRecord"/>, but just a raw primitive value.</exception>
    public SerializationRecord? GetSerializationRecord(string memberName);

    /// <summary>
    /// Retrieves the value of the provided <paramref name="memberName"/>.
    /// </summary>
    /// <param name="memberName">The name of the member.</param>
    /// <returns>The value.</returns>
    /// <exception cref="KeyNotFoundException"><paramref name="memberName" /> does not refer to a known member. You can use <see cref="HasMember(string)"/> to check if given member exists.</exception>
    /// <exception cref="InvalidOperationException">Member of such name has value of a different type.</exception>
    public ClassRecord? GetClassRecord(string memberName);

    /// <returns>
    /// <para>For primitive types like <see cref="int"/>, <see langword="string"/> or <see cref="DateTime"/> returns their value.</para>
    /// <para>For nulls, returns a null.</para>
    /// <para>For other types that are not arrays, returns an instance of <see cref="ClassRecord"/>.</para>
    /// <para>For single-dimensional arrays returns <see cref="ArrayRecord{T}"/> where the generic type is the primitive type or <see cref="ClassRecord"/>.</para>
    /// <para>For jagged and multi-dimensional arrays, returns an instance of <see cref="ArrayRecord"/>.</para>
    /// </returns>
    public object? GetRawValue(string memberName);
}

/// <summary>
/// Defines the core behavior for NRBF array records and provides a base for derived classes.
/// </summary>
public abstract class ArrayRecord : SerializationRecord
{
    private protected ArrayRecord(ArrayInfo arrayInfo);

    /// <summary>
    /// When overridden in a derived class, gets a buffer of integers that represent the number of elements in every dimension.
    /// </summary>
    /// <value>A buffer of integers that represent the number of elements in every dimension.</value>
    public abstract ReadOnlySpan<int> Lengths { get; }

    /// <summary>
    /// Gets the rank of the array.
    /// </summary>
    /// <value>The rank of the array.</value>
    public int Rank { get; }

    /// <summary>
    /// Gets the type of the array.
    /// </summary>
    /// <value>The type of the array.</value>
    public BinaryArrayType ArrayType { get; }

    /// <summary>
    /// Gets the name of the array element type.
    /// </summary>
    /// <value>The name of the array element type.</value>
    public abstract TypeName ElementTypeName { get; }

    /// <summary>
    /// Allocates an array and fills it with the data provided in the serialized records (in case of primitive types like <see cref="string"/> or <see cref="int"/>) or the serialized records themselves.
    /// </summary>
    /// <param name="expectedArrayType">Expected array type.</param>
    /// <param name="allowNulls">
    ///   <see langword="true" /> to permit <see langword="null" /> values within the array;
    ///   otherwise, <see langword="false" />.
    /// </param>
    /// <returns>An array filled with the data provided in the serialized records.</returns>
    /// <exception cref="InvalidOperationException"><paramref name="expectedArrayType" /> does not match the data from the payload.</exception>
    public Array GetArray(Type expectedArrayType, bool allowNulls = true);
}

/// <summary>
/// Binary array type.
/// </summary>
/// <remarks>
/// BinaryArrayType enumeration is described in <see href="https://learn.microsoft.com/openspecs/windows_protocols/ms-nrbf/4dbbf3a8-6bc4-4dfc-aa7e-36a35be6ff58">[MS-NRBF] 2.4.1.1</see>.
/// </remarks>
public enum BinaryArrayType : byte
{
    /// <summary>
    ///  A single-dimensional array.
    /// </summary>
    Single = 0,

    /// <summary>
    ///  An array whose elements are arrays. The elements of a jagged array can be of different dimensions and sizes.
    /// </summary>
    Jagged = 1,

    /// <summary>
    ///  A multi-dimensional rectangular array.
    /// </summary>
    Rectangular = 2,

    /// <summary>
    ///  A single-dimensional array where the lower bound index is greater than 0.
    /// </summary>
    SingleOffset = 3,

    /// <summary>
    ///  A jagged array where the lower bound index is greater than 0.
    /// </summary>
    JaggedOffset = 4,

    /// <summary>
    ///  Multi-dimensional arrays where the lower bound index of at least one of the dimensions is greater than 0.
    /// </summary>
    RectangularOffset = 5
}

/// <summary>
/// Defines the core behavior for NRBF single dimensional, zero-indexed array records and provides a base for derived classes.
/// </summary>
public abstract class ArrayRecord<T> : ArrayRecord
{
    private protected ArrayRecord(ArrayInfo arrayInfo);

    /// <summary>
    /// Gets the length of the array.
    /// </summary>
    /// <value>The length of the array.</value>
    public int Length { get; }

    /// <summary>
    /// When overridden in a derived class, allocates an array of <typeparamref name="T"/> and fills it with the data provided in the serialized records (in case of primitive types like <see cref="string"/> or <see cref="int"/>) or the serialized records themselves.
    /// </summary>
    /// <param name="allowNulls">
    ///   <see langword="true" /> to permit <see langword="null" /> values within the array;
    ///   otherwise, <see langword="false" />.
    /// </param>
    /// <returns>An array filled with the data provided in the serialized records.</returns>
    public abstract T?[] GetArray(bool allowNulls = true);
}

Usage Examples

The implementation with no dependency to dotnet/runtime can be found here.

Reading a class serialized with BF to a file

ClassRecord rootRecord = NrbfReader.ReadClassRecord(File.OpenRead("peristedPayload.bf"));
Sample output = new()
{
    // using the dedicated methods to read primitive values
    Integer = rootRecord.GetInt32(nameof(Sample.Integer)),
    Text = rootRecord.GetString(nameof(Sample.Text)),
    // using dedicated method to read an array of bytes
    ArrayOfBytes = rootRecord.GetArrayOfPrimitiveType<byte>(nameof(Sample.ArrayOfBytes)),
    // using GetClassRecord to read a class record
    ClassInstance = new()
    {
        Text = rootRecord
            .GetClassRecord(nameof(Sample.ClassInstance))!
            .GetString(nameof(Sample.Text))
    }  
};

[Serializable]
public class Sample
{
    public int Integer;
    public string? Text;
    public byte[]? ArrayOfBytes;
    public Sample? ClassInstance;
}

Checking if Stream contains BF payload

The users need to be able to check if given Stream contains BF data, as they might want to migrate the data on demand to new serialization format:

static T Pseudocode<T>(Stream payload, NewSerializer newSerializer)
{
    if (NrbfReader.StartsWithPayloadHeader(payload))
    {
        T fromPayload = UseThePayloadReaderToReadTheData<T>(payload);

        payload.Seek(0, SeekOrigin.Begin);
        newSerializer.Serialize(payload, fromPayload);
        payload.Flush();
    }
    else
    {
        return newSerializer.Deserialize<T>(payload)
    }
}

SzArrays

Single dimension, zero-indexed arrays are expected to be the most frequently used arrays.

SerializationRecord rootObject = NrbfReader.Read(File.OpenRead("peristedPayload.bf"));
if (rootObject is ArrayRecord<string> arrayOfStrings)
{
    string?[] strings = arrayOfStrings.GetArray();
}

Other arrays

BF supports:

  • jagged arrays
  • multi-dimensional array
  • non-zero indexed arrays

They are all represented by internal types that derive from ArrayRecord. The users can use the API to instantiate such arrays, but they need to provide the expected array type. By doing that we make this advanced scenario possible and safe (the library is not loading any types, if there is a type mismatch it throws).

public abstract class ArrayRecord : SerializationRecrd
{
    public Array GetArray(Type expectedArrayType, bool allowNulls = true);
}
ArrayRecord arrayRecord = (ArrayRecord)NrbfReader.Read(File.OpenRead("peristedPayload.bf"));
if (arrayRecord.ArrayType == ArrayType.Jagged)
{
    int[][][] array = (int[][][])arrayRecord.GetArray(expectedArrayType: typeof(int[][][]));
}

For more usages of this API please refer to JaggedArraysTests.cs, RectangularArraysTests.cs and CustomOffsetArrays.cs.

Arrays of non-primitive types

Arrays of non-primitive types are represented as ArrayRecord<ClassRecord> or just ArrayRecord.

ArrayRecord<ClassRecord> rootRecord = (ArrayRecord<ClassRecord>)NrbfReader.Read(File.OpenRead("peristedPayload.bf"));
ClassRecord[] classRecords = rootRecord.GetArray(allowNulls: false)!;
Sample[] output = classRecords
    .Select(classRecord => new Sample()
    {
        Integer = classRecord.GetInt32(nameof(Sample.Integer)),
        Text = classRecord.GetString(nameof(Sample.Text))
    })
    .ToArray();

Risks

If the new APIs are not easy to use, some of the users might choose the new OOB package with a copy of BF and remain vulnerable to all attacks. This defeats the purpose of our initiative and must be avoided.

@adamsitnik adamsitnik added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Security labels May 8, 2024
@adamsitnik adamsitnik added this to the 9.0.0 milestone May 8, 2024
@adamsitnik adamsitnik self-assigned this May 8, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

@huoyaoyuan
Copy link
Member

  • Where's bytesConsumed information? Could IPipeReader being more convenient since there's sort of peeking?
  • Arrays containing nulls are effectively sparse, so there's a risk of relay attack. String requires the actual payload to be as long as the string content. Am I correct?

@jkotas
Copy link
Member

jkotas commented May 8, 2024

cc @JeremyKuhne

@JeremyKuhne
Copy link
Member

No type loading of any kind (to avoid remote code execution).

ClassWithMembers and SystemClassWithMembers require type loading to actually parse them. And you can't really parse them without a type name binder- to match what the BinaryFormatter would have done in existing code.

non-zero indexed arrays

We should probably not support them. While you can do this, it is hard to imagine seeing this in the wild outside of malicious input. Complexity in the reader for this isn't worth it, imho.

No recursion of any kind (to avoid unbound recursion, stack overflow and denial of service).

I don't see how that is possible. Cycles are normal in BF streams. Nested classes are one simple example of this.

@vcsjones
Copy link
Member

vcsjones commented May 8, 2024

public static bool ContainsBinaryFormatterPayload(ReadOnlySpan<byte> bytes);

There is an API for determining if a payload is BF for ReadOnlySpan<byte> but Read and ReadClassRecord only accept a stream. Should those latter two APIs have overloads that accept a ReadOnlySpan? If not, what is the purpose of "is this ROS a BF payload" if nothing can be done with the ROS?

@adamsitnik
Copy link
Member Author

adamsitnik commented May 15, 2024

I've ported WinForms to the new API (dotnet/winforms#11341), here are the public API changes it required:

public static class PayloadReader
{
+    public static SerializationRecord Read(Stream payload, out IReadOnlyDictionary<int, SerializationRecord> recordMap, PayloadOptions? options = default, bool leaveOpen = false);
}

public abstract class SerializationRecord
{
+    public virtual int ObjectId { get; }
}

public enum RecordType : byte
{
-    SystemClassWithMembers,
-    ClassWithMembers,
}

public abstract class ArrayRecord : SerializationRecord
{
+    public abstract TypeName ElementTypeName { get; }
}

@jkotas
Copy link
Member

jkotas commented May 15, 2024

ElementTypeLibraryName

Why is AssemblyNameInfo separate property? I would expect that you can get it from ElementTypeName.

@adamsitnik
Copy link
Member Author

Where's bytesConsumed information?

From the usages I've seen nobody needed this information and I am not sure if any of our serialization APIs expose this information. For seekable streams the user can do the math themselves:

long before = stream.Position;
var record = PayloadReader.Read(stream, leaveOpen: true);
long consumed = stream.Position - before;

Handling the non-seekable stream correctly on our side would require a lot of complexity and I am against (the simpler it is the less likely we are introducing new vulnerabilities)

Could IPipeReader being more convenient since there's sort of peeking?

This abstraction belongs to ASP.NET and this API needs to be very low level and introduce low number of dependencies all supporting old monikers (so existing Full Framework app can use it too to move away from BF)

Arrays containing nulls are effectively sparse, so there's a risk of relay attack.

And that is why the ToArray method has default max size and ability to throw on nulls.

String requires the actual payload to be as long as the string content. Am I correct?

The string is prefixed with the length, encoded as an integer seven bits at a time. So you are right.

@adamsitnik
Copy link
Member Author

adamsitnik commented May 15, 2024

ClassWithMembers and SystemClassWithMembers require type loading to actually parse them.

Great point, I've learned it the hard way. I've removed both these values from the public enum.

We should probably not support them.

I've already implemented the support along with the tests. IMO the best we can do right now is:

  1. get the API accepted (I am not exposing the offsets anywhere as of now)
  2. remove the support (and the 3 enum values from public ArrayType)
  3. release it in a Preview
  4. if we receive feedback based on real life scenario, bring the implementation back and consider exposing offsets info

I don't see how that is possible. Cycles are normal in BF streams.

But reading them and exposing does not require the reader itself to use recursion (mostly because MemberReferenceRecord has no ObjectId so it's impossible to create a cycle using references)

@adamsitnik
Copy link
Member Author

There is an API for determining if a payload is BF for ReadOnlySpan<byte> but Read and ReadClassRecord only accept a stream. Should those latter two APIs have overloads that accept a ReadOnlySpan? If not, what is the purpose of "is this ROS a BF payload" if nothing can be done with the ROS?

In most of the usages I've seen the user has a byte array, creates MemoryStream out of it and then performs the work. I've added the ReadOnlySpan<byte> overload so they don't need to allocate MemoryStream just to find out it's not BF.

Should those latter two APIs have overloads that accept a ReadOnlySpan?

It is technically possible, but would require way more work on our side (two different implementations and testing all of that). I would prefer to wait for user feedback after releasing the initial version in Preview before implementing that.

@adamsitnik
Copy link
Member Author

Why is AssemblyNameInfo separate property? I would expect that you can get it from ElementTypeName.

Great catch! It's caused by internal BF implementation details. You are right that it should be one property. I'll try to implement that and get back with findings.

@jkotas
Copy link
Member

jkotas commented May 15, 2024

IsTypeNameMatching

What is the policy that this API applies for comparing assembly names? Does it compare simple name only (it is what I would expect)? It may be worth mentioning this in the doc comment.

It takes type forwarding into account.

Nit: I think this should say: "It takes System.Runtime.CompilerServices.TypeForwardedFromAttribute into account." I would not expect this API to take vanilla type forwarding into account.

@adamsitnik
Copy link
Member Author

I'll try to implement that and get back with findings.

Done (adamsitnik/SafePayloadReader@261e842), I am going to update the proposal.

@adamsitnik adamsitnik added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels May 15, 2024
@bartonjs
Copy link
Member

I'm concerned that the names here are all too broad.

  • PayloadReader, lots of things have payloads.
  • RecordType I don't have data to back me up, but I'd guess this is probably one of the most popular enum names in existence.
    • Ranking up there with the alternative of RecordKind for "don't use Type when someone might think it means System.Type"
  • ArrayType sounds like typeof(System.Array). ArrayKind might be OK, but still feels really broadly named for such a remote namespace.

Compare with System.Text.Json:

  • PayloadReader => Utf8JsonReader
  • RecordType => JsonValueKind

I don't know that NrbfPayloadReader or NrbfReader is a good name (Though I have no issues with NrbfRecordKind... huh.). BinaryFormatReader is a very broad sounding name, but just because we gave a very broad name to a very specific component.

@jkotas
Copy link
Member

jkotas commented May 27, 2024

I'm concerned that the names here are all too broad.

The problem starts with the assembly and namespace names. "BinaryFormat" is very generic term. For example, check the Wikipedia page for binary format: https://en.wikipedia.org/wiki/Binary_format .

The current PR with internal implementation uses NRBF in some places and BinaryFormat in other places. Should we unify on NRBF as the one name and use it consistently everywhere?

MS-NRBF is legacy Microsoft standard and this component is decoupling it from the .NET runtime. Does it really belong to the System or System.Runtime namespaces? Would Microsoft namespace make more sense?

The name with these two observations taken into account can be something like Microsoft.Serialization.Nrbf.

I don't know that NrbfPayloadReader or NrbfReader is a good name

NrbfReader looks good to me. It matches existing low-level reader types like XmlReader, Utf8JsonReader or MetadataReader.

@adamsitnik
Copy link
Member Author

Microsoft.Serialization.Nrbf
NrbfReader

These names sound good to me!

@jkotas jkotas added the binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it label May 29, 2024
@Romfos
Copy link

Romfos commented May 30, 2024

Maybe make sense to release it as nuget package and don't include it in BCL?

I think this is rare use case and make sense have this legacy thing outside of BCL. (+ you can support older frameworks to make easier migration path)

@jkotas
Copy link
Member

jkotas commented May 30, 2024

Maybe make sense to release it as nuget package and don't include it in BCL?

Yes, that's the plan.

@terrajobst
Copy link
Member

terrajobst commented Jun 6, 2024

  • System.Formats.Nrbf seems like a better
  • NrbfReader
    • Rename NrbfReader to NrbfDecoder and ReadXxx to DecodeXxx
    • Consider using SerializationRecordId instead of int to avoid people naively creating their own dictionaries which are then subject to collision attacks
    • Exposing it as IReadOnlyDictionary<SerializationRecordId, SerializationRecord> is fine.
  • PayloadOptions
    • Can we remove UndoTruncatedTypeNames?
  • SerializationRecord
    • ObjectId should be of type SerializationRecordId and be renamed to Id
    • Move TypeName from ClassRecord (for arrays this requires constructing a TypeName but that seems fine)
    • Rename IsTypeNameMatching to TypeNameMatches and wouldn't need to be virtual
  • RecordType
    • Rename to SerializationRecordType
    • Drop the byte constraint
    • It seems we think we should either expose all values in the spec or only expose the ones that the API never exposes. Right now it's a mixture.
  • PrimitiveTypeRecord<T>
    • Consider having a non-generic base type PrimitiveTypeRecord that exposes an object Value property that boxes on demand
    • This allows callers to check whether something is a primitive and even get the value as an object
  • ClassRecord
    • GetArrayOfPrimitiveType replace with GetArrayRecord() to avoid unbounded array creation
  • ArrayRecord
    • Expose a method that returns the total number of elements (handling different array shapes). We should cap it at int and throw an exception if it exceeds int.MaxValue so users don't have to handle math and overflow correctly.
  • ArrayRecord<T>
    • Rename to SZArrayRecord<T>
    • GetArray() should return the same array when called twice
  • BinaryArrayType
    • Do we need to expose it all? It sounds like we could away with a Boolean on ArrayRecord
namespace System.Formats.Nrbf;

public static class NrbfDecoder
{
    public static bool StartsWithPayloadHeader(byte[] bytes);
    public static bool StartsWithPayloadHeader(Stream stream);
    public static SerializationRecord Decode(Stream payload, PayloadOptions? options = default, bool leaveOpen = false);
    public static SerializationRecord Decode(Stream payload, out IReadOnlyDictionary<SerializationRecordId, SerializationRecord> recordMap, PayloadOptions? options = default, bool leaveOpen = false);
    public static ClassRecord DecodeClassRecord(Stream payload, PayloadOptions? options = default, bool leaveOpen = false);
}

public readonly struct SerializationRecordId : IEquatable<SerializationRecordId>
{
}

public sealed class PayloadOptions
{
    public PayloadOptions();
    public TypeNameParseOptions? TypeNameParseOptions { get; set; }
    public bool UndoTruncatedTypeNames { get; set; }
}

public abstract class SerializationRecord
{
    internal SerializationRecord();
    public abstract TypeName TypeName { get; }
    public abstract RecordType RecordType { get; }
    public abstract SerializationRecordId Id { get; }
    public bool TypeNameMatches(Type type);
}

public enum SerializationRecordType
{
    SerializedStreamHeader,
    ClassWithId,
    SystemClassWithMembersAndTypes = 4,
    ClassWithMembersAndTypes,
    BinaryObjectString,
    BinaryArray,
    MemberPrimitiveTyped,
    MemberReference,
    ObjectNull,
    MessageEnd,
    BinaryLibrary,
    ObjectNullMultiple256,
    ObjectNullMultiple,
    ArraySinglePrimitive,
    ArraySingleObject,
    ArraySingleString
}

public abstract class PrimitiveTypeRecord<T> : SerializationRecord
{
    private protected PrimitiveTypeRecord(T value);
    public T Value { get; }
}

public abstract class ClassRecord : SerializationRecord
{
    private protected ClassRecord(ClassInfo classInfo);
    public TypeName TypeName { get; }
    public IEnumerable<string> MemberNames { get; }
    public bool HasMember(string memberName);
    public string? GetString(string memberName);
    public bool GetBoolean(string memberName);
    public byte GetByte(string memberName);
    public sbyte GetSByte(string memberName);
    public short GetInt16(string memberName);
    public ushort GetUInt16(string memberName);
    public char GetChar(string memberName);
    public int GetInt32(string memberName);
    public uint GetUInt32(string memberName);
    public float GetSingle(string memberName);
    public long GetInt64(string memberName); 
    public ulong GetUInt64(string memberName);
    public double GetDouble(string memberName);
    public decimal GetDecimal(string memberName);
    public TimeSpan GetTimeSpan(string memberName);
    public DateTime GetDateTime(string memberName);
    public ArrayRecord? GetArrayRecord(string memberName);
    public SerializationRecord? GetSerializationRecord(string memberName);
    public ClassRecord? GetClassRecord(string memberName);
    public object? GetRawValue(string memberName);
}

public abstract class ArrayRecord : SerializationRecord
{
    private protected ArrayRecord(ArrayInfo arrayInfo);
    public abstract ReadOnlySpan<int> Lengths { get; }
    public int Rank { get; }
    public BinaryArrayType ArrayType { get; }
    public abstract TypeName ElementTypeName { get; }
    public Array GetArray(Type expectedArrayType, bool allowNulls = true);
}

public enum BinaryArrayType : byte
{
    Single = 0,
    Jagged = 1,
    Rectangular = 2,
    SingleOffset = 3,
    JaggedOffset = 4,
    RectangularOffset = 5
}

public abstract class SZArrayRecord<T> : ArrayRecord
{
    private protected SZArrayRecord(ArrayInfo arrayInfo);
    public int Length { get; }
    public abstract T?[] GetArray(bool allowNulls = true);
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 6, 2024
@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Jun 6, 2024

Remember also what I mentioned on the stream: the current implementation of TypeName.Parse, since it requires both the type name and the assembly name to be pre-concatenated before processing, means that it's very difficult (probably impossible) for NrbfReader to rely on it safely.

A little more info:

Generally speaking, we say that if the attacker takes O(n) effort to induce the recipient into performing greater than O(n ln n) work, this has the hallmark of a DoS vulnerability.

Serialization formats that contain backreferences are particularly susceptible to this, since backreferences essentially allow an attacker to perform arbitrarily efficient compression. NRBF's class records have implicit backreferences in that they reference library records which occur earlier in the payload. So the attack is --

  1. I send a large library record at the beginning of the payload. Let's say it's a 1MB record, so I perform O(1 million) work.
  2. I send a bunch of class records with names "Foo0000", "Foo0001", "Foo0002", etc. All of these contain references to the library I sent in step (1). Let's say I send 1 million such records, which is another O(1 million) work for me.
  3. I've now performed O(1 million) + O(1 million) work, which is still O(1 million).

On the server --

  1. It reads the original library record. That's O(1 million) work. All is well so far.
  2. It reads each subsequent class record. There are 1 million of them, so it's about to enter a 1 million iteration loop. That's still good so far, except...
  3. Each iteration of the loop involves a call to TypeName.Parse, but that API requires scanning a buffer which contains the constructed fully qualified name of the type. The amount of work in each iteration is thus O(typename_len + assemblyname_len), which is O(small + 1 million), which is O(1 million).
  4. The total work performed by the server is thus O(1 million) * O(1 million) = O(1 trillion).

So the attacker performs O(m + n) work, but they coerce the server into performing O(m * n) work. This is a legitimate DoS vulnerability.

(FWIW this is why System.Text.Json does not allow parsing backreferences by default. It could lead to similar vulnerabilities in the consuming app logic.)

@github-actions github-actions bot locked and limited conversation to collaborators Jul 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Formats.Nrbf binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it blocking Marks issues that we want to fast track in order to unblock other important work
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants