BinaryFormatter PayloadReader API #102014
Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
cc @JeremyKuhne
We should probably not support them. While you can do this, it is hard to imagine seeing this in the wild outside of malicious input. Complexity in the reader for this isn't worth it, imho.
I don't see how that is possible. Cycles are normal in BF streams. Nested classes are one simple example of this.
There is an API for determining if a payload is BF for
I've ported WinForms to the new API (dotnet/winforms#11341), here are the public API changes it required:
public static class PayloadReader
{
+ public static SerializationRecord Read(Stream payload, out IReadOnlyDictionary<int, SerializationRecord> recordMap, PayloadOptions? options = default, bool leaveOpen = false);
}
public abstract class SerializationRecord
{
+ public virtual int ObjectId { get; }
}
public enum RecordType : byte
{
- SystemClassWithMembers,
- ClassWithMembers,
}
public abstract class ArrayRecord : SerializationRecord
{
+ public abstract TypeName ElementTypeName { get; }
}
Why is AssemblyNameInfo a separate property? I would expect that you can get it from
From the usages I've seen, nobody needed this information, and I am not sure whether any of our serialization APIs expose it. For seekable streams the user can do the math themselves:
long before = stream.Position;
var record = PayloadReader.Read(stream, leaveOpen: true);
long consumed = stream.Position - before;
Handling non-seekable streams correctly on our side would require a lot of complexity, and I am against it (the simpler the code is, the less likely we are to introduce new vulnerabilities).
This abstraction belongs to ASP.NET, and this API needs to be very low level and introduce few dependencies, all of which support old monikers (so existing Full Framework apps can use it too to move away from BF).
And that is why the
The string is prefixed with the length, encoded as an integer seven bits at a time. So you are right.
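For readers unfamiliar with this encoding, here is a minimal sketch of decoding such a length prefix by hand. The class and method names (LengthPrefix, Read7BitEncodedLength) are made up for illustration and are not part of the proposed API; rejecting negative lengths is also my assumption, not something stated in this thread.

```csharp
using System;
using System.IO;

public static class LengthPrefix
{
    // Reads a length stored seven bits at a time, least significant group first.
    // The high bit of each byte signals that another byte follows.
    public static int Read7BitEncodedLength(Stream stream)
    {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException();

            // Assumption: on the fifth byte only the low three bits may be set,
            // otherwise the value would not fit in a non-negative Int32.
            if (shift == 28 && b > 0x07)
                throw new FormatException("Length prefix does not fit in a non-negative Int32.");

            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }

        // Unreachable: the fifth byte either terminates the value or throws above.
        throw new FormatException("Length prefix is too long.");
    }
}
```

With this sketch, the two bytes 0xAC 0x02 decode to 300: the low seven bits of 0xAC are 44, and 2 shifted left by 7 is 256.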
Great point, I've learned it the hard way. I've removed both these values from the public enum.
I've already implemented the support along with the tests. IMO the best we can do right now is:
But reading them and exposing does not require the reader itself to use recursion (mostly because
In most of the usages I've seen the user has a byte array, creates
It is technically possible, but would require way more work on our side (two different implementations and testing all of that). I would prefer to wait for user feedback after releasing the initial version in Preview before implementing that.
Great catch! It's caused by internal BF implementation details. You are right that it should be one property. I'll try to implement that and get back with findings.
What is the policy that this API applies for comparing assembly names? Does it compare simple name only (it is what I would expect)? It may be worth mentioning this in the doc comment.
Nit: I think this should say: "It takes
Done (adamsitnik/SafePayloadReader@261e842), I am going to update the proposal.
I'm concerned that the names here are all too broad.
Compare with System.Text.Json:
I don't know that
The problem starts with the assembly and namespace names. "BinaryFormat" is a very generic term. For example, check the Wikipedia page for binary format: https://en.wikipedia.org/wiki/Binary_format . The current PR with the internal implementation uses NRBF in some places and BinaryFormat in other places. Should we unify on NRBF as the one name and use it consistently everywhere?
The name with these two observations taken into account can be something like
These names sound good to me!
Maybe it makes sense to release it as a NuGet package and not include it in the BCL? I think this is a rare use case, and it makes sense to have this legacy thing outside of the BCL. (Plus you can support older frameworks to make the migration path easier.)
Yes, that's the plan.
namespace System.Formats.Nrbf;
public static class NrbfDecoder
{
public static bool StartsWithPayloadHeader(byte[] bytes);
public static bool StartsWithPayloadHeader(Stream stream);
public static SerializationRecord Decode(Stream payload, PayloadOptions? options = default, bool leaveOpen = false);
public static SerializationRecord Decode(Stream payload, out IReadOnlyDictionary<SerializationRecordId, SerializationRecord> recordMap, PayloadOptions? options = default, bool leaveOpen = false);
public static ClassRecord DecodeClassRecord(Stream payload, PayloadOptions? options = default, bool leaveOpen = false);
}
public readonly struct SerializationRecordId : IEquatable<SerializationRecordId>
{
}
public sealed class PayloadOptions
{
public PayloadOptions();
public TypeNameParseOptions? TypeNameParseOptions { get; set; }
public bool UndoTruncatedTypeNames { get; set; }
}
public abstract class SerializationRecord
{
internal SerializationRecord();
public abstract TypeName TypeName { get; }
public abstract SerializationRecordType RecordType { get; }
public abstract SerializationRecordId Id { get; }
public bool TypeNameMatches(Type type);
}
public enum SerializationRecordType
{
SerializedStreamHeader,
ClassWithId,
SystemClassWithMembersAndTypes = 4,
ClassWithMembersAndTypes,
BinaryObjectString,
BinaryArray,
MemberPrimitiveTyped,
MemberReference,
ObjectNull,
MessageEnd,
BinaryLibrary,
ObjectNullMultiple256,
ObjectNullMultiple,
ArraySinglePrimitive,
ArraySingleObject,
ArraySingleString
}
public abstract class PrimitiveTypeRecord<T> : SerializationRecord
{
private protected PrimitiveTypeRecord(T value);
public T Value { get; }
}
public abstract class ClassRecord : SerializationRecord
{
private protected ClassRecord(ClassInfo classInfo);
public TypeName TypeName { get; }
public IEnumerable<string> MemberNames { get; }
public bool HasMember(string memberName);
public string? GetString(string memberName);
public bool GetBoolean(string memberName);
public byte GetByte(string memberName);
public sbyte GetSByte(string memberName);
public short GetInt16(string memberName);
public ushort GetUInt16(string memberName);
public char GetChar(string memberName);
public int GetInt32(string memberName);
public uint GetUInt32(string memberName);
public float GetSingle(string memberName);
public long GetInt64(string memberName);
public ulong GetUInt64(string memberName);
public double GetDouble(string memberName);
public decimal GetDecimal(string memberName);
public TimeSpan GetTimeSpan(string memberName);
public DateTime GetDateTime(string memberName);
public ArrayRecord? GetArrayRecord(string memberName);
public SerializationRecord? GetSerializationRecord(string memberName);
public ClassRecord? GetClassRecord(string memberName);
public object? GetRawValue(string memberName);
}
public abstract class ArrayRecord : SerializationRecord
{
private protected ArrayRecord(ArrayInfo arrayInfo);
public abstract ReadOnlySpan<int> Lengths { get; }
public int Rank { get; }
public BinaryArrayType ArrayType { get; }
public abstract TypeName ElementTypeName { get; }
public Array GetArray(Type expectedArrayType, bool allowNulls = true);
}
public enum BinaryArrayType : byte
{
Single = 0,
Jagged = 1,
Rectangular = 2,
SingleOffset = 3,
JaggedOffset = 4,
RectangularOffset = 5
}
public abstract class SZArrayRecord<T> : ArrayRecord
{
private protected SZArrayRecord(ArrayInfo arrayInfo);
public int Length { get; }
public abstract T?[] GetArray(bool allowNulls = true);
}
Remember also what I mentioned on the stream: the current implementation of
A little more info: Generally speaking, we say that if the attacker takes O(n) effort to induce the recipient into performing greater than O(n ln n) work, this has the hallmark of a DoS vulnerability. Serialization formats that contain backreferences are particularly susceptible to this, since backreferences essentially allow an attacker to perform arbitrarily efficient compression. NRBF's class records have implicit backreferences in that they reference library records which occur earlier in the payload. So the attack is --
On the server --
So the attacker performs O(m + n) work, but they coerce the server into performing O(m * n) work. This is a legitimate DoS vulnerability. (FWIW this is why System.Text.Json does not allow parsing backreferences by default. It could lead to similar vulnerabilities in the consuming app logic.)
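The concrete attack steps are not preserved in this thread, so the following is only a generic illustration of how an O(m + n) payload can cost O(m * n) to process, not a description of the actual reader. It assumes, hypothetically, that each of n references is resolved by scanning a list of m earlier records, and contrasts that with a dictionary lookup. All names here (BackreferenceCost and its methods) are invented for the sketch.

```csharp
using System;
using System.Collections.Generic;

public static class BackreferenceCost
{
    // Hypothetical resolver: every reference scans the list of earlier records.
    // Worst case for the receiver: each reference points at the last record.
    public static long LinearScanComparisons(int m, int n)
    {
        var ids = new List<int>(m);
        for (int i = 0; i < m; i++)
            ids.Add(i);

        long comparisons = 0;
        int wanted = m - 1;
        for (int reference = 0; reference < n; reference++)
        {
            foreach (int id in ids)
            {
                comparisons++;
                if (id == wanted)
                    break;
            }
        }
        return comparisons; // grows as m * n
    }

    // Same workload resolved through a dictionary: one lookup per reference.
    public static long DictionaryLookups(int m, int n)
    {
        var byId = new Dictionary<int, int>(m);
        for (int i = 0; i < m; i++)
            byId[i] = i;

        long lookups = 0;
        for (int reference = 0; reference < n; reference++)
        {
            if (byId.TryGetValue(m - 1, out _))
                lookups++;
        }
        return lookups; // grows as n
    }
}
```

For m = n = 1000 the scanning variant performs 1,000,000 comparisons while the dictionary variant performs 1,000 lookups, which is the gap between O(m * n) and O(m + n) described above.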
Background and Motivation
BinaryFormatter is getting removed in .NET 9, but our customers need to be able to read the payloads that:
Our primary goal is to allow the users to read BF payloads in a secure manner from untrusted input. The principles:
We also want to make the APIs easy to use, to avoid the customers using the OOB package with the copy of BinaryFormatter (and remaining vulnerable to various attacks). That is why currently the public API surface is very narrow. We could expose more information, but we don't want to confuse the users or require them to become familiar with the BF specification to get simple tasks done. Example: null can be represented using three different serialization records (ObjectNull, ObjectNullMultiple and ObjectNullMultiple256). The public APIs just return null, rather than a record that represents it.
The new APIs need to be shipped in a new OOB package that supports older monikers, as we have first party customers running on Full Framework that are going to use it.
Proposed API
Usage Examples
The implementation with no dependency to dotnet/runtime can be found here.
Reading a class serialized with BF to a file
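The original snippet for this scenario is not preserved here, so below is a minimal sketch written against the API shape listed later in this thread (NrbfDecoder.DecodeClassRecord, TypeNameMatches, GetInt32, GetString). The Sample type, its field names and the ReadSample helper are hypothetical.

```csharp
using System;
using System.IO;
using System.Formats.Nrbf;

// Hypothetical type that was originally serialized with BinaryFormatter.
[Serializable]
public class Sample
{
    public int Integer;
    public string? Text;
}

public static class ReadSample
{
    public static Sample Load(string path)
    {
        using FileStream payload = File.OpenRead(path);

        // Decode the root record without loading or instantiating any type.
        ClassRecord root = NrbfDecoder.DecodeClassRecord(payload);

        // Verify that the payload claims to contain the type we expect.
        if (!root.TypeNameMatches(typeof(Sample)))
            throw new InvalidDataException("The payload does not contain a Sample.");

        // Copy the members one by one; the caller stays in control of object creation.
        return new Sample
        {
            Integer = root.GetInt32(nameof(Sample.Integer)),
            Text = root.GetString(nameof(Sample.Text)),
        };
    }
}
```

The member names passed to the getters are the serialized field names, which is why the sketch uses public fields rather than auto-properties.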
Checking if Stream contains BF payload
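A minimal sketch of such a check, assuming the StartsWithPayloadHeader(Stream) overload listed in this thread and a seekable stream. The PayloadMigration helper is hypothetical, and rewinding the stream explicitly is a defensive assumption, since the thread does not say whether the check restores the position.

```csharp
using System;
using System.IO;
using System.Formats.Nrbf;

public static class PayloadMigration
{
    // Returns true when the stream begins with the BF payload header.
    public static bool IsBinaryFormatterPayload(Stream stream)
    {
        if (!stream.CanSeek)
            throw new ArgumentException("A seekable stream is required.", nameof(stream));

        long start = stream.Position;
        bool result = NrbfDecoder.StartsWithPayloadHeader(stream);

        // Rewind so the caller can decode the payload with whichever reader applies.
        stream.Position = start;
        return result;
    }
}
```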
The users need to be able to check whether a given Stream contains BF data, as they might want to migrate the data on demand to a new serialization format.
SzArrays
Single dimension, zero-indexed arrays are expected to be the most frequently used arrays.
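As a rough sketch under the API shape listed in this thread (SZArrayRecord<T> with a GetArray method), reading a serialized int[] could look like the following. The ReadSzArray helper is hypothetical.

```csharp
using System;
using System.IO;
using System.Formats.Nrbf;

public static class ReadSzArray
{
    public static int[] ReadInts(Stream payload)
    {
        SerializationRecord root = NrbfDecoder.Decode(payload);

        // The record type tells us the element type; no user type gets loaded.
        if (root is SZArrayRecord<int> ints)
            return ints.GetArray();

        throw new InvalidDataException("The payload does not contain an int[].");
    }
}
```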
Other arrays
BF supports:
They are all represented by internal types that derive from ArrayRecord. The users can use the API to instantiate such arrays, but they need to provide the expected array type. By doing that we make this advanced scenario possible and safe (the library does not load any types; if there is a type mismatch, it throws).
For more usages of this API please refer to JaggedArraysTests.cs, RectangularArraysTests.cs and CustomOffsetArrays.cs.
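A minimal sketch of the "provide the expected array type" pattern, assuming the ArrayRecord.GetArray(Type expectedArrayType, bool allowNulls = true) signature listed in this thread. The ReadRectangular helper and the choice of int[,] are illustrative only.

```csharp
using System;
using System.IO;
using System.Formats.Nrbf;

public static class ReadRectangular
{
    public static int[,] ReadGrid(Stream payload)
    {
        SerializationRecord root = NrbfDecoder.Decode(payload);

        if (root is not ArrayRecord arrayRecord)
            throw new InvalidDataException("The payload does not contain an array.");

        // The caller names the expected type; per the description above,
        // a mismatch results in an exception rather than a type load.
        return (int[,])arrayRecord.GetArray(typeof(int[,]));
    }
}
```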
Arrays of non-primitive types
Arrays of non-primitive types are represented as ArrayRecord<ClassRecord> or just ArrayRecord.
Risks
If the new APIs are not easy to use, some of the users might choose the new OOB package with a copy of BF and remain vulnerable to all attacks. This defeats the purpose of our initiative and must be avoided.