API Proposal: Add type to avoid boxing .NET intrinsic types #28882

Open
JeremyKuhne opened this issue Mar 6, 2019 · 61 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime

@JeremyKuhne
Member

JeremyKuhne commented Mar 6, 2019

Background and Motivation

Currently there is no way to pass around a heterogeneous set of .NET value types without boxing them into objects or creating a custom wrapper struct. To facilitate low allocation exchange of value types we should provide a struct that allows passing the most common value types without boxing and still allows storing within other types (including arrays) or on the heap when needed.

ASP.NET and Azure SDK have both expressed a need for this functionality for scenarios such as logging.

The following is an evolved proposal based on feedback from various sources. The original proposal is included below. The key changes are a smaller type, alignment with object semantics, support for any object, and more focused non-boxing support.

Proposed API

public readonly struct Value
{
    public Value(object? value);
    public Value(byte value);
    public Value(byte? value);
    public Value(sbyte value);
    public Value(sbyte? value);
    public Value(bool value);
    public Value(bool? value);
    public Value(char value);
    public Value(char? value);
    public Value(short value);
    public Value(short? value);
    public Value(int value);
    public Value(int? value);
    public Value(long value);
    public Value(long? value);
    public Value(ushort value);
    public Value(ushort? value);
    public Value(uint value);
    public Value(uint? value);
    public Value(ulong value);
    public Value(ulong? value);
    public Value(float value);
    public Value(float? value);
    public Value(double value);
    public Value(double? value);
    public Value(DateTimeOffset value);         // Boxes with offsets that don't fall between 1800 and 2250
    public Value(DateTimeOffset? value);        // Boxes with offsets that don't fall between 1800 and 2250
    public Value(DateTime value);
    public Value(DateTime? value);
    public Value(ArraySegment<byte> segment);
    public Value(ArraySegment<char> segment);
    // No decimal as it always boxes


    public Type? Type { get; }                      // Type or null if the Value represents null
    public static Value Create<T>(T value);
    public unsafe bool TryGetValue<T>(out T value); // Fastest extraction
    public T As<T>();                               // Throws InvalidCastException if not supported

    // For each type of constructor except `object`:
    public static implicit operator Value(int value) => new(value);
    public static explicit operator int(in Value value) => value.As<int>();
}

Fully working prototype

Usage Examples

public static void Foo(Value value)
{
    Type? type = value.Type;
    if (type == typeof(int))
    {
        int @int = value.As<int>();

        // Any casts that would work with object work with Value

        int? nullable = value.As<int?>();

        object o = value.As<object>();
    }

    if (value.TryGetValue(out long @long))
    {
        // TryGetValue follows the same casting rules as "As"
    }

    // Enums are not boxed if passed through the Create method
    Value dayValue = Value.Create(DayOfWeek.Friday);

    // Does not box (until Now is > 2250)
    Value localTime = DateTimeOffset.Now;
    localTime = Value.Create(DateTimeOffset.Now);
    localTime = new(DateTimeOffset.Now);

    // ArraySegment<char> and ArraySegment<byte> are supported
    Value segment = new ArraySegment<byte>(new byte[2]);

    // Any type can go into value, however. Unsupported types will box as they do with object.
    Value otherSegment = new(new ArraySegment<int>(new int[1]));
}

Details

Goals

  • Takes any value
  • Can be stored anywhere
  • Follows object semantics (can't box a nullable, for example)
  • Type is 128 bits on 64bit platforms
  • Internal data is opaque
  • Does not box intrinsics (outside of decimal)
  • Does not box nullable intrinsics
  • Does not box DateTime or most DateTimeOffset values (1800-2250 for local times supported)

Other benefits

  • Can be implemented out-of-box as is

Other Possible Names

  • Variant
  • ValueObject
  • ???

Original Proposal

Currently there is no way to pass around a heterogeneous set of .NET value types without boxing them into objects or creating a custom wrapper struct. To facilitate low allocation exchange of value types we should provide a struct that allows passing the information without heap allocations. The canonical example of where this would be useful is in `String.Format`.

Related proposals and sample PRs

Goals

  1. Support intrinsic value types (int, float, etc.)
  2. Support most common value types used in formatting (DateTime)
  3. Have high performance
  4. Balance struct size against type usage frequency
  5. Facilitate "raw" removal of value type data (you want to force cast to int, fine)
  6. Provide a mechanism for passing a small collection of Variants via the stack
  7. Allow all types by falling back to boxing
  8. Support low allocation interpolated strings

Non Goals

  1. Support all value types without boxing
  2. Make it work as well on .NET Framework as it does on Core (presuming it's possible in the final design)

Nice to Have

  1. Usable on .NET Framework (currently does)

General Approach

Variant is a struct that contains an object pointer and a "union" struct that allows stashing of arbitrary blittable (i.e. where unmanaged) value types that are within a specific size constraint.
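
For illustration only, here is a minimal sketch of that shape: an object reference next to an explicit-layout "union" of unmanaged fields. The names VariantSketch, SketchUnion, and SketchType are hypothetical and are not part of the proposal; only a handful of types are covered, and the real layout and size tradeoffs are discussed further down the thread.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch: overlapping storage for a few unmanaged types.
// Object references are kept outside the union so the GC never sees
// a reference overlapping raw data.
[StructLayout(LayoutKind.Explicit)]
internal struct SketchUnion
{
    [FieldOffset(0)] public int Int32;
    [FieldOffset(0)] public long Int64;
    [FieldOffset(0)] public double Double;
}

public enum SketchType { Object, Int32, Int64, Double }

public readonly struct VariantSketch
{
    private readonly object _object;     // reference types and boxed fallbacks
    private readonly SketchUnion _union; // inline storage, no allocation
    public readonly SketchType Type;

    public VariantSketch(object value) { _object = value; _union = default; Type = SketchType.Object; }
    public VariantSketch(int value) { _object = null; _union = new SketchUnion { Int32 = value }; Type = SketchType.Int32; }
    public VariantSketch(long value) { _object = null; _union = new SketchUnion { Int64 = value }; Type = SketchType.Int64; }
    public VariantSketch(double value) { _object = null; _union = new SketchUnion { Double = value }; Type = SketchType.Double; }

    // Succeeds only when the value was stored as an int.
    public bool TryGetValue(out int value)
    {
        bool isInt = Type == SketchType.Int32;
        value = isInt ? _union.Int32 : default;
        return isInt;
    }

    // Boxes inline values on demand; returns stored objects as they are.
    public object Box() => Type switch
    {
        SketchType.Int32 => (object)_union.Int32,
        SketchType.Int64 => (object)_union.Int64,
        SketchType.Double => (object)_union.Double,
        _ => _object,
    };
}
```

With this sketch, new VariantSketch(42) stores the int inline, while new VariantSketch("Wow") simply holds the string reference.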

Sample Usage

// Consuming method
public void Foo(ReadOnlySpan<Variant> data)
{
     foreach (Variant item in data)
     {
         switch (item.Type)
         {
             case VariantType.Int32:
             //   ...
         }
     }
}

// Calling method
public void Bar()
{
     var data = Variant.Create(42, true, "Wow");
     Foo(data.ToSpan());

     // Only needed if running on .NET Framework
     data.KeepAlive();
}

Surface Area

namespace System
{
    /// <summary>
    /// <see cref="Variant"/> is a wrapper that avoids boxing common value types.
    /// </summary>
    public readonly struct Variant
    {
        public readonly VariantType Type;

        /// <summary>
        /// Get the value as an object if the value is stored as an object.
        /// </summary>
        /// <param name="value">The value, if an object, or null.</param>
        /// <returns>True if the value is actually an object.</returns>
        public bool TryGetValue(out object value);

        /// <summary>
        /// Get the value as the requested type <typeparamref name="T"/> if actually stored as that type.
        /// </summary>
        /// <param name="value">The value if stored as (T), or default.</param>
        /// <returns>True if the <see cref="Variant"/> is of the requested type.</returns>
        public unsafe bool TryGetValue<T>(out T value) where T : unmanaged;

        // We have explicit constructors for each of the supported types for performance
        // and to restrict Variant to "safe" types. Allowing any struct that would fit
        // into the Union would expose users to issues where bad struct state could cause
        // hard failures like buffer overruns etc.
        public Variant(bool value);
        public Variant(byte value); 
        public Variant(sbyte value);
        public Variant(short value);
        public Variant(ushort value);
        public Variant(int value);
        public Variant(uint value);
        public Variant(long value);
        public Variant(ulong value);
        public Variant(float value);
        public Variant(double value);
        public Variant(decimal value);
        public Variant(DateTime value);
        public Variant(DateTimeOffset value);
        public Variant(Guid value);
        public Variant(object value);

        /// <summary>
        /// Get the value as an object, boxing if necessary.
        /// </summary>
        public object Box();

        // Idea is that you can cast to whatever supported type you want if you're explicit.
        // Worst case is you get default or nonsense values.

        public static explicit operator bool(in Variant variant);
        public static explicit operator byte(in Variant variant);
        public static explicit operator char(in Variant variant);
        public static explicit operator DateTime(in Variant variant);
        public static explicit operator DateTimeOffset(in Variant variant);
        public static explicit operator decimal(in Variant variant);
        public static explicit operator double(in Variant variant);
        public static explicit operator Guid(in Variant variant);
        public static explicit operator short(in Variant variant);
        public static explicit operator int(in Variant variant);
        public static explicit operator long(in Variant variant);
        public static explicit operator sbyte(in Variant variant);
        public static explicit operator float(in Variant variant);
        public static explicit operator TimeSpan(in Variant variant);
        public static explicit operator ushort(in Variant variant);
        public static explicit operator uint(in Variant variant);
        public static explicit operator ulong(in Variant variant);

        public static implicit operator Variant(bool value);
        public static implicit operator Variant(byte value);
        public static implicit operator Variant(char value);
        public static implicit operator Variant(DateTime value);
        public static implicit operator Variant(DateTimeOffset value);
        public static implicit operator Variant(decimal value);
        public static implicit operator Variant(double value);
        public static implicit operator Variant(Guid value);
        public static implicit operator Variant(short value);
        public static implicit operator Variant(int value);
        public static implicit operator Variant(long value);
        public static implicit operator Variant(sbyte value);
        public static implicit operator Variant(float value);
        public static implicit operator Variant(TimeSpan value);
        public static implicit operator Variant(ushort value);
        public static implicit operator Variant(uint value);
        public static implicit operator Variant(ulong value);

        // Common object types
        public static implicit operator Variant(string value);

        public static Variant Create(in Variant variant) => variant;
        public static Variant2 Create(in Variant first, in Variant second) => new Variant2(in first, in second);
        public static Variant3 Create(in Variant first, in Variant second, in Variant third) => new Variant3(in first, in second, in third);
    }

    // Here we could use values where we leverage bit flags to categorize quickly (such as integer values, floating point, etc.)
    public enum VariantType
    {
        Object,
        Byte,
        SByte,
        Char,
        Boolean,
        Int16,
        UInt16,
        Int32,
        UInt32,
        Int64,
        UInt64,
        DateTime,
        DateTimeOffset,
        TimeSpan,
        Single,
        Double,
        Decimal,
        Guid
    }

    // This is an "advanced" pattern we can use to create stack based spans of Variant. Would also create at least a Variant3.
    public readonly struct Variant2
    {
        public readonly Variant First;
        public readonly Variant Second;

        public Variant2(in Variant first, in Variant second);

        // This is for keeping objects rooted on .NET Framework once turned into a Span (similar to GC.KeepAlive(), but avoiding boxing).
        [MethodImpl(MethodImplOptions.NoInlining)]        
        public void KeepAlive();

        public ReadOnlySpan<Variant> ToSpan();
    }
}

FAQ

Why "Variant"?

  • It does perform a function "similar" to OLE/COM Variant so the term "fits". Other name suggestions are welcome.

Why isn't Variant a ref struct?

  • Primarily because you can't create a Span of ref structs.
  • We also want to give the ability to store arrays of these on the heap when needed

What about variadic argument support (__arglist, ArgIterator, etc.)?

  • Short answer: not sufficient. Referred to as "Vararg" in the CLI specification, the current implementation is primarily for C++/CLI. It isn't supported on Core yet and would require significant investment to support scenarios here reliably and to support non-Windows environments. This would put any solution based on this way out and may make down level support impossible.

What about TypedReference and __makeref, etc.?

  • TypedReference is a ref struct (see above). Variant gives us more implementation flexibility, doesn't rely on undocumented keywords, and is actually faster. (In a simple test of wrapping/unwrapping an int, it is roughly 10-12% faster depending on inlining.)

Why not support anything that fits?

  • We could in theory, but there would be safety concerns with getting the data back out. To support high performance usage we want to allow hard casts of value data.

How about enums?

  • This one may be worth it and is technically doable. Still investigating...

cc: @jaredpar, @vancem, @danmosemsft, @jkotas, @davidwrighton, @stephentoub

@JeremyKuhne JeremyKuhne self-assigned this Mar 6, 2019
@benaadams
Member

Would this be a 16 byte (Guid/Decimal) + enum sized struct? (24 bytes with padding on x64)

@jkotas
Member

jkotas commented Mar 6, 2019

TypedReference is a ref struct (see above). Variant gives us more implementation flexibility, doesn't rely on undocumented keywords, and is actually faster.

These all can be fixed, without too much work. TypedReference has been neglected, but that does not mean it is a useless type. (Some of this is described in https://github.com/dotnet/corefx/issues/29736.)

I think fixing TypedReference would be a better choice than introducing a new Variant type, if everything else is equal.

Allow all types by falling back to boxing

I think the design should allow all types without falling back to boxing.

Work on .NET Framework

This should be a non-goal. It is fine if the winning design that we pick happens to work on .NET Framework, but trying to make it work on .NET Framework should be an explicit non-goal. We have made a conscious decision to not restrict our design choices to what works on .NET Framework.

@JeremyKuhne
Member Author

Would this be a 16 byte (Guid/Decimal) + enum sized struct? (24 bytes with padding on x64)

Goal is 24 bytes. We've looked at a lot of different ways of packing that in. A pointer and 16 bytes of data. It might involve some contortions or dropping down to 12 bytes of data.

but that does not mean it is a useless type.

Not trying to imply it is useless, just not appropriate in this case. I'm not sure how you'd make it a non-ref struct or make it as fast as something targeted at key types.

This should be a non-goal.

Fair enough, I've changed it to nice-to-have. There are, however, real business needs for mitigating formatting inefficiencies on .NET Framework.

I think the design should allow all types without falling back to boxing.

I think we should have some design that does this but I don't think we can provide a solution that solves everything for all scenarios well. Having multiple approaches doesn't seem like a terrible thing to me, particularly given that we could make this sort of solution available much much sooner than full varargs support.

@stephentoub
Member

I think we should have some design that does this but I don't think we can provide a solution that solves everything for all scenarios well. Having multiple approaches doesn't seem like a terrible thing to me, particularly given that we could make this sort of solution available much much sooner than full varargs support.

FWIW, this approach feels very limited to me, in that I see supporting every value type as a key scenario. I would rather see, for example, a simple unsafe annotation/attribute that would let the API tell the JIT that it promises wholeheartedly an argument won't escape, and then add an overload that takes a [UnsafeWontEscape] params ReadOnlySpan<object> args, where the JIT would stack-allocate the boxes for any value types provided. Just an example.

@JeremyKuhne
Member Author

JeremyKuhne commented Mar 6, 2019

FWIW, this approach feels very limited to me, in that I see supporting every value type as a key scenario. I would rather see, for example, a simple unsafe annotation/attribute that would let the API tell the JIT that it promises wholeheartedly an argument won't escape, and then add an overload that takes a [UnsafeWontEscape] params ReadOnlySpan<object> args, where the JIT would stack-allocate the boxes for any value types provided. Just an example.

To be super clear, I don't see this as a solves-all-boxing solution. I absolutely think we can benefit from broader approaches, but I have a concern about being efficient with core types. Being able to quickly tell that you have an int and extract it is super valuable I think. Certainly for the String.Format case, for example. :)

@jkotas
Member

jkotas commented Mar 6, 2019

Being able to quickly tell that you have an int and extract it is super valuable I think.

Depends on how the actual formatting is implemented. If you can dispatch a virtual formatting method, ability to switch over a primitive type does not seem super valuable.

[UnsafeWontEscape] params ReadOnlySpan<object> args

Something like this would work too. It is pretty similar to ReadOnlySpan<TypedReference> on the surface, with different tradeoffs and low-level building blocks.

@stephentoub
Member

It is pretty similar to ReadOnlySpan<TypedReference> on the surface, with different tradeoffs and low-level building blocks.

I'd be fine with that as well if it was similarly seamless to a caller.

@jaredpar
Member

jaredpar commented Mar 6, 2019

Rather than an attribute and a promise I'd like to leverage the type system if possible here. 😄

What if instead we added a JIT intrinsic that "boxes" value types into a ref struct named Boxed. This type would have just enough information to allow manipulation of the boxed value:

ref struct Boxed {
  Type GetBoxedType();
  T GetBoxedValue<T>();
}

The JIT could choose to make this a heap or stack allocation depending on the scenario. The important part is that it would move the boxing operation into a type whose lifetime we need to carefully monitor. The compiler will do it for us.

That doesn't completely solve the problem because you can't have ReadOnlySpan<Boxed> as a ref struct can't be a generic argument. That's not because of a fundamental limitation of the type system but more because we didn't have a motivating scenario. Suppose this scenario was enough and we went through the work in C# to allow it. Then we could have the signature of the method be params ReadOnlySpan<Boxed>. No promises needed here, the compiler will be happy to make developers do the right thing 😉

@stephentoub
Member

That also sounds reasonable.

(Though the [UnsafeWontEscape] approach could also work on the existing APIs: we just attribute the existing object arguments in the existing methods, and apps just get better.)

@jkotas
Member

jkotas commented Mar 6, 2019

How would struct Boxed differ from existing TypedReference (with extra methods added to make it useful)?

Either way, it sounds reasonable too.

@vancem
Contributor

vancem commented Mar 6, 2019

I do think that if our goal is just to solve the parameter passing problem, something based on references (which can work uniformly on all types) is worth thinking about (this is Jan's TypedReference approach).

However, that leaves out the ability to have something that can represent anything (and all primitives efficiently, without extra allocation) and that you can put into objects (which is what Variant is).

I think the fact that we don't have a standard 'Variant' type in the framework is rather unfortunate. Ultimately it is an 'obvious' type to have in the system (even if ultimately you solve the parameter passing issue with some magic stack allocated array of types references).

I am also concerned that we are solving a 'simple' problem (passing parameters) with a more complex one (tricky reference-based classes whose safety is at best subtle).

I think we should have a Variant class. It is straightforward, and does solve some immediate problems without having to design a rather advanced feature (that probably would not make V3.0).

For what it is worth...

@jkotas
Member

jkotas commented Mar 6, 2019

we don't have a standard 'Variant' type in the framework is rather unfortunate.

I agree with that and the Variant proposal would look reasonable to me if the Variant was optimized for primitive types only. The proposal makes it optimized for primitive types and set of value types that we think are important for logging today. It does not feel like a design that will survive over time. I suspect that there will be need to optimize more types, but it won't be possible to extend the design to fit them.

@vancem
Contributor

vancem commented Mar 6, 2019

Note that generally speaking, a Variant is a chunk of memory that holds things in-line and a pointer that allows you to hold 'anything'.

Semantically it is always the case that a variant can hold 'anything', which is nice in that there is not a 'semantic' cliff, only a performance cliff (thus as long as the new types that we might want to add in the future are not perf critical, things are OK). I note that the list of types that really are perf-critical is pretty small and likely not to change over time (int and string; second tier are long, and maybe DateTime(Offset)). So I don't think we are taking a huge risk there.

And there are things you can do 'after the fact'. Let's assume we only allotted 16 bytes for in-line data but we wanted something bigger. If there is any 'skew' to the values (there would be for most types, but not for randomly generated IDs), you could at least store the 'likely' values inline and box the rest. It would probably be OK, and frankly it really is probably the right tradeoff (it would be surprising to me if a new type in the future so dominated the perf landscape over existing types that it was the right call to make the struct bigger to allow it to be stored inline). That has NEVER happened so far.
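
As a rough illustration of "store the likely values inline and box the rest", the sketch below keeps a decimal in an 8-byte slot when it is a whole number with scale 0 that fits in a long, and boxes it otherwise. DecimalSlot is a hypothetical name chosen for this example; it is not part of the proposal, and decimal is used only because it is a familiar 16-byte type with a skewed value distribution. The sign of a negative zero is not preserved by this simplified encoding.

```csharp
using System;

// Hypothetical sketch: common decimals live inline, the rest are boxed.
public readonly struct DecimalSlot
{
    private readonly long _inline;   // meaningful only when _boxed is null
    private readonly object _boxed;  // boxed decimal for values that do not fit

    public DecimalSlot(decimal value)
    {
        // Bits 16-23 of the fourth element hold the decimal's scale.
        Span<int> bits = stackalloc int[4];
        decimal.GetBits(value, bits);
        int scale = (bits[3] >> 16) & 0xFF;

        if (scale == 0 && value >= long.MinValue && value <= long.MaxValue)
        {
            _inline = (long)value;   // likely case: no allocation
            _boxed = null;
        }
        else
        {
            _inline = 0;
            _boxed = value;          // unlikely case: falls back to a box
        }
    }

    public bool IsInline => _boxed is null;

    public decimal Value => _boxed is null ? _inline : (decimal)_boxed;
}
```

The struct stays at two pointer-sized fields on 64-bit platforms; values such as 42m round-trip without allocating, while 1.5m takes the boxed path.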

Indeed from a cost-benefit point of view, we really should be skewing things to the int and string case because these are so much more likely to dominate hot paths. We certainly don't want this to be bigger than 3 pointers, and it would be nice to get it down to 2 (but that does require heroics for any 8-byte-sized things: long, double, DateTime ...), so I think we are probably doing 3.

But it does feel like a 'stable' design (5 years from now we would not feel like we made a mistake). Sure, bigger types will be slow, but I don't think we would want to make the type bigger even if we could. It would be the wrong tradeoff.

So, I think Variant does have a reasonably stable design point that can stand the test of time.

From my point of view, I would prefer that the implementation be tuned for the overwhelmingly likely case (int and string). My ideal implementation would be 8 bytes of inline data / discriminator, and 1 object pointer. This is a pro

@stephentoub
Member

stephentoub commented Mar 6, 2019

One of the main use cases this is being proposed for is around string interpolation and string formatting.

I realize there are other use cases, so not necessarily instead of something Variant-like, but specifically to address the case of string interpolation, I had another thought on an approach...

Today, you can define a method like:

AppendFormat(FormattableString s);

and use that as the target of string interpolation, e.g.

AppendFormat($"My type is {GetType()}.  My value is {_value:x}.");

Imagine we had a pattern (or an interface, though that adds challenge for ref structs) the compiler could recognize where a type could expose a method of the form:

AppendFormat(object value, ReadOnlySpan<char> format);

The type could expose additional overloads as well, and the compiler would use normal overload resolution when determining which method to call, but the above would be sufficient to allow string interpolation to be used with the type in the new way. We could add this method to StringBuilder, for example, along with additional overloads for efficiency, e.g.

public class StringBuilder
{
    public void AppendFormat(object value, ReadOnlySpan<char> format);
    public void AppendFormat(int value, ReadOnlySpan<char> format);
    public void AppendFormat(long value, ReadOnlySpan<char> format);
    public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format);// etc.
}

We could also define new types (as could anyone), as long as they implemented this pattern, e.g.

public ref struct ValueStringBuilder
{
    public ValueStringBuilder(Span<char> initialBuffer);
    
    public void AppendFormat(FormattableString s);
    public void AppendFormat(object value, ReadOnlySpan<char> format);
    public void AppendFormat(int value, ReadOnlySpan<char> format);
    public void AppendFormat(long value, ReadOnlySpan<char> format);
    public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format);// etc.

    public Span<char> Value { get; }
}

Now, when you call:

ValueStringBuilder vsb = ...;
vsb.AppendFormat($"My type is {GetType()}.  My value is {_value:x}.");

rather than generating what it would generate today if this took a FormattableString:

vsb.AppendFormat(FormattableStringFactory.Create("My type is {0}. My value is {1:x}.", new object[] { GetType(), (object)_value }));

or if it took a string:

vsb.AppendFormat(string.Format("My type is {0}. My value is {1:x}.", GetType(), (object)_value));

it would instead generate:

vsb.AppendFormat("My type is ", default);
vsb.AppendFormat(GetType(), default);
vsb.AppendFormat(". My value is ", default);
vsb.AppendFormat(_value, "x");
vsb.AppendFormat(".", default);

There are more calls here, but most of the parsing is done at compile time rather than at run time, and a type can expose overloads to allow any type T to avoid boxing, including one that takes a generic T if so desired.
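
To make the pattern concrete, here is a small sketch of a type that exposes such overloads, with the hand-written equivalent of the lowered call sequence shown in a comment. SketchBuilder is a hypothetical name for this example, not an existing or proposed BCL type, and no compiler support for this lowering is assumed. The string overload is an addition made here so that literal segments bind unambiguously; converting the format span to a string allocates and is done only to keep the sketch short.

```csharp
using System;
using System.Text;

// Hypothetical builder following the AppendFormat pattern described above.
public sealed class SketchBuilder
{
    private readonly StringBuilder _sb = new StringBuilder();

    // Fallback overload: accepts anything; value types are boxed by the caller.
    public void AppendFormat(object value, ReadOnlySpan<char> format)
    {
        if (value is IFormattable formattable)
            _sb.Append(formattable.ToString(format.IsEmpty ? null : format.ToString(), null));
        else
            _sb.Append(value);
    }

    // Specialized overload: the int argument is never boxed.
    public void AppendFormat(int value, ReadOnlySpan<char> format)
        => _sb.Append(value.ToString(format.IsEmpty ? null : format.ToString(), null));

    // Literal segments (format is ignored for plain text).
    public void AppendFormat(string value, ReadOnlySpan<char> format)
        => _sb.Append(value);

    // Span values, also without allocation for the argument itself.
    public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format)
        => _sb.Append(value);

    public override string ToString() => _sb.ToString();
}

// Hand-written equivalent of the lowered sequence for
// $"My type is {typeof(int)}. My value is {255:x}.":
//
//   var b = new SketchBuilder();
//   b.AppendFormat("My type is ", default);
//   b.AppendFormat(typeof(int), default);
//   b.AppendFormat(". My value is ", default);
//   b.AppendFormat(255, "x");
//   b.AppendFormat(".", default);
```

Normal overload resolution picks the int overload for the value and the object overload for the Type instance, which is the behavior the pattern relies on.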

@benaadams
Member

benaadams commented Mar 6, 2019

If you throw out Guid and Decimal (as they are 16 bytes); then you could use the object pointer as the discriminator; rather than enum.

e.g.

public readonly struct Variant : IFormattable
{
    private readonly IntPtr _data;
    private readonly object _typeOrData; 
    
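    public unsafe bool TryGetValue<T>(out T value) where T : IFormattable
    {
        if (typeof(T) == typeof(int))
        {
            if ((object)typeof(T) == _typeOrData)
            {
                // Reinterpret the inline bits as int, then as T (T is int here).
                int data = Unsafe.As<IntPtr, int>(ref Unsafe.AsRef(in _data));
                value = Unsafe.As<int, T>(ref data);
                return true;
            }

            value = default;
            return false;
        }
        // etc.
    }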

    public override string ToString()
    {
        return ToString(null, null);
    }

    public string ToString(string format, IFormatProvider formatProvider)
    {
        if ((object)typeof(int) == _typeOrData)
        {
            return Unsafe.As<IntPtr, int>(ref Unsafe.AsRef(in _data)).ToString(format, formatProvider);
        }
        // etc.
    }
}

And box others to _typeOrData, not ideal though

@vancem
Contributor

vancem commented Mar 6, 2019

@benaadams - Generally I like the kind of approach you are suggesting.

In my ideal world, Variant would be an object reference and an 8-byte buffer. It should be super-fast on int and string, and non-allocating on data types 8 bytes or smaller (by using the object as a discriminator for 8-byte types). For data types larger than 8 bytes, either box, or encode the common values into 8 bytes or less and box the uncommon values.

This has the effect of skewing the perf toward the overwhelmingly common cases of int and string (and they don't pay too much extra bloat for the rarer cases).

@JeremyKuhne
Member Author

JeremyKuhne commented Mar 7, 2019

@stephentoub Generally speaking I like the idea of moving parsing to compile time. I'll play around to see what sort of perf implications it has.

One thing I'd want to make sure we have an answer for is how do we fit ValueFormatableString (or something similar) into this picture? Ideally we can add just one overload to Console.WriteLine() that will magically suck $"" away from Console.WriteLine(string). Could we leverage ValueStringBuilder for this?

int count = 42;
Console.WriteLine($"The count is {count}.");

// And we have the following overload
void WriteLine(in ValueStringBuilder builder);

// Then C# generates:
ValueStringBuilder vsb = new ValueStringBuilder();
// ... the series of Appends() ...
WriteLine(vsb);
vsb.Dispose(); // Note that this isn't critical, it just returns any rented space to the ArrayPool

We could also add overloads that take IFormatProvider, ValueStringBuilder? Or possibly just add an optional IFormatProvider on ValueStringBuilder? Then something like this could happen:

Console.WriteLine(myFormatProvider, $"The count is {count}.");

// Creates the following
ValueStringBuilder vsb = new ValueStringBuilder(myFormatProvider);
// ... the series of Appends() ...
WriteLine(vsb);
vsb.Dispose();

@JeremyKuhne
Member Author

@benaadams, @vancem

If you throw out Guid and Decimal (as they are 16 bytes); then you could use the object pointer as the discriminator; rather than enum.

Pulling DateTimeOffset along for the ride is kind of important if we support DateTime, as it is now the preferred replacement. That pushes over 8. The way I would squish that and Guid/Decimal into 24 bytes is to use sentinel objects for Guid/Decimal and squeeze a 4-byte enum into the "union". (Which is the same sort of thing @vancem is talking about, but with a bigger bit bucket.) Ultimately we're stuck with some multiple of 8 due to the struct packing. If we dial to 16 (the absolute smallest), it would require making 8-byte items slow and putting anything larger into a box.

It would be cool if we could borrow bits from the object pointer (much like an ATOM is used in Win32 APIs), but that obviously would require runtime support.

@jkotas
Member

jkotas commented Mar 7, 2019

It is pretty common to pass around strings as Span<char> in modern high performance C#. It would be really nice if the high-performance formatting supported consuming Span<char> items.

@stephentoub
Member

It would be really nice if the high-performance formatting supported consuming Span<char> items.

This is one of the advantages I see to the aforementioned AppendFormat approach. In theory you just have another AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format) overload, and then you could do $"This contains a {string.AsSpan(3, 7)}" and have that "just work".

@jaredpar
Member

jaredpar commented Mar 7, 2019

@stephentoub

This is one of the advantages I see to the aforementioned AppendFormat approach.

Indeed. In the AppendFormat approach the compiler would simply translate every value in the interpolated string to valueFormattableStringBuilder.AppendFormat(theValue) and then bind the expression exactly as it would be bound if typed out. That means you can add specialized overloads like AppendFormat(ReadOnlySpan<char>) now or years down the road and the compiler would just pick them up.

@JeremyKuhne
Member Author

I'm going to break out a separate proposal for "interpolated string -> Append sequence" and do a bit of prototyping to examine the performance.

@stephentoub
Member

@MgSam

MgSam commented Mar 21, 2019

Just to add my 2 cents here: storing heterogeneous data whose types are not known at compile time has a lot more uses than just string interpolation. Take our old friend DataTable, for example. To this day it remains the only way in the BCL to hold dynamic tabular data (until and unless a modern DataFrame type is ever added). And 100% of everything that you put in a DataTable is boxed.

Having a true Variant type could bring great performance benefits in such a scenario.

I'd even say it's a far more important scenario than string interpolation. Most metrics have shown the popularity of Python exploding over the last few years, making it one of the most-used languages. The reason is the great libraries it has for working with data. The market is clearly saying it wants better and more efficient ways of working with data, and .NET should oblige.

@JeremyKuhne
Member Author

@MgSam do you think avoiding boxing on common types is good enough? The initial proposal doesn't handle everything, but allows putting data on the heap (e.g. creating Variant[]). There are ways to create references to anything already (__makeref() and TypedReference), but:

  • There is no way currently to pass an open-ended list of them (on all platforms anyway)
  • They're slower
  • Can't be stored on the heap

Stashing arbitrary struct data in Variant isn't safe, so we have to restrict it to types that are known to have no ill side effects if their backing fields have random data. We're also constrained by the size of what we can stash.

@MgSam

MgSam commented Mar 21, 2019

Yes, I think common types likely cover 95% of the use cases. You don't often have nested objects when working with large tables of data.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@jkotas
Member

jkotas commented Apr 29, 2021

The Azure SDK is going to be using this type internally and ASP.Net is looking at this for logging

The proposal needs to have detail on these use cases. Are there going to be any public ASP.NET APIs that consume this type?

I don't think this is arbitrary at all. They're only "fundamental" types in my view.

The set of fundamental types depends on the scenario. For example, we have a similar union in this repo here:

[StructLayout(LayoutKind.Explicit)]
public struct Scalar
{
    [FieldOffset(0)]
    public bool AsBoolean;
    [FieldOffset(0)]
    public byte AsByte;
    [FieldOffset(0)]
    public sbyte AsSByte;
    [FieldOffset(0)]
    public char AsChar;
    [FieldOffset(0)]
    public short AsInt16;
    [FieldOffset(0)]
    public ushort AsUInt16;
    [FieldOffset(0)]
    public int AsInt32;
    [FieldOffset(0)]
    public uint AsUInt32;
    [FieldOffset(0)]
    public long AsInt64;
    [FieldOffset(0)]
    public ulong AsUInt64;
    [FieldOffset(0)]
    public IntPtr AsIntPtr;
    [FieldOffset(0)]
    public UIntPtr AsUIntPtr;
    [FieldOffset(0)]
    public float AsSingle;
    [FieldOffset(0)]
    public double AsDouble;
    [FieldOffset(0)]
    public Guid AsGuid;
    [FieldOffset(0)]
    public DateTime AsDateTime;
    [FieldOffset(0)]
    public DateTimeOffset AsDateTimeOffset;
    [FieldOffset(0)]
    public TimeSpan AsTimeSpan;
    [FieldOffset(0)]
    public decimal AsDecimal;
}

// Anything not covered by the Scalar union gets stored in this reference.
private readonly object? _reference;
private readonly Scalar _scalar;
private readonly int _scalarLength;
It special-cases a different set of types.

@benaadams
Member

Are these two examples a kind of struct discriminated union? (Or one with named alas) dotnet/csharplang#113 🤔

@agocke
Member

agocke commented Apr 30, 2021

This feels a lot like discriminated unions and I wonder if we could build the general purpose feature which allows for any type, including managed, and then have a mechanism for the compiler and runtime to work together to efficiently store constructions which happen to be unmanaged.

@davidfowl
Member

I think the type needs to flow without being viral. I like both the TypedReference approach for values that can't escape to the heap and the variant approach for things that do. I can also see a type like this being super useful for fast reflection and serialization. Today, generics are too viral and don't work for framework code, and TypedReference is ref-only and not usable in many scenarios where I'd want to use this. Just as an example, I've been looking at a way to do fast reflection for ages (not boxing the arguments and supporting Span). A version of this type that supported any T would be ideal, but I don't know what that would look like or if it would even be possible without runtime support (like Span).

The other use case is logging without boxing. I'd like to allow callers to preserve primitive types without boxing and allow the logger provider to unwrap and serialize.

@sakno
Contributor

sakno commented Apr 30, 2021

In my proposal I suggested using a tuple as the container. Tuples are normal structs, not ref-like structs. A tuple can represent an arbitrary number of arguments of any type (except Span<T> and ReadOnlySpan<T>). The only thing that must be provided by the runtime is a special intrinsic:

internal static string TupleItemToString<T>(in T tuple, int index, IFormatProvider? provider) where T : struct, ITuple;

It can be converted to a more low-level version to be compatible with other scenarios, such as usage of IBufferWriter<char>:

internal static bool TupleItemToString<T>(in T tuple, int index, Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) where T : struct, ITuple;

A tuple can be passed to a logging or Console method along with the string containing the formatting template. From my point of view, this approach is reusable anywhere you have a set of formatting arguments and a template. Interpolated strings are also covered.

In reality, the TupleItemToString method can be implemented without an intrinsic:

  • Find TryFormat instance method of type T via reflection
  • Create open delegate
  • Cache this delegate
  • Reuse cached delegate instance for type T in every call of TupleItemToString

With a JIT intrinsic, TupleItemToString can be replaced with pure IL without reflection.

The public API could look like this:

public sealed class String
{
  public static string Format<TArgs>(IFormatProvider? provider, string format, in TArgs args)
    where TArgs : struct, System.Runtime.CompilerServices.ITuple;

  public static void Format<TArgs>(IFormatProvider? provider, string format, in TArgs args, IBufferWriter<char> output)
    where TArgs : struct, System.Runtime.CompilerServices.ITuple;
}

@davidfowl
Member

This has all the problems I stated above about generic code. The T needs to flow everywhere, and that's what Variant/Value and TypedReference solve and the ITuple solution does not.

@agocke
Member

agocke commented Apr 30, 2021

@davidfowl

A version of this type that supported any T would be ideal, but I don't know what that would look like or if it would even be possible without runtime support (like Span)

Is this in response to my proposal? Discriminated unions are a feature for declaring types. My point is that you wouldn't build a single type to handle all possible use cases, each use case would declare a suitable type, and if that type happens to be purely unmanaged then the compiler would codegen it differently.

@davidfowl
Member

davidfowl commented Apr 30, 2021

Is this in response to my proposal? Discriminated unions are a feature for declaring types. My point is that you wouldn't build a single type to handle all possible use cases, each use case would declare a suitable type, and if that type happens to be purely unmanaged then the compiler would codegen it differently.

I'm familiar with the DU proposal, but I don't think it's suitable for the same things Variant/Value will be used for.

@agocke
Member

agocke commented Apr 30, 2021

I'd be interested in an example that you think couldn't be represented with discriminated unions. DUs seem to me a strict increase in expressive power.

@davidfowl
Member

Here's a canonical example from logging:

public static Action<ILogger, T1, T2, T3, T4, T5, Exception?> Define<T1, T2, T3, T4, T5>(LogLevel logLevel, EventId eventId, string formatString, bool skipEnabledCheck)
{
    LogValuesFormatter formatter = CreateLogValuesFormatter(formatString, expectedNamedParameterCount: 5);

    void Log(ILogger logger, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, Exception? exception)
    {
        logger.Log(logLevel, eventId, new LogValues<T1, T2, T3, T4, T5>(formatter, arg1, arg2, arg3, arg4, arg5), exception, LogValues<T1, T2, T3, T4, T5>.Callback);
    }

    if (skipEnabledCheck)
    {
        return Log;
    }

    return (logger, arg1, arg2, arg3, arg4, arg5, exception) =>
    {
        if (logger.IsEnabled(logLevel))
        {
            Log(logger, arg1, arg2, arg3, arg4, arg5, exception);
        }
    };
}

We need to flow these generics to avoid boxing through the method, through the return value, then we need to make a generic LogValues object with the same number of generic arguments. Then when these objects get logged, we need the consumer to be able to unpack them from non-generic code, so we end up boxing everything through the IReadOnlyList<KeyValuePair<string, object>> interface. Ideally I would be able to preserve this type information without needing to have all generic code (IReadOnlyList<KeyValuePair<string, Value>>).

This also happens for the reflection APIs where you want to pass a variable sized Span to invoke a method. Here's an example of how reflection could use these APIs:

class MethodInfo
{
    public Value InvokeFast(object instance, Span<Value> args);
}

I really want a way to round trip a T/T[]/Span<T> without forcing all of the code to be generic and without boxing. This is the super power that Value gives me for primitive types and reference types. Custom value types don't get the benefit obviously but I assume we'd need something else for that.

The key issue is that framework code that's shuttling these types around doesn't want to force generics everywhere and it's even more complex when you have multiple generic arguments (like a Span).

Maybe what I want is associated types.

@jkotas
Member

jkotas commented Apr 30, 2021

I really want a way to round trip a T/T[]/Span<T> without forcing all of the code to be generic and without boxing. This is the super power that Value gives me for primitive types and reference types.

The Value type proposed here won't give you this super power. It won't work for Span. TypedReference has this superpower, and that is why the plan we are on with reflection is based on TypedReference.

@davidfowl
Member

The Value type proposed here won't give you this super power. It won't work for Span. TypedReference has this superpower, and that is why the plan we are on with reflection is based on TypedReference.

I know it won't work with Span but the ref struct restrictions are too much for many scenarios. So I think we need TypedReference and Value (like Span and Memory).

@jkotas
Member

jkotas commented Apr 30, 2021

So I think we need TypedReference and Value (like Span and Memory).

We have that already: TypedReference and object.

This proposal is about creating an alternative storage for object-like values that is more efficient for some set of types and less efficient for the rest. You can imagine using it only as an internal implementation detail when you need to store the data on the heap. If it is an internal implementation detail, it does not need to be a public type and the set of the more efficient types can be tailored for each use case, and that is where discriminated unions would be useful.

@agocke
Member

agocke commented Apr 30, 2021

@davidfowl That many unconstrained generic parameters is a bit of a code smell. It may be that you want existential types. Lemme take some time to look at how all that is used and get back to you.

@davidfowl
Member

This proposal is about creating an alternative storage for object-like values that is more efficient for some set of types and less efficient for the rest. You can imagine using it only as an internal implementation detail when you need to store the data on the heap. If it is an internal implementation detail, it does not need to be a public type and the set of the more efficient types can be tailored for each use case, and that is where discriminated unions would be useful.

I don't see how TypedReference solves the problem when I need the value off stack. The problem is it's not an internal contract because there's tons of public API boundaries that we need to cross to flow these types. Here's a super simple example, in SignalR, I would like this https://github.com/dotnet/aspnetcore/blob/547de595414d6ebb9ddeaa4a231815449c9f3c60/src/SignalR/common/SignalR.Common/src/Protocol/HubMethodInvocationMessage.cs#L22 to be a Value[] that would let me parse a message from the network and store that type information without boxing and without being generic.

@agocke Looking forward to what you come up with.

@jkotas
Member

jkotas commented May 1, 2021

I don't see how TypedReference solves the problem when I need the value off stack

I agree that TypedReference does not work once you need to store it off stack. We have System.Object that handles this scenario today. This creates an alternative way to do the same. It is similar to the pattern we have with ValueTask vs. Task or ValueTuple vs. Tuple. (Following this pattern, we would name this type System.ValueObject.)

there's tons of public API boundaries that we need to cross to flow these types

It would be useful to have a list of public APIs in the frameworks that would get overloads to use this type, and an estimate of the performance benefits.

@davidfowl
Member

I spent some time looking at the types of places where this would be beneficial. The pattern is basically pushing the generic type as close as possible to the place that is going to consume it, without exposing it in all of the layers. It's also a way to store variable-sized primitives in a single type without boxing.

  • Serialization - A way to serialize and de-serialize primitives without boxing. This might be a bit narrow but we should support this type in the JSON serializer.
  • TypeConverter APIs - MVC uses these today to convert various primitives from their string representation. This boxes today. We could add ValueObject TypeConverter.ConvertFrom(...).
  • Logging - Today we use LogValues<T0,..,TN> with lots of generic arguments to stash unboxed values for logging. We could have a new contract based on KeyValuePair<string, ValueObject> that lets the logger provider unpack the values without boxing value types.
  • EventSource - Today the event source needs to serialize the object in place. It could stash a ValueObject[] instead of serializing or boxing the values.

@jkotas
Member

jkotas commented May 1, 2021

Thoughts on estimating the performance benefits?

This is shifting costs: It reduces cycles spent on GC allocations, but pays for it by spending more cycles on compressing and decompressing the bits and by increasing concept count.

#28882 (comment) says "Most operations are a few to several nanoseconds". Boxing an integer costs about as much. Is it going to be a net win to replace one with the other?
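One half of this trade-off, the GC allocation side, can be observed directly with `GC.GetAllocatedBytesForCurrentThread`. The sketch below is only a probe under assumptions: `BoxingProbe` and its method names are made up for illustration, and it says nothing about the cycles spent packing and unpacking bits, which would need a proper benchmark harness to compare.

```csharp
using System;

// Hypothetical probe that compares managed allocations for boxed and unboxed stores.
public static class BoxingProbe
{
    // Returns the bytes allocated on the current thread while the action runs.
    public static long MeasureAllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Stores ints into a preallocated object[]; every store allocates a box.
    public static long BoxedStoreBytes(int count)
    {
        var slots = new object[count];
        return MeasureAllocatedBytes(() =>
        {
            for (int i = 0; i < slots.Length; i++)
            {
                slots[i] = i;
            }
        });
    }

    // Stores the same ints into a preallocated long[]; the stores themselves do not box.
    public static long RawStoreBytes(int count)
    {
        var slots = new long[count];
        return MeasureAllocatedBytes(() =>
        {
            for (int i = 0; i < slots.Length; i++)
            {
                slots[i] = i;
            }
        });
    }
}
```

Comparing `BoxingProbe.BoxedStoreBytes(1000)` with `BoxingProbe.RawStoreBytes(1000)` shows the allocation that a non-boxing store removes; whether that outweighs the extra work on each read and write is exactly the open question raised above.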

@davidfowl
Member

That's a good question. As for the concept count problem, I don't think it'll be widely used in a ton of public APIs, but rather in a couple of places where performance usually matters and generic types don't work. It also feels complementary to the typed reference based reflection APIs that we plan to add.

I guess we can try some experiments with a couple of the above scenarios and the existing implementation.

@davidfowl
Member

davidfowl commented May 2, 2021

The other important place we want to add this would be in ADO.NET. There's a ton of boxing there for similar reasons and we're currently investigating a high performance alternative that wants to avoid it. See DbParameter.Value.

It comes down to this: APIs that want to squeeze out performance and avoid GC allocations need this. They can each define their own exchange type, or we can provide something in the BCL for them. I don't know if it's a winning strategy to have each library define its own.

@roji
Member

roji commented May 3, 2021

For ADO.NET, there's #17446 for adding an API to write parameters without boxing. A generic DbParameter<T> could be sufficient (but this needs to be designed).

@agocke
Member

agocke commented May 3, 2021

Serialization

I'm skeptical we need anything more than generic lifetimes to handle this, given that https://github.com/agocke/serde.net works without boxing value types.

Ref structs can't be used because they can't be passed as generic arguments, but that's due to not tracking lifetimes through generics, which results in a safety hole at the moment if it's allowed for ref structs.

@jkotas
Member

jkotas commented May 3, 2021

Serde is designed to generate lean, efficient serializers at build time. The ValueObject is not relevant for serializers designed with the same principles as Serde.

Many .NET serializers and other popular libraries are not designed like that. They are very dynamic systems that do a lot of discovery at runtime. Trying to fit these components into the same mode as Serde as an afterthought typically does not work well. ValueObject is meant for performance optimizations of dynamic components like that.

@JeremyKuhne JeremyKuhne changed the title API Proposal: Add Variant type to avoid boxing .NET intrinsic types API Proposal: Add type to avoid boxing .NET intrinsic types May 4, 2021
@DaZombieKiller
Contributor

Would it be worthwhile to support arbitrary (unmanaged) user-defined structures in the variant without boxing if they fit within 8 bytes? Based on the prototype in the OP, it'd be a matter of extending the TypeFlag class to support reading and writing the value from bytes (and doing the appropriate SizeOf/IsReferenceOrContainsReferences checks in Value.Create.)
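The checks mentioned above could look roughly like the following sketch. `SmallStructPacker` and its members are hypothetical names that stand in for whatever the prototype's `Value.Create` and `TypeFlag` would do; only `Unsafe.SizeOf`, `Unsafe.ReadUnaligned`, `Unsafe.WriteUnaligned`, and `RuntimeHelpers.IsReferenceOrContainsReferences` are existing APIs.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper: packs small reference-free structs into 8 bytes without boxing.
public static class SmallStructPacker
{
    // True when T has no managed references and fits within the 8-byte payload.
    public static bool CanStoreInline<T>() where T : struct
    {
        return !RuntimeHelpers.IsReferenceOrContainsReferences<T>()
            && Unsafe.SizeOf<T>() <= sizeof(ulong);
    }

    // Copies the raw bytes of the value into the payload, or reports that boxing is needed.
    public static bool TryPack<T>(T value, out ulong bits) where T : struct
    {
        bits = 0;
        if (!CanStoreInline<T>())
        {
            return false;
        }

        Unsafe.WriteUnaligned(ref Unsafe.As<ulong, byte>(ref bits), value);
        return true;
    }

    // Reads the value back; the caller must already know the stored type.
    public static T Unpack<T>(ulong bits) where T : struct
    {
        if (!CanStoreInline<T>())
        {
            throw new InvalidOperationException("Type cannot be stored inline.");
        }

        return Unsafe.ReadUnaligned<T>(ref Unsafe.As<ulong, byte>(ref bits));
    }
}
```

A real variant would still need to record which T was stored (for example via a per-type flag object) so that reads can be type-checked; this sketch leaves that to the caller.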
