This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Prototype for nonallocating string formatting #2595

Closed
wants to merge 5 commits into from


JeremyKuhne
Member

The goal here is to provide a mechanism for using interpolated strings ($"")
without any unnecessary boxing and allocations for intrinsic types.

The intent is also to provide a non-boxing format solution for the most
common framework types, including Guid/TimeSpan and enums.

This is an early prototype project to explore this approach and potentially others. It can make a significant impact on allocations in format-heavy code such as MSBuild logging. (Some sample perf tests are included.) GC pressure notwithstanding, base perf is slightly slower than calling String.Format if you don't take advantage of stack scratch space for ValueStringBuilder. If you do, this approach is faster and allocates less (zero for common types).
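
A minimal sketch of what "stack scratch space" means in practice. This is not code from the PR: ScratchFormatting and Describe are hypothetical names, and the 64-char buffer size is arbitrary. It relies only on stackalloc into a Span<char> and the existing Int32.TryFormat span overload (.NET Core 2.1 and later).

```csharp
using System;

static class ScratchFormatting
{
    // Builds "<label>: <value>" in a stack buffer; only the final string
    // is allocated on the heap. (Hypothetical helper for illustration.)
    public static string Describe(string label, int value)
    {
        // Stack scratch space; 64 chars is an arbitrary size for this sketch.
        Span<char> scratch = stackalloc char[64];

        // Fall back to ordinary concatenation if the label cannot fit.
        if (label.Length + 2 > scratch.Length)
            return label + ": " + value.ToString();

        int pos = 0;
        label.AsSpan().CopyTo(scratch);
        pos += label.Length;
        scratch[pos++] = ':';
        scratch[pos++] = ' ';

        // TryFormat writes the digits straight into the span: no boxing and
        // no intermediate string for the integer.
        if (!value.TryFormat(scratch.Slice(pos), out int written))
            return label + ": " + value.ToString();
        pos += written;

        return scratch.Slice(0, pos).ToString();
    }
}
```

Calling ScratchFormatting.Describe("items", 42) produces the same text as string.Format("{0}: {1}", "items", 42), without boxing the integer.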

cc: @jaredpar, @vancem, @davidwrighton, @jkotas, @danmosemsft, @iSazonov, @GrabYourPitchforks

@stephentoub
Member

Thanks, @JeremyKuhne. I skimmed this quickly, but is there code in here somewhere that demonstrates the canonical way you'd expect this to be used? And what features does this need from the C# compiler to be used as desired?

@JeremyKuhne
Member Author

I skimmed this quickly, but is there code in here somewhere that demonstrates the canonical way you'd expect this to be used? And what features does this need from the C# compiler to be used as desired?

I'll follow up with more descriptive examples over the next few days. There is usage "now" and things we want to make it nicer, like params Span<Variant> args (or something similar).

@stephentoub
Member

stephentoub commented Nov 20, 2018

Separately, I'd envisioned something along these lines:

  • We expose the ISpanFormattable that we already have internally and implement on the primitive types, Guid, Version, DateTime, etc. and which provides a TryFormat for formatting into a span.
  • We augment the C# compiler with a more generalized target-typing support for interpolated strings, such that rather than just target typing to string and FormattableString, it supports target typing to anything (waving my hands) that exposes a Create method that takes a string for the format and then the interpolated arguments, e.g. if you did $"Hello {someInt32Value} world", it would look for a Create method that took a string and either an Int32 or a T that could be an Int32.
  • We expose methods on a ValueStringBuilder like Append(ValueStringBuilderParams target), where ValueStringBuilderParams has a Create<TArg1>(string format, TArg1 arg) where TArg1 : ISpanFormattable method that returns the generic ValueStringBuilderParams<TArg1> containing the string and the Int32.

Etc.

There are probably some holes there, but that's approximately what I'd had in mind. I'll be interested in seeing how close or far that is from what you've been noodling on.
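
A rough sketch of the shape described in the bullets above, under stated assumptions: it targets .NET 6 or later, where System.ISpanFormattable is public (it was internal when this comment was written), and FormatParams<TArg1>, Create, and TryWrite are hypothetical names standing in for ValueStringBuilderParams. It only handles a single "{0}" hole and ignores alignment, format specifiers, and escaped braces.

```csharp
using System;

// Hypothetical carrier for a format string plus one strongly typed argument.
readonly struct FormatParams<TArg1> where TArg1 : ISpanFormattable
{
    private readonly string _format;
    private readonly TArg1 _arg1;

    public FormatParams(string format, TArg1 arg1)
    {
        _format = format;
        _arg1 = arg1;
    }

    // Writes the format string into destination, replacing the first "{0}".
    public bool TryWrite(Span<char> destination, out int charsWritten)
    {
        charsWritten = 0;
        int hole = _format.IndexOf("{0}", StringComparison.Ordinal);
        if (hole < 0)
        {
            // No hole: copy the format string through unchanged.
            if (!_format.AsSpan().TryCopyTo(destination)) return false;
            charsWritten = _format.Length;
            return true;
        }

        ReadOnlySpan<char> prefix = _format.AsSpan(0, hole);
        ReadOnlySpan<char> suffix = _format.AsSpan(hole + 3);

        if (!prefix.TryCopyTo(destination)) return false;
        int pos = prefix.Length;

        // Constrained generic call: TArg1 is formatted without being boxed.
        if (!_arg1.TryFormat(destination.Slice(pos), out int written, default, null))
            return false;
        pos += written;

        if (!suffix.TryCopyTo(destination.Slice(pos))) return false;
        charsWritten = pos + suffix.Length;
        return true;
    }
}

static class FormatParams
{
    // The factory a compiler could target for $"Hello {someInt32Value} world".
    public static FormatParams<TArg1> Create<TArg1>(string format, TArg1 arg1)
        where TArg1 : ISpanFormattable
        => new FormatParams<TArg1>(format, arg1);
}
```

The generic constraint is what keeps the argument unboxed; an overload taking object would force an allocation for every value type.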

@jaredpar
Member

@stephentoub

We augment the C# compiler with a more generalized target-typing support for interpolated strings, such that rather than just target typing to string and FormattableString, it supports target typing to anything (waving my hands) that exposes a Create method that takes a string for the format and then the interpolated arguments, e.g. if you did $"Hello {someInt32Value} world", it would look for a Create method that took a string and either an Int32 or a T that could be an Int32.

That's essentially what @JeremyKuhne and I had been thinking about. I'm going to be getting the write up on this complete over the holiday break.

@stephentoub
Member

Thanks.

/// <summary>
/// <see cref="Variant"/> is a wrapper that avoids boxing common value types.
/// </summary>
public readonly struct Variant
Member

@jkotas jkotas Nov 20, 2018

An alternative way to avoid the allocation for this and many other cases (e.g. delegates) would be to introduce the ability to annotate arguments as non-escaping. The JIT would then be able to reliably convert the heap allocation into a stack allocation, and we would not need to introduce duplicate types and APIs every time we need to get something predictably allocated on the stack.

We may need similar annotations to allow generic types instantiated over ref types.

Contributor

  • @AndyAyersMS please include other JIT devs for visibility on the issue of stack-bound lifetime of GC objects. I know we have had an intern project on this, but I believe that was just inference based. I suspect Jan's suggestion is about a user declaration that effectively asks the type system to ensure that the object's lifetime is not allowed to escape the current method, but the goal is the same (you can stack allocate it).

I am assuming, however, that we are sticking with the value-type approach here, as getting the 'auto-stack-allocation' working is probably non-trivial.

Member

I know we have had an intern project on this

It's becoming real, e.g. dotnet/coreclr#20814

Member

I am assuming however, that we are sticking with the value-type approach here however, as getting the 'auto-stack-allocation' working is probably non-trivial.

If it is based on annotations, then it is fairly easy to do for the JIT (we have infrastructure for it already).

The non-allocation formatting that we are talking about is a cross-cutting language/libraries feature, so it is "non-trivial" in this sense already.

Member

Alternative way to avoid the allocation for this and many other cases (e.g. delegates) would be introduce ability to annotate arguments as non-escaping.

That's essentially asking to introduce borrowing into the system. Went down that rabbit hole in Midori. Lots of dragons down there. For example: once you annotate parameters as non-escaping you can't call instance methods on them unless this is also marked as non-escaping. Doable but the annotation gets viral very fast.

Let me see if I can dig up the write up I had on why this gets a bit out of control fairly fast.

Member

Would we consider having annotations that are partially or completely unvalidated? i.e. the developer promises the object won't escape, and if it does, things break, ala Unsafe.*.

Member

Escape analysis isn't really sufficient. To avoid awkward performance problems, the annotation needs to be "doesn't escape, and the object pointed at is never mutated". Looking at the use cases here, it's a vaguely reasonable assumption, but it's a hard one to be sure about, especially in the presence of something like profiler rejit. (The mutation would trigger a need to use a checked write barrier instead of a normal write barrier, and we'd really rather not do that to stores to the heap.)

Member

To avoid awkward performance problems, the annotation needs to be doesn't escape, and object pointed at is never mutated

Can we document that mutation of objects pointed to by non-escaping references is slower?

We have the same issue with struct today. It is super rare somebody notices that mutation of object references in structs is slower, and I do not think it is even documented anywhere.

Member

Mmm, possibly. In any case, if we want to go the route of annotations, we need to implement the annotation before we make ISpanFormattable a public API, as the annotation needs to be on ISpanFormattable, specifically on the this pointer used to invoke an instance of ISpanFormattable.

Member

@jkotas jkotas Nov 28, 2018

we need to implement the annotation before we make ISpanFormattable a public api

Agree.

In any case, it would be useful to have the overall no-allocation high-performance formatting story figured out before we start making parts of it public. ISpanFormattable worked ok as internal implementation detail, but it may not be the right type to set in stone to support the public no-allocation high-performance string formatting story.

}

// Idea is that you can cast to whatever supported type you want if you're explicit.
// Worst case is you get default or nonsense values.
@morganbr morganbr Nov 27, 2018

Why do we need this? There are a lot of things that can go wrong from reasonable-looking code. For example:

(long)(Variant)(-1) would return 4294967295 because the unsafe conversion doesn't sign extend.
(int)new Variant((object)1) would return 0 because the boxed int's value isn't stored in the union.

It also just allows accidents in ways we don't usually. For example, this mess becomes possible:

var var1 = new Variant(DateTime.Now);
var var2 = new Variant(false);

// accidentally confuse var1 and var2
if ((bool)var1) { ... } // Happens because DateTime.Now isn't 0
if ((bool)var1 == true) { ... } // Doesn't happen because DateTime.Now isn't 1
if ((bool)var1 == false) { ... } // Doesn't happen either! (Because DateTime.Now isn't 0)

All of this could be avoided by doing some combination of checking the types for compatibility and doing proper conversion instead of returning garbage.
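
A small sketch of the difference between reinterpreting the union and converting by checking the stored type. MiniVariant, MiniVariantType, ReinterpretAsInt64, ToInt64, and ToInt32 are hypothetical names for illustration and are not the PR's Variant; the layout simply overlays an int and a long at offset 0, which is an assumption about how such a union might be built.

```csharp
using System;
using System.Runtime.InteropServices;

enum MiniVariantType { Int32, Int64 }

// Hypothetical two-case union: an int and a long share the same storage.
[StructLayout(LayoutKind.Explicit)]
readonly struct MiniVariant
{
    [FieldOffset(0)] private readonly long _int64;
    [FieldOffset(0)] private readonly int _int32;
    [FieldOffset(8)] private readonly MiniVariantType _type;

    public MiniVariant(int value)
    {
        _int64 = 0;          // zero the full 8 bytes first
        _int32 = value;      // then overwrite only 4 of them
        _type = MiniVariantType.Int32;
    }

    public MiniVariant(long value)
    {
        _int32 = 0;
        _int64 = value;
        _type = MiniVariantType.Int64;
    }

    public MiniVariantType Type => _type;

    // Unsafe style: returns the raw 8 bytes regardless of what was stored,
    // so a stored int is not sign-extended.
    public long ReinterpretAsInt64() => _int64;

    // Safe style: checks the stored type and performs a real conversion.
    public long ToInt64()
    {
        switch (_type)
        {
            case MiniVariantType.Int32: return _int32; // widening, sign-extends
            case MiniVariantType.Int64: return _int64;
            default: throw new InvalidCastException();
        }
    }

    public int ToInt32()
    {
        switch (_type)
        {
            case MiniVariantType.Int32: return _int32;
            case MiniVariantType.Int64: return checked((int)_int64); // throws on overflow
            default: throw new InvalidCastException();
        }
    }
}
```

On a little-endian machine, new MiniVariant(-1).ReinterpretAsInt64() yields 4294967295, while ToInt64() yields -1, which is the first failure mode listed above.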

Member Author

We do need exposure of the raw data in some way when checking VariantType manually (such as the switch statement in TryFormat()).

I started with hard casts as exposing CastTo<T>, being unconstrained, isn't safe. I'd like to have "safe", fast, "power" unwrapping. Perhaps exposing in a VariantMarshal class would be sufficient.

I didn't think through ConvertTo scenarios yet, but I can see value in having support. I'll play around with it.

Any and all suggestions are welcome, of course. :)


Using this kind of thing as a private implementation detail of TryFormat is fine since the end result is completely type-safe. However, as written it's a public API. Before doing something unsafe, I'd suggest measuring reasonable scenarios that do safe conversion and seeing if they're good enough. A switch statement that does appropriate conversion and unboxing is probably reasonably quick.

Do you actually have a scenario for a public API like VariantMarshal? If users want a type-unsafe union, they can already create their own in the same way that you did.

Member

Note: I would love to have this for the ability to trivially convert a ulong to a long even in code that is compiled under 'checked'. The Roslyn IDE has to have all sorts of nasty code to try to do this, for example: https://github.com/dotnet/roslyn/blob/acb806b162a2f3496abb1aad8cc7a1b96731848e/src/Workspaces/Core/Portable/Shared/Utilities/IntegerUtilities.cs#L86-L94

@ahsonkhan
Member

cc @krwq

private char[] _arrayToReturnToPool;
private Span<char> _chars;
private int _pos;
private fixed char _default[DefaultBufferSize];
Contributor

Why is this not analyzed in the constructors? Perhaps it is not needed in public ValueStringBuilder(Span<char> initialBuffer), but in public ValueStringBuilder(int initialCapacity) we could check that initialCapacity > DefaultBufferSize before allocating.

Member

This file is a simple copy from CoreFX/CoreCLR (made public)

Is it worth all the comments on this file?

Member Author

@iSazonov You can't allocate on the stack and return to the caller. Being that this is a reference type, this is a hacky way of "stack allocating". I'm experimenting here and seeing the impact of different approaches.

@khellang I'm not sure what you mean? I do need to update the comment for the small changes I've made.

Contributor

@iSazonov iSazonov Jan 18, 2019

You can't allocate on the stack and return to the caller

Yes. My thought was that we could allocate only in ToString(), or never if we don't call ToString().

Member

I'm not sure what you mean? I do need to update the comment for the small changes I've made.

@JeremyKuhne I'm just asking if there's any point in reviewing this file as part of this PR since I'd assume changes should be made in the original file 😄

private Span<char> _chars;
private int _pos;
private fixed char _default[DefaultBufferSize];
private const int DefaultBufferSize = 16;
Contributor

For stack allocation the value looks very small.
The full date format string "Thursday, January 17, 2019 5:05:29 PM" is 37 chars.
So I'd expect the const to be 64 or 128, which would avoid pool allocations for common formatting scenarios.

Member Author

It is very small, but it is allocated every time (whether or not you pass in your own buffer). I picked that number as that is what StringBuilder creates as a default size. Trying to get apples-to-apples here and experiment with different approaches.

I'm considering how to potentially cache what we get out of the array pool. Keeping a thread local perhaps with the largest buffer we've used (up to some max) and pulling that size by default from the pool when we grow (somewhat along the lines of what the StringBuilderCache does). I'm experimenting right now with that as copying the data on growth is a non-trivial cost.
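
To make the growth cost under discussion concrete, here is a minimal sketch of a span-backed builder that starts on a caller-supplied buffer and moves to ArrayPool<char> when it runs out of room. PooledCharBuilder is a hypothetical, cut-down stand-in written for this note; it is not the ValueStringBuilder in this PR and omits the fixed default buffer and any thread-local caching.

```csharp
using System;
using System.Buffers;

// Hypothetical, simplified builder: initial storage is supplied by the caller
// (typically stackalloc); growth rents from the shared array pool.
ref struct PooledCharBuilder
{
    private char[] _arrayToReturnToPool;
    private Span<char> _chars;
    private int _pos;

    public PooledCharBuilder(Span<char> initialBuffer)
    {
        _arrayToReturnToPool = null;
        _chars = initialBuffer;
        _pos = 0;
    }

    public int Length => _pos;
    public int Capacity => _chars.Length;

    public void Append(ReadOnlySpan<char> value)
    {
        if (value.Length > _chars.Length - _pos)
            Grow(value.Length);
        value.CopyTo(_chars.Slice(_pos));
        _pos += value.Length;
    }

    private void Grow(int additionalCapacity)
    {
        int newSize = Math.Max(_pos + additionalCapacity, _chars.Length * 2);
        char[] rented = ArrayPool<char>.Shared.Rent(newSize);

        // This copy is the non-trivial cost paid on every growth step.
        _chars.Slice(0, _pos).CopyTo(rented);

        char[] toReturn = _arrayToReturnToPool;
        _chars = _arrayToReturnToPool = rented;
        if (toReturn != null)
            ArrayPool<char>.Shared.Return(toReturn);
    }

    public override string ToString() => _chars.Slice(0, _pos).ToString();

    public void Dispose()
    {
        // Capture the array first: resetting the struct clears the field.
        char[] toReturn = _arrayToReturnToPool;
        this = default;
        if (toReturn != null)
            ArrayPool<char>.Shared.Return(toReturn);
    }
}
```

With a 16-char initial buffer, appending the 37-char date string from the earlier comment forces one rent and one copy; with a 64-char buffer it would stay on the stack.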

Contributor

I'm experimenting right now with that as copying the data on growth is a non-trivial cost.

What formatting do we want to speed up? I'd collect such common formatting patterns (like datetime and other frequently used types) and measure allocations. I guess we'd end up with a const in the 64-128 range, then fall back to the array pool.


public int Length
{
get => _pos;
Contributor

The name _pos confuses me (current/last position?); _length would be clearer.

Member Author

I'm not changing any names here where I didn't change code from what is currently in .NET to make it easier to see changes. We can definitely iterate on existing terminology when/if we make this public.

public void EnsureCapacity(int capacity)
{
if (capacity > _chars.Length)
Grow(capacity - _chars.Length);
Contributor

Should we use a local variable for _chars.Length, or does the compiler do that optimization?

Member Author

I would guess that the JIT would handle this, but I'll look.

public override string ToString()
{
var s = _chars.Slice(0, _pos).ToString();
Dispose();
Contributor

Why do we need the Dispose here?

Member Author

It was an internal convenience/safety thing as I recall. I'm not sure we'd do that if we made this public, particularly now that we're going to use using with this.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Dispose()
{
char[] toReturn = _arrayToReturnToPool;
Contributor

Is it really needed for ref struct?

Member Author

Is what needed? Dispose? Yes, as we want to return arrays to the pool to minimize allocations.

Contributor

Sorry, I meant "Is the temporary variable needed for ref struct?"

while (true)
{
// Scan for an argument hole (braces)
while (position < formatLength)
Member

A core concern I have here is that it seems like (and correct me if I'm wrong) everyone needs to know how to process a format string. That seems highly undesirable and error prone. Is there a canonical, lightweight (ideally, stack allocated) parser that anyone who needs this can use?

Alternatively, instead of having to parse this out at runtime, would it be perhaps better to have the C# compiler automatically provide a prebaked struct that defines all this? For example, in JS/TS when you use interpolated strings, you get 3 arrays back from the compiler:

  1. the array of literals, with escapes interpreted. i.e. if the original code was: foo `bar \t ${ baz } quux`, then this would have ["bar \t ", " quux"].
  2. the array of literals, with escapes untouched. Using the above example, you would get back: ["bar \\t ", " quux"].
  3. the values of the interpolations. i.e. whatever value 'baz' has.

It's then easy for consumers to understand what they have and stitch things together. There's no need for them to figure out things like "oh, I have a {, is it also followed by a {?". The compiler has already figured that out for them.
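
A sketch of how a consumer might stitch such pre-split pieces together in C#, assuming a hypothetical compiler hand-off of a literal array and a value array. InterpolationParts and Stitch are made-up names; no such compiler feature is described in this thread. For brevity this version takes object[] and therefore boxes value types, so it shows the shape of the data and not a nonallocating implementation.

```csharp
using System;
using System.Text;

static class InterpolationParts
{
    // Expects exactly one more literal than values, interleaved as:
    // literals[0] values[0] literals[1] values[1] ... literals[n]
    public static string Stitch(string[] literals, object[] values)
    {
        if (literals.Length != values.Length + 1)
            throw new ArgumentException("Expected one more literal than values.");

        var sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            // No brace scanning needed: the split was done ahead of time.
            sb.Append(literals[i]);
            sb.Append(values[i]);
        }
        sb.Append(literals[literals.Length - 1]);
        return sb.ToString();
    }
}
```

For the example above, Stitch(new[] { "bar \t ", " quux" }, new object[] { baz }) inserts the value of baz between the two literals.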

Member

Note: by stack allocated parser, I mean one similar to how you might see an event-driven JSON/XML parser. So you would do this:

var parser = new Parser(formatString);
while (parser.MoveToNext())
{
    switch (parser.Current)
    {
        case Text:
            // copy the literal text through
            break;
        case FormattingParameter:
            // format the corresponding argument
            break;
    }
}

I'd prefer this all just be in a stack allocated ref-structure that the compiler could just pass into the method. But, absent that, having a parser that anyone can use seems preferable.
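
One way such a pull-style parser could look, as a sketch only. FormatParser, FormatTokenKind, and CurrentSpan are hypothetical names that follow the shape of the snippet above. It handles "{{" and "}}" escapes by surfacing a single brace as text and treats everything between an unescaped "{" and the next "}" as the hole contents; it does not validate the index, alignment, or format inside the hole.

```csharp
using System;

enum FormatTokenKind { Text, FormattingParameter }

// Hypothetical allocation-free tokenizer over a composite format string.
ref struct FormatParser
{
    private readonly ReadOnlySpan<char> _format;
    private ReadOnlySpan<char> _currentSpan;
    private FormatTokenKind _current;
    private int _position;

    public FormatParser(ReadOnlySpan<char> format)
    {
        _format = format;
        _currentSpan = default;
        _current = FormatTokenKind.Text;
        _position = 0;
    }

    public FormatTokenKind Current => _current;

    // Literal text, or the hole contents without the surrounding braces.
    public ReadOnlySpan<char> CurrentSpan => _currentSpan;

    public bool MoveToNext()
    {
        if (_position >= _format.Length)
            return false;

        ReadOnlySpan<char> rest = _format.Slice(_position);
        char first = rest[0];

        if (first == '{' || first == '}')
        {
            if (rest.Length > 1 && rest[1] == first)
            {
                // Escaped brace: report one brace as literal text.
                _current = FormatTokenKind.Text;
                _currentSpan = rest.Slice(0, 1);
                _position += 2;
                return true;
            }
            if (first == '}')
                throw new FormatException("Unescaped '}' in format string.");

            int close = rest.IndexOf('}');
            if (close < 0)
                throw new FormatException("Unclosed '{' in format string.");

            _current = FormatTokenKind.FormattingParameter;
            _currentSpan = rest.Slice(1, close - 1);
            _position += close + 1;
            return true;
        }

        // Plain text run up to the next brace (or the end of the string).
        int next = rest.IndexOfAny('{', '}');
        if (next < 0) next = rest.Length;
        _current = FormatTokenKind.Text;
        _currentSpan = rest.Slice(0, next);
        _position += next;
        return true;
    }
}
```

Each token is a slice of the original string, so walking a format string this way needs no heap allocation.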

Member Author

Note that this has to be a drop-in replacement for StringBuilder.AppendFormat so we're constrained somewhat.

I'm not sure I follow on why everyone needs to know how to parse the format string. There currently is only one parser in .NET string formatting, this is simply a low-allocation version of that. It isn't intended to be customized / reimplemented.

Having the compiler provide a pre-parsed format string would require having a constant format string. That isn't always the case. While we could support some sort of pre-parsed data structure I'm skeptical that we'd be able to do so in a way that is more performant than walking through the string. Additionally it is currently impossible to format arguments to prep this without allocating a lot of string objects.

While making this implementation pluggable would be nice, it would likely be difficult to get close to the performance of a monolithic method. That said I'd love to see deeper dives into this sort of thing if anyone is willing to fully explore it.

/// <remarks>
/// This is a pattern we can use to create stack based spans of <see cref="Variant"/>.
/// </remarks>
public readonly struct Variant2
Member

Is this substantively different from ValueTuple<Variant, Variant>?


ValueTuple isn't readonly?

@masonwheeler masonwheeler left a comment

The idea for supporting arbitrary enums looks cool, but as it is it creates ambiguity. If you create a Variant with an enum value and put the value in _union as an int, and the type in _object as a System.Type, you'll end up with a Variant whose Type value is VariantType.Object. And so a consumer looking at this will see it as an Object variant whose value is an object of type System.Type.

To properly support a trick like this, at the very least you need a VariantType.Enum value. A few enum-specific methods on the Variant type would be helpful too.

@JeremyKuhne
Member Author

To properly support a trick like this, at the very least you need a VariantType.Enum value. A few enum-specific methods on the Variant type would be helpful too.

That was generally what I was thinking. I'm not sure how that or using "T where unmanaged" might ultimately work out, but certainly worth digging into further.

@masonwheeler

masonwheeler commented Mar 5, 2019

Also, if I might add one thing, this would probably work better with the Variant stuff being its own PR rather than a piece of a larger string formatting PR.

This really looks like it was inspired by the COM Variant type, and to those of us with experience with Variants, it has a lot of valid use cases that have nothing to do with string formatting. It would make for cleaner discussion if that had its own discussion thread without being tied to the perspective of one very specific use case.

@JeremyKuhne
Member Author

Also, if I might add one thing, this would probably work better with the Variant stuff being its own PR rather than a piece of a larger string formatting PR.

The formatting change fundamentally depends on it so I can't really pull it out of the PR. I'll be creating a tracking issue for Variant in CoreFX shortly to go along with the existing ValueStringBuilder proposal. https://github.com/dotnet/corefx/issues/28379

This really looks like it was inspired by the COM Variant type

While I wasn't specifically trying to create a COM Variant replacement, it does generally align with it. I couldn't think of a better name for this, although I was concerned about the "baggage" it carries for some. 😀


namespace System.Text
{
public ref partial struct ValueStringBuilder


I'm wondering if this strategy, which allows for rent/return, could be blended in with the params proposal at dotnet/csharplang#2302 as well. Right now, ignoring cheap stack allocation for arrays (which put limitations on the size), a possible translation of:

Use(1);
Use(1, 2);
Use(1, 2, 3);

with a

void Use(params Span<int> xs) { ... }

could allocate an int[3] array, with Span<int> views over it for subsequent invocations of Use. However, it does seem to preclude library assistance to mediate the int[] allocation, so the optimization works well within a single method but not across. With the work in System.Memory on pools, it'd seem possible to rent/return arrays.

One way would be to simply substitute newarr for a Rent operation, provided the compiler has proven that the arrays can't escape (e.g. only spans are handed out, and no refs to elements therein are returned). That'd effectively result in a single Rent(n) operation, where n is the maximum of all lengths required in the body of a method. The corresponding Return would have to end up in a try...finally... construct. A drawback is the tight coupling of the language to another API. Another possible concern is a long-running method where n is big because of one single large params site, thus holding on to the large array for longer than necessary. FWIW, this is no different than lifetime issues with big closures (due to the union of all hoisted variables ending up in the same closure object, thus any delegate extends the lifetime of the entire closure), and n is likely never very big because it's proportional to the argument count in user-written code.
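
A hand-written sketch of what that lowering might look like for the three Use calls above. ParamsLowering and CallSites are hypothetical names, and Use here returns the sum of its arguments (the original returns void) purely so the result can be checked. A single array of the maximum length is rented once, sliced per call site, and returned in a finally block.

```csharp
using System;
using System.Buffers;

static class ParamsLowering
{
    // Stand-in for: void Use(params Span<int> xs) { ... }
    public static int Use(Span<int> xs)
    {
        int sum = 0;
        foreach (int x in xs) sum += x;
        return sum;
    }

    // Hand-lowered equivalent of: Use(1); Use(1, 2); Use(1, 2, 3);
    public static int CallSites()
    {
        // n = 3 is the maximum argument count across all params sites.
        int[] rented = ArrayPool<int>.Shared.Rent(3);
        try
        {
            int total = 0;

            rented[0] = 1;
            total += Use(rented.AsSpan(0, 1));

            rented[0] = 1; rented[1] = 2;
            total += Use(rented.AsSpan(0, 2));

            rented[0] = 1; rented[1] = 2; rented[2] = 3;
            total += Use(rented.AsSpan(0, 3));

            return total;
        }
        finally
        {
            // Only spans were handed out, so the array cannot have escaped.
            ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

Rent may hand back an array longer than requested, which is why every call receives an explicit slice.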

An alternative would be for each params site to result in a builder pattern, derived from the type of the parameter to which params is applied. Given such a type P, there's a builder type B which has the following operations:

  • static I Create(int capacity) to create the intermediate object I (call that i),
  • i.Add(T) or i.Set(int, T) methods to add or set elements. These could return void or I (to create a fluent pattern) and are bound with maximum flexibility (e.g. allowing extension methods).
    • Note I'm omitting T this[int] { set; } as an option for a member on i because it'd only work if all elements are of the same type (which brings me to interpolated string further on).
  • i.Result is a property or field of type P, returning the result, and,
  • i has an optional Dispose, e.g. for value types akin to ValueStringBuilder which can perform a Return.

This is somewhat similar to collection initializers simply looking for an Add method that can be called as an instance method. Also, looking up a builder type for a given type could be done in ways similar to task-like types (but supporting any generic arity by pouring generic arguments over between P and B).

For example:

void Use(double d, params P xs) { ... }

Use(Math.Sin(x), 1, "foo", true);

would be syntactic sugar for:

var __e = Math.Sin(x);

using (var __t = B.Create(3))
{
  __t.Set(0, 1);
  __t.Set(1, "foo");
  __t.Set(2, true);

  Use(__e, __t.Result);
}

with (pattern-based) using being optional, depending on whether __t's type supports Dispose. This is given a definition of P like this:

[CollectionBuilder(typeof(B))]
class P {}

static class B
{
  public static I Create(int capacity);

  public struct I
  {
    // Or a generic Set<T> would work, too.
    public void Set(int i, int x);
    public void Set(int i, string s);
    public void Set(int i, bool b);

    public P Result { get; }
  }
}

Using this pattern would allow for flexibility and extensibility, e.g. to construct an ImmutableArray<T> or ReadOnlyCollection<T> by providing builders for these types. These builders receive the capacity (which could be implemented as a minimum threshold, e.g. for Span<T> types where P constructs a view over a possible larger backing object), see sequential Add or Set methods, and can implement Result in different ways, e.g. by sealing a mutable instance before handing it back, or by creating a view over some object. If a Dispose is present, it's guaranteed to be called after the method that used the params-constructed instance of P returns. That allows for a Return operation.

As a boundary condition for the existing params T[] support, a builder could be constructed that simply performs new T[capacity] and has a Set method that stores into the array. Dispose is absent, and Result returns the array. The functionality to construct the array is simply moved to a library rather than baking in newarr and stelem instructions. While T[] support could just continue to use the existing inline code for efficiency reasons, params P for any type P could go through the pattern, allowing for tight control over allocation patterns inside the library.

Also, the types that can be used in a params argument list are simply derived from the types supported on Add or Set. For example, if a builder for Dictionary<K, V> were to have Add or Set overloads with KeyValuePair<K, V> and ValueTuple<K, V>, params constructing such a dictionary would support a comma-separated argument list of these types. E.g. MakeRequest(uri, HttpVerbs.Post, ("header1", "value1"), ("header2", "value2")). Whether or not that's desirable is a matter of library design, and it's not much different from supporting Add in collection initializers (which also picks up on extension methods). The difference here though is that the builder pattern lends itself well to construction of (immutable) objects that the caller doesn't have a reference to ("anonymous" if you like), while collection initializers don't have such a pattern that has an initial capacity or a phase where the result can be sealed.

Back to the initial example with Use, the builder type for a Span<T> could be a SpanBuilder<T> whose Create method uses the specified capacity to rent an array. The Add or Set methods fill in the array, and Result returns a Span<T> over the array. The call to Dispose returns the array to the pool. As such, the lifetime of the array is limited to the duration of the call. On the flip side, the generated code has a sequence of calls, so there's a question about perf (inlining?).
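
A sketch of such a builder, with the caveat that SpanBuilder<T> is only named in the paragraph above and everything about its shape here is an assumption. Create rents from ArrayPool<T>, Set fills slots, Result exposes a Span<T> view of exactly the requested capacity, and Dispose returns the array. SpanBuilderDemo shows the call sequence a compiler might emit, written with try/finally in place of the pattern-based using.

```csharp
using System;
using System.Buffers;

// Hypothetical builder backing a `params Span<T>` argument list.
ref struct SpanBuilder<T>
{
    private T[] _rented;
    private readonly int _count;

    private SpanBuilder(T[] rented, int count)
    {
        _rented = rented;
        _count = count;
    }

    public static SpanBuilder<T> Create(int capacity)
        => new SpanBuilder<T>(ArrayPool<T>.Shared.Rent(capacity), capacity);

    public void Set(int index, T value)
    {
        if ((uint)index >= (uint)_count)
            throw new ArgumentOutOfRangeException(nameof(index));
        _rented[index] = value;
    }

    // A view over the rented array, trimmed to the requested capacity.
    public Span<T> Result => _rented.AsSpan(0, _count);

    public void Dispose()
    {
        T[] toReturn = _rented;
        _rented = null;
        if (toReturn != null)
            ArrayPool<T>.Shared.Return(toReturn, clearArray: true);
    }
}

static class SpanBuilderDemo
{
    // Returns a sum so the sketch is checkable; the original Use returns void.
    public static int Use(Span<int> xs)
    {
        int sum = 0;
        foreach (int x in xs) sum += x;
        return sum;
    }

    // Roughly what Use(1, 2, 3) could expand to.
    public static int CallUse123()
    {
        var t = SpanBuilder<int>.Create(3);
        try
        {
            t.Set(0, 1);
            t.Set(1, 2);
            t.Set(2, 3);
            return Use(t.Result);
        }
        finally
        {
            t.Dispose(); // array lifetime ends with the call
        }
    }
}
```

The per-element Set calls are where the inlining question raised above would need to be measured.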

Note that arguments that are poured into a params parameter do not have to be uniformly typed, as pointed out above with the dictionary example. This brings me to the analogy for interpolated strings. The ValueStringBuilder type shown here is very similar to my builder type I (note that B is merely an entry point, and could totally be the same type with a static Create method on it), with Add being the equivalent of the various overloads of Append. If an interpolated string were to be treated as the moral equivalent of a heterogeneous argument list of strings and interpolations (triples of an expression of type T, an optional alignment of type int, and an optional format string, represented as a struct a la FormattedValue<T>), the mechanism could be the same as params. The builder would simply see a chain of calls to Add(string) and Add<T>(FormattedValue<T>) and can construct into a rented char[] array, ultimately returning a string (or another Span<char>-y thing), and finally releasing the char[] array upon the call to Dispose. E.g.

$"A {b} c {d:e} f {g,h} i {j,k:l}"

would become

using (var __t = ValueStringBuilder.Create(8)) // NB: capacity isn't that useful here
{
  __t.Set(0, "A ");
  __t.Set(1, FormattedValue.Create(b));
  __t.Set(2, " c ");
  __t.Set(3, FormattedValue.Create(d, format: e));
  __t.Set(4, " f ");
  __t.Set(5, FormattedValue.Create(g, h));
  __t.Set(6, " i ");
  __t.Set(7, FormattedValue.Create(j, k, l));
}

Alternatively, the pattern could be Create with the format string (containing the holes) as the "capacity" and Add calls for the interpolated expressions (which then don't have to use some FormattedValue<T> that carries all information, given it's in the format string already). I guess the difference is between parsing an initially supplied format string and collecting all the substitutions, versus seeing a "stream" of strings and values coming in an append-only way as a sequence of calls. In the case of call sequences, we don't need to represent an array of variants.
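
A sketch of the append-only "stream of calls" variant. FormattedValue<T> is named in the comment above, but its members, the FormattedValue.Create factory signature, and InterpolationBuilder are assumptions made for this illustration. To stay short it builds on StringBuilder and IFormattable, so it boxes and allocates; it is meant to show the call sequence and the alignment and format handling, not the nonallocating storage.

```csharp
using System;
using System.Text;

// Hypothetical triple of value, optional alignment, optional format string.
readonly struct FormattedValue<T>
{
    public FormattedValue(T value, int alignment, string format)
    {
        Value = value;
        Alignment = alignment;
        Format = format;
    }

    public T Value { get; }
    public int Alignment { get; }
    public string Format { get; }
}

static class FormattedValue
{
    public static FormattedValue<T> Create<T>(T value, int alignment = 0, string format = null)
        => new FormattedValue<T>(value, alignment, format);
}

// Hypothetical builder that sees literals and values in source order.
sealed class InterpolationBuilder
{
    private readonly StringBuilder _sb = new StringBuilder();

    public InterpolationBuilder Add(string literal)
    {
        _sb.Append(literal);
        return this;
    }

    public InterpolationBuilder Add<T>(FormattedValue<T> item)
    {
        object boxed = item.Value; // boxes value types; acceptable for a sketch
        string text;
        if (boxed is IFormattable formattable)
            text = formattable.ToString(item.Format, null);
        else
            text = boxed == null ? string.Empty : boxed.ToString();

        // Same convention as composite formatting: positive alignment
        // right-aligns, negative alignment left-aligns.
        if (item.Alignment > 0) text = text.PadLeft(item.Alignment);
        else if (item.Alignment < 0) text = text.PadRight(-item.Alignment);

        _sb.Append(text);
        return this;
    }

    public override string ToString() => _sb.ToString();
}
```

Because every piece arrives as its own typed call, no array of variants and no runtime parsing of a format string is needed.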

Either way, I felt there's some analogy between params and interpolated strings when looking at it from a builder pattern point of view, which allows for an open-ended set of types that can support in-place params construction. Also see https://github.com/bartdesmet/csharplang/blob/ExpressionTypes/proposals/params-builders.md for an earlier write-up on this train of thought.

@pgovind pgovind added the OpenBeforeArchiving label ("These issues were open before the repo was archived. For re-open them, file them in the new repo") Mar 11, 2021
@pgovind pgovind closed this Mar 11, 2021