Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal]: ReadOnlySpan initialization from static data #5295

Closed
1 of 4 tasks
stephentoub opened this issue Oct 17, 2021 · 66 comments
Closed
1 of 4 tasks

[Proposal]: ReadOnlySpan initialization from static data #5295

stephentoub opened this issue Oct 17, 2021 · 66 comments
Assignees
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Oct 17, 2021

ReadOnlySpan initialization from static data

  • Proposed
  • Prototype: Not Started
  • Implementation: Not Started
  • Specification: Not Started

Summary

Provide a syntax for initializing a ReadOnlySpan<T> from constant data and with guaranteed zero allocation.

Motivation

dotnet/roslyn#24621 added compiler support that translates:

ReadOnlySpan<byte> data = new byte[] { const, values };

into non-allocating code that blits the binary data into the assembly data section and creates a span that points directly to that data. The same optimization is done for:

static readonly ReadOnlySpan<byte> Data => new byte[] { const, values };

We now rely on this all over the place in dotnet/runtime and elsewhere, as it provides a very efficient means for accessing a collection of constant values with minimal overhead and in a way the JIT is able to optimize consumption of very well.

However, there are multiple problems with this:

  1. The optimization only applies to byte-sized primitive T values, namely byte, sbyte, and bool. Specify any other type, and you fall off a massive cliff, as code you were hoping to be allocation-free now allocates a new array on every access.
  2. It's easy to accidentally fall off a similar cliff if at least one of the values turns out to be non-const (or becomes non-const), e.g. if a const value referred to in the initialization is changed elsewhere from a const to a static readonly.
  3. The syntax is confusing, as it looks like it's allocating, and PRs that optimize code from:
private static readonly byte[] s_bytes = new byte[] { ... };

to

private static ReadOnlySpan<byte> Bytes => new byte[] { ... };

are often met with confusion and misinformation.

Detailed design

Add dedicated syntax for creating spans without allocating that:

  1. Doesn't visually look like it's allocating (i.e. avoid use of 'new').
  2. Provides validation with errors if the data isn't provably constant.

As the following syntax fails to compile today:

ReadOnlySpan<byte> data = { 1, 2, 3 };

and

static ReadOnlySpan<byte> Data => { 1, 2, 3 };

they could be co-opted for this purpose.

Opening up this syntax via the removal of new T[] doesn't prevent the optimization from being applied by the compiler when new T[] is used, but it would guarantee a non-allocating implementation when the new T[] isn't used.

This could also tie in with params Span<T>: the syntax for the local variant could blit the data into the assembly if possible, or else fall back to the same implementation it would use for a params Span<T> method argument (assuming that params syntax itself doesn't itself fall back to heap allocation).

Implementation-wise, the compiler would use RVA statics whenever possible and fall back to static readonly arrays otherwise.

Drawbacks

TBD

Alternatives

TBD

Unresolved questions

Related to this, we have prototyped in both the runtime and the C# compiler support for extending this optimization to more than just byte-sized primitives. The difficulty with other primitives is endianness, and it can be addressed in a manner similar to array initialization: the runtime exposes a helper that either returns the original pointer, or sets up a cache of a copy of the data reversed based on the current endianness. There are also multiple fallback code generation options available for if that API isn't available. Such improvements to the compiler are related but separate from the improvements to the language syntax for this existing optimization.

Design meetings

cc: @jaredpar

@bernd5
Copy link
Contributor

bernd5 commented Oct 17, 2021

As specified in https://github.com/dotnet/csharplang/blob/3c8559f186d4c5df5d1299b0eaa4e139ae130ab6/spec/arrays.md#array-creation the syntax localOrField = { 1, 2, 3 }; is just a "shortcut" for localOrField = new int[]{ 1, 2, 3 };

So I think the optimization should be done regardless how the array initialization is written syntactically. Currently we have these forms AFAIK:

  • new Type[]{ value1, value2}
  • new[]{ value1, value2}
  • { value1, value2 }

To force this const-array-behaviour we could maybe introduce a new intrinsic Function like System.Array.CreateConstArray(1, 2, 3).

BTW: it would be nice if we could write:

static ReadOnlySpan<byte> Data => { 1, 2, 3 };

or

static ReadOnlySpan<byte> Data { return { 1, 2, 3 }; }

@stephentoub
Copy link
Member Author

So I think the optimization should be done regardless how the array initialization is written syntactically.

As I noted, the compiler would be free to do so... it already does so. But one of the key aspects here is the compiler preventing you from shooting yourself in the foot, guaranteeing that the syntax is non-allocating (at least beyond any initialization required, e.g. in the case of a big endian machine reading the assembly little endian data), and it would be a breaking change if the existing usage that allocates stopped compiling.

@jaredpar
Copy link
Member

Provides validation with warnings or errors if the expression would allocate

This feels like the best approach to me. Essentially we could standardize on = { ... } as being the non-allocating form. Then we could warn in any situation where it would cause an allocation:

  • The contents of the { ... } were non-const
  • The target of the expression was Span<byte>
  • The target of the expression was ReadOnlySpan<T> where T is not byte, bool or sbyte. This of course would change if we implemented the optimization for non-byte sized types.

The one part that still bothers me is arrays. That syntax works for arrays today but has none of the safe guards. I've thought a bit about adding a disabled warning here that could be enabled by developers who hit this a lot but I'm having trouble convincing myself that meets the bar. Given that this only works on fields and locals today, maybe just knowing it doesn't work on arrays is enough for developers.

BTW: it would be nice if we could write:
static ReadOnlySpan Data => { 1, 2, 3 };

That is asking for { ... } to be a more general expression where today it's limited to field and local declarations. It's not un-doable but a bit of a bigger ask.

If we took this then I'd feel a bit more strongly about finding a way to extend this warning to cases where it crossed into arrays. Take for example the code M({1, 2, 3}) which would be legal if we generalized this to expressions. Whether or not it allocates depends on overload resolution.

@Korporal
Copy link

Korporal commented Oct 18, 2021

I was just doing an experiment and stumbled upon what seems to be a bug:

image

Surely the 'buffer' is - implicitly - fixed in the sense that the GC is never going to relocate it if its inside a ref struct?

@stephentoub
Copy link
Member Author

#1792

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Oct 18, 2021

@Korporal see #1792

--

Beaten by @stephentoub :)

@JeroenBos
Copy link

Not exactly the same but related discussion: #955, more along the lines of

const ReadOnlySpan<byte> data = new byte[] { const, values };

@tannergooding
Copy link
Member

I particularly like the idea of putting const somewhere in the expression to indicate that the result must be considered "constant" (and therefore non-allocating, etc).

I feel like this could be more generally extended in the future to handle things like constexpr or unmanaged constants (the ability to define constants for blittable data types whose layout will never change, like Guid or Vector2/3/4, etc).

  • Unmanaged Constants #688 proposes unmanaged constant support utilizing the same functionality as constant arrays of primitives and likewise a helper method for fixing up endianness. The actual data declarations fully support such declarations being structured, so the only real language requirement is some constraint/expectation/contract that the layout of a type won't change or that such layout changes (e.g. packing differences) can be handled by the runtime

@jaredpar
Copy link
Member

I'd thought a bit about const here but discarded it for the following reasons.

Firstly if we allowed const here then we need to consider all the places in which const expressions are legal: attribute arguments, optional parameters, folding, etc ... For ReadOnlySpan<byte> I haven't dug deeply into all of these but at a high level I think there are answers for these questions. Several of them involve a significant amount of work compared to the initial suggestion of re-using an existing syntax for initialization.

One of the other suggestions we are looking at though is expanding this optimization to types which are not byte sized. For example ReadOnlySpan<char>, ReadOnlySpan<int>, etc ... That brings endianess concerns into the conversation and at the moment I don't think there are great answers for how we work those into our existing const semantics. That becomes a significant work item for us to dig into.

Secondly const only solves the issue for fields and locals. That doesn't cover all the cases where is desire to have the non-allocating syntax. Reading into the issue you can see the desire exists to essentially see this at the expression level hence const doesn't solve it.

That's not to say I don't want const ReadOnly<byte> to work in the future (i'd actually like it). I just don't think it's necessarily the best solution for the particular problem being stated here.

@svick
Copy link
Contributor

svick commented Oct 19, 2021

@jaredpar What if the new const expressions were completely separate from the existing constant expressions and const fields/variables? Syntactically, it could look something like:

ReadOnlySpan<byte> data = const byte[] { 1, 2, 3 };

(This may be what @tannergooding meant by "putting const somewhere in the expression".)

I think this is nice, because:

  • it more clearly expresses that it's the expression that's changing, not the field or variable,
  • it doesn't have to work for attributes etc.,
  • it works everywhere you can have expressions,
  • it's consistent with new and stackalloc: the keyword indicates where you're allocating from.

It's also problematic, because it would be very confusing if const meant two related, but very different things. Though that could be solved by using a different keyword (constalloc? constexpr? static?).

@jaredpar
Copy link
Member

@svick

What if the new const expressions were completely separate from the existing constant expressions and const fields/variables?

There isn't inherently anything wrong with this. But it's introducing a new expression form which is going to have a higher bar than re-using existing expression form in more places.

it doesn't have to work for attributes etc.,

It would be really weird if it didn't though. For example it would be weird if I could say const i = const 42 but not [My(const 1)]. I can't introducing this without introducing the complete picture.

It's also problematic, because it would be very confusing if const meant two related, but very different things.

This doesn't bother me. It's close enough to use for both concepts. Or said differently it's not different enough that I would warrant a new keyword for it.

@naine
Copy link

naine commented Oct 19, 2021

It would be really weird if it didn't though. For example it would be weird if I could say const i = const 42 but not [My(const 1)]. I can't introducing this without introducing the complete picture.

It doesn't need to be supported for all constant expressions, only array initializer expressions:

ReadOnlySpan<byte> span = new byte[] { 1, 2, 3 };        // creates span from System.Array (unless optimizable)
ReadOnlySpan<byte> span = stackalloc byte[] { 1, 2, 3 }; // creates span from stack array
ReadOnlySpan<byte> span = const byte[] { 1, 2, 3 };      // creates span from assembly data section

@jaredpar
Copy link
Member

It doesn't need to be supported for all constant expressions, only array initializer expressions:

That seems very counterintuitive. If const is meant to explicitly identify expressions that are const why should it be limited to a subset of const expressions?

Yes I agree that const 42 (or any other literal) is a bit silly but what about const A.B? Lacking const that could be a side effecting property but with const prefix I've guaranteed I'm accessing a constant here. Or even const A + B as another example.

I can't see introducing const as a way to identify const array expressions but exclude it from all the other expression that are constant.

@CyrusNajmabadi
Copy link
Member

Championing this. I too fell into:

The syntax is confusing, as it looks like it's allocating, and PRs that optimize code from ... are often met with confusion and misinformation.

It would be nice to have a form that was less magical about how it was working with the compiler to get this result.

@CyrusNajmabadi
Copy link
Member

As the following syntax fails to compile today:

Yup. And, afaict, within roslyn, there woudl be no complexity to parse this at all. This would already fix directly into our syntax model. So this would just be something on the semantic side to enable. I don't know the compiler impl here well, but i have a feeling it could be done with minimal effort. Specifically:

byte[] b = { ... } is already supported, and so the compiler already effectively has to bind and translate that form to byte[] b = new byte[the_length] { ... } internally. So this would just be:

  1. do teh same for if you have { ... } on the RHS of a ReadOnlySpan<X> X = { ... } or as the expr in a ReadOnlySpan<X> X => expr property body. The rest of binding/emit should then take over and give teh same optimization as today.
  2. Relax the rule on ref-structs as fields to allow them as static-readonly fields. Though need runtime/gc people to state if that will be a problem or not. Conceptually for C# i don't see any problems there.
  3. Ensure that binding can understand that if you have ReadOnlySpan<X> x = { ... } what the right type of the array initializer is. For example, if you have ReadOnlySpan<byte> X = { 1, 2, 3 } we want this to be viewed as ReadOnlySpan<byte> X = new byte[] { 1, 2, 3 } not ReadOnlySpan<byte> X = new int[] { 1, 2, 3 }. Not sure if that's trivial, or if that may require some hackery to understand these special types. Given the optimization around arrays/ROS already in the compiler today, my guess is that there are already these special type hacks, so this would be nbd to encode into the language and support here.

@CyrusNajmabadi
Copy link
Member

@jaredpar I'm interested in taking a stab at this once we have space in the schedule. Let's chat about when/if you think this is possible to slot in. Definitely low pri, but also seems nice to have, hopefully cheap, and well received by the community here.

@jkotas
Copy link
Member

jkotas commented May 11, 2022

Relax the rule on ref-structs as fields to allow them as static-readonly fields. Though need runtime/gc people to state if that will be a problem or not.

Yes, that's a problem for the runtime. One of the primary reason for ref-structs to exist is that they cannot appear on the GC heap.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented May 11, 2022

@jkotas Can we relax this in some fashion? Conceptually in this case there should be no problem that i can see (def correct me if i'm wrong). However, i'm not sure if this would be just something about ROS, or something that could extend to ref-structs in general. And, to be clear, i solely mean the use of them in static-readonly locations, nowhere else. Thanks!

--

Note: i suppose a possible (though unpalatable to me) possibility would be to allow someone to write:

static readonly ReadOnlySpan<byte> X = { 1, 2, 3 };

and have that actually translate to a property under the covers. conceptually that would then work as far as the runtime and everything else was concerned. But it would def be a bit wonky as that syntax really implies this is a field through and through.

@jkotas
Copy link
Member

jkotas commented May 11, 2022

Can we relax this in some fashion?

static fields are stored on the GC heap just like anything else.

This is hard trade-off for the runtime to make. If we relax this, we will end up with increased GC latencies that translate to worse P99s that is something that top tier services care about a lot.

@CyrusNajmabadi
Copy link
Member

Can you clarify where hte increased GC latency comes from? Conceptually, a user can (and already did) write:

private static readonly T[] static_array = { ... }

Wrapping that with ROS to make the array non-mutable doesn't seem like it should have a GC impact.

@jkotas
Copy link
Member

jkotas commented May 11, 2022

The byref in Span/ReadOnlySpan are expensive for the GC to walk. The GC has to find the containing object for them that is an expensive operation.

@CyrusNajmabadi
Copy link
Member

The GC has to find the containing object for them

Why does the GC have to find the containing object for them?

@jkotas
Copy link
Member

jkotas commented May 11, 2022

So that it can mark the object as reachable.

@CyrusNajmabadi
Copy link
Member

Why does it need to mark the containing object as reachable? (aside, this back and forth seems to be really inefficient). Would it be possible for you to break down the entirety of what's going on here so each message doesn't need a further 'why' to get a sense of the very next step in the algorithm? Thanks.

@jkotas
Copy link
Member

jkotas commented May 11, 2022

I feel that I am trying to explain the basic internal working of the GC. There are books and talk series dedicated to explaining the full chain of why things are done in certain way and the trade-offs involved in the GC design. I agree that it is not useful to replay those here.

It would be certainly technically possible (though a ton of work) to allow byrefs on heap. The problem are the performance trade-offs that would come from that decision. We are not convinced that these trade-offs would be an overall win.

The short story is that marking byrefs is expensive. Marking byrefs has to find a containing object. It is ~100x more expensive than a simple pointer dereference that is used to mark object references. Limiting byrefs to active stack only makes this cost manageable (the typical code has only so many byrefs on live stacks) and keeps the core marking algorithm simple.

There are many possible trade-offs in this area. For example, the classic Java VMs do not have byrefs like .NET. Instead, they explicitly store and pass around the containing object of the byref. It means that the code has slower throughput, but the GC is simpler and can do less work it does not need to support lookup of containing object from arbitrary pointer.

@CyrusNajmabadi
Copy link
Member

I feel that I am trying to explain the basic internal working of the GC.

This doesn't feel basic to me. It's unclear to me why the containing type is relevant at all for a static ROS field. Perhaps you can go the other direction and show an example of why it would matter?

@jkotas
Copy link
Member

jkotas commented May 11, 2022

Consider static ReadOnlySpan<byte> s_field = new byte[] { 1, 2, 3 };. Something needs to keep the byte[] alive.

@stephentoub
Copy link
Member Author

I mean it's an implementation detail of the compiler today. No C#, CLR, ECMA, etc. specification would be violated if that were changed. No formal process beyond normal PR review was used to vet the IL the compiler emits for this as part of the PR that implemented the optimization. Etc.

@Sergio0694
Copy link

So, we've been chatting about several issues regarding constant data these past few days with @tannergooding, @AaronRobinsonMSFT, @jkoritzinsky and other folks. There were two main aspects that have been brought up:

  1. Relying on the data being in the PE section, and therefore pinned (or, relying on the data being pinned at all).
  2. Returning data in a ReadOnlyMemory<T>, and concerns related to assembly unloading.

Point 2 was brought up in dotnet/runtime#71268 and I'd consider that as being pretty much resolved, as you can safely do this by implementing a custom MemoryManager<T> wrapping some delegate accessing the span (or equivalent logic). That'd implicitly root the containing assembly, wouldn't rely on the span data pinned, and it would still let you avoid using ToArray() on the span, so let's ignore that one here (just mentioned it for additional context, as it was related).

Point 1 is still an open issue, and we've been brainstorming ideas on how to potentially resolve this. For context, the spans being lowered by Roslyn into the PE section and therefore being pinned is something that is used in production. For instance, Tanner has thousands of uses of this (I mean Unsafe.AsPointer on an RVA span) throughout TerraFX, I'm using the same in my ComputeSharp library which is also used in production by several folks, etc. Since this is, as you said, not really detailed anywhere, it's essentially undefined behavior, and I think it would be nice to come up with a proper way to address this.

Note: of course folks are currently testing this (eg. Tanner has plenty of unit tests for all supported TFMs in his CI, I'm doing the same for my projects, etc), but this is still not ideal at all, plus it would not help catch scenarios where a customer might bump their TFM without updating the dependency at the same time, and that could just break completely.

There's two different ways I could see us tackling this:

  • Language: we could make constant data using the new syntax be specced to always be lowered to a span in the PE data section by Roslyn, making this well defined behavior. Caveat: this of course would not work for spans that require endianness fixups, as in that case the runtime might have to allocate an array and copy the adjusted data to it.
  • Runtime: Tanner proposed the idea of using some attribute (eg. [FixedAddressValueType], or a new one) to annotate a property returning a constant span, that would instruct the runtime to ensure the span data is pinned (ie. either doing nothing if the data is already in the PE data section, or otherwise allocating the array in the POH).

To me it seems we could do both: making the new { ... } const syntax be always lowered to the PE data section if T allows it (ie. for byte, sbyte), and then separately from that, propose some new attribute to allow developers to opt-in into getting the data being guaranteed to be pinned by the runtime in all cases, regardless of the type of T.

Wanted to get some additional thoughts before opening a proposal, especially because the one about making { ... } be guaranteed to be in the PE data section for byte and sbyte would be an update to this spec, so I guess it should just belong to this proposal. As the proposal says, there's plenty of code in the runtime too that relies on this be the case, so it would seem fine to me to decide to make this a properly specced behavior and no longer an implementation detail? 🙂

@jkotas
Copy link
Member

jkotas commented Jun 28, 2022

Runtime: Tanner proposed the idea of using some attribute (eg. [FixedAddressValueType], or a new one) to annotate a property returning a constant span, that would instruct the runtime to ensure the span data is pinned (ie. either doing nothing if the data is already in the PE data section, or otherwise allocating the array in the POH)

This sounds like overengineering to me. It should be fine to say in
https://github.com/dotnet/runtime/blob/main/docs/design/specs/Ecma-335-Augments.md that embedded data (II.16.3 Embedding data in a PE file) have fixed address at runtime. BTW: I am not able to find anything the ECMA spec that says locals have fixed address at runtime... .

@MichalPetryka
Copy link

Well locals do not always have a fixed address in runtime, for example in async methods.

@Sergio0694
Copy link

Sergio0694 commented Jun 28, 2022

"It should be fine to say in
https://github.com/dotnet/runtime/blob/main/docs/design/specs/Ecma-335-Augments.md that embedded data (II.16.3 Embedding data in a PE file) have fixed address at runtime"

That seems perfect to me, absolutely. No new attributes needed, and for existing code it'd mostly just be making what is currently just an implementation detail, well defined behavior. All I care about here really is just making this something that is technically correct to rely on in production scenarios, so if it was possible to just add this to the spec, then that'd be just great 😄

Note: should we also get C# to then make constant data be lowered into the PE data section be well defined behavior?
Because otherwise even with PE data being fixed in the spec, the language wouldn't actually have to lower data that way, no?

@jkotas
Copy link
Member

jkotas commented Jun 28, 2022

Well locals do not always have a fixed address in runtime, for example in async methods.

They do in IL that ECMA-335 describes. My comment was referring to IL / ECMA-335 notion of a local that is different from C# notion of a local.

@tannergooding
Copy link
Member

tannergooding commented Jun 28, 2022

Tanner proposed the idea of using some attribute (eg. [FixedAddressValueType], or a new one) to annotate a property returning a constant span, that would instruct the runtime to ensure the span data is pinned (ie. either doing nothing if the data is already in the PE data section, or otherwise allocating the array in the POH).

Noting that this was a suggestion for a possible way the runtime could provide support and came with the annotation that FixedAddressValueType isn't really recommended for use today (things like AllocTypeAssociatedMemory and NativeMemory.Alloc are better) and that it has various code-gen inefficiencies and other issues.

The runtime simply guaranteeing that "data declarations" (which is the ECMA-335 terminology for this type of data) must be pinned even for AOT scenarios/etc would also satisfy most of this.

The runtime guaranteeing that the RuntimeHelpers APIs that do the endianness fixups allocate "pinned" data (even if GC tracked) would cover most of the rest.

The remaining issue would then be what Roslyn wants to guarantee. I, personally, don't see a lot of benefit to caching in a static readonly T[] as that's strictly "more expensive" and going to result in the same or worse "bloat" to the image size. Users who want that behavior can just create their own static readonly T[] Name = ....

But, if that's something Roslyn wants to allow it would be up to discussing with them and the runtime team to provide a good alternative. One such way would be an attribute that they use to understand how it should be emitted.

@stephentoub
Copy link
Member Author

I, personally, don't see a lot of benefit to caching in a static readonly T[] as that's strictly "more expensive" and going to result in the same or worse "bloat" to the image size.

I'm not sure what you mean as it being "strictly more expensive". If a developer writes:

public ReadOnlySpan<int> Data => { 1, 2, 3, 4, 5 };

having the compiler lower that to caching it in a field is in no way strictly more expensive than allocating a new array on each invocation. It is more expensive than using CreateSpan, but CreateSpan may not always exist. A developer should be able to write the above code and get reasonable performance regardless of what platform is being targeted, e.g. we want to be able to write the above in a file that's built in to an assembly targeting .NET 8 and netstandard2.1. Also enabling such lowering removes complexity... regardless of the type in question, a developer could then always write the above, whether the data has no endianness issues, whether CreateSpan is available, and whether CreateSpan even works with the type in question... the average developer no longer needs to know or think as much about the costs involved, whether they're going to fall off a cliff, etc... the syntax "just works" and does the best possible thing.

@Sergio0694
Copy link

As a starting point, would it be reasonable to have the C# spec be updated to indicate that when { ... } is used on a type that doesn't have andianness concerns (ie. byte and sbyte), that would be guaranteed to be lowered as a data declaration?

I feel like this might be a first bit that could be guaranteed regardless of what we decided for the other scenarios 🤔
At the very least this would already cover raw binary blobs and UTF8 literals.

@tannergooding
Copy link
Member

tannergooding commented Jun 28, 2022

having the compiler lower that to caching it in a field is in no way strictly more expensive than allocating a new array on each invocation.

Caching { 1, 2, 3, 4, 5 } as some private static readonly int[] _PrivateData = new int[] { 1, 2, 3, 4, 5 }; public ReadOnlySpan<int> Data => _PrivateData; is more "expensive" than simply emitting it as the CreateSpan call directly to the embedded data. It requires more overall metadata to support, but also an allocation and requires a static constructor to run at some point leading to worse codegen and handling in NAOT scenarios where the initialization logic can't be removed.

It is more expensive than using CreateSpan, but CreateSpan may not always exist. A developer should be able to write the above code and get reasonable performance regardless of what platform is being targeted, e.g. we want to be able to write the above in a file that's built in to an assembly targeting .NET 8 and netstandard2.1.

Such a new feature is, strictly speaking .NET 8/C# 12+ exclusive at this point and using such features on .NET Standard 2.1 will be officially unsupported. There are plenty of features that do not work downstream and I don't think its worth making such a feature be "worse" just to support such a scenario. Devs who need to target the other have alternatives, such as continuing to use the public static readonly int[] Data = new int[] { 1, 2, 3, 4, 5 }; syntax -or- caching the data themselves and returning it through the public ReadOnlySpan<int> Data { get; } property.

the average developer no longer needs to know or think as much about the costs involved

The average developer won't be impacted by such a guarantee because the entire feature will only be available if they target .NET 8/C# 12+ and in such a scenario they get the feature with any guarantees we want to put in place.

@stephentoub
Copy link
Member Author

stephentoub commented Jun 28, 2022

is more "expensive" than simply emitting it as the CreateSpan call directly to the embedded data

No one is arguing against that.

I don't think its worth making such a feature be "worse" just to support such a scenario.

It wouldn't be worse. If CreateSpan is available and usable for a given type, it would use it. If it's not, it would use a caching field. I don't understand why that's contentious. It enables this language syntax for every type, regardless of whether it's able to be optimized even further on the target platform. It enables us to use such language syntax in files/libraries that multitarget. It removes complexity while still providing the the desired perf when possible and without falling off a cliff when not.

@stephentoub
Copy link
Member Author

stephentoub commented Jun 28, 2022

The average developer won't be impacted by such a guarantee because the entire feature will only be available if they target .NET 8/C# 12+ and in such a scenario they get the feature with any guarantees we want to put in place.

Yes, they will, as CreateSpan only works with a subset of types. And average devs work in multitargeted projects (even if they didn't configure the build) and will for years to come. And even for more experienced developers, we shouldn't force them to ifdef all use of this syntax when multitargeting, e.g. roslyn itself should be able to use the syntax.

@jaredpar
Copy link
Member

If CreateSpan is available and usable for a given type, it would use it. If it's not, it would use a caching field. I don't understand why that's contentious.

This pattern of "use X if available otherwise fall back to Y" is pretty common in the C# compiler. It allows to generate better code when possible but keep features working, albeit in a less efficient way, when they're not. The decisions here are very much inline with how the compiler tends to approach these problems (for pretty much all the usability reasons @stephentoub outlined above).

The use of the data as a fixed item feels pretty domain specific. It's not a general purpose usage. As such my recommendation would be such domains should have an analyzer to guarantee they're only using the ROS<T> as fixed when indeed it truly is lowered down to static data in the PE file.

@Sergio0694
Copy link

"It wouldn't be worse"

I guess one could make the argument that the fact it could not guarantee the underlying data was pinned would make it objectively worse, as it would give advanced users that would want to leverage that a harder time. That is, they'd have to either just essentially YOLO it, which is what TerraFX, ComputeSharp etc. are currently doing (technically relying on an implementation detail), or come up with some other (likely less efficient) alternative.

Would it be doable to have C# guarantee the lowering to the data section when the .NET 8+ TFM is used? Or, when CreateSpan is available I guess, if that alone is enough to support this scenario. And then the runtime could do the rest of the work.

"have an analyzer to guarantee they're only using the ROS as fixed"

That would be completely non viable in eg. TerraFX.Interop.Windows unfortunatley (not to mention it'd be much slower).

Related - bumping in case it was missed, would this be doable at first?

"[...] would it be reasonable to have the C# spec be updated to indicate that when { ... } is used on a type that doesn't have andianness concerns (ie. byte and sbyte), that would be guaranteed to be lowered as a data declaration?"

@tannergooding
Copy link
Member

I don't understand why that's contentious.

It's somewhat contentious because for "other types" there is an existing syntax that works and that is fairly well understood on what it emits and what the various implications are for perf-oriented scenarios. This issue is proposing a syntax that helps enforce that the data doesn't allocate on every invocation and since there are various places in the BCL and throughout the broader ecosystem that are currently relying on certain additional implications for perf reasons its only natural that there are asks to help codify the implications more concretely.

However, as I indicated above, the ask here is really just that there be "some way" to help guarantee some of the implications that high perf scenarios are actively relying on. This could be that there is some syntax like const { ..., ..., ... } that only works with other const allowed data (primitives today). It could be that there is some attribute that the language/runtime agree on the handling around to help ensure the allocation (if one exists) is on the POH (pinned-object heap), etc.

Its just requesting that an important detail devs and core code are relying on have some way of being codified.

And average devs work in multitargeted projects (even if they didn't configure the build) and will for years to come.

Yes, but such projects won't have access to ReadOnlySpan<T> Name => { cns, cns, cns } syntax on the downlevel TFMs. They only get access if they explicitly change LangVersion, which then puts them into an unsupported scenario.

People will certainly do it regardless, but there are still many language features are only enabled on newer runtimes and so it doesn't seem "that odd" to have another one where there additional guarantees that could be made.

@tannergooding
Copy link
Member

tannergooding commented Jun 28, 2022

That would be completely non viable in eg. TerraFX.Interop.Windows unfortunatley (not to mention it'd be much slower).

Just noting I'm not personally concerned with my own libraries here. I'm only targeting .NET LTS + STS and can update it all to be validated/correct, even in the face of undefined or implementation defined behavior.

I'm just trying to raise some of the concerns and asks for codification that exist more broadly as I do think that having some way to more broadly guarantee the data is "definitively pinned" (even if runtime specific) would be net goodness. This doesn't have to come in the form of unique syntax, it could come in the form of a spec guarantee, an attribute, or something else.

@stephentoub
Copy link
Member Author

stephentoub commented Jun 28, 2022

Yes, but such projects won't have access to ReadOnlySpan Name => { cns, cns, cns } syntax on the downlevel TFMs

Yes, they will. That's not how C# works. If you believe it's truly unsupported then please submit the PRs to runtime, Azure SDK, Roslyn, aspnetcore, and so on to stop using newer LangVers when targeting netstandard and the like. I'm sorry, I find this line of argument very frustrating. Very, very, very few devs have the same scenarios being microoptimized for here; many more would benefit from simplified syntax that avoids allocation. You get 95+% of the perf benefits by avoiding the allocation via caching it; we should enable the simple syntax that "just works" and does the best possible thing based on the target, rather than failing to compile or falling off a cliff, in either case increasing learning curve and complexity. If we also want to make additional spec guarantees for some types/uses, that's fine.

@jkoritzinsky
Copy link
Member

Yes, but such projects won't have access to ReadOnlySpan Name => { cns, cns, cns } syntax on the downlevel TFMs

Yes, they will. That's not how C# works. If you believe it's truly unsupported then please submit the PRs to runtime, Azure SDK, Roslyn, aspnetcore, and so on to stop using newer LangVers when targeting netstandard and the like.

The docs explicitly state that new language versions are not supported on older target frameworks. We don't block them as we use them ourselves on downlevel TFMs (barring features that rely on runtime support), but we explicitly tell customers that they are not supported.

Having downlevel TFM support with the new LangVersion is reasonable, but we should consider the cost of implementation with the fact that it's an explicitly not supported scenario.

https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/configure-language-version

C# 11 is supported only on .NET 7 and newer versions. C# 10 is supported only on .NET 6 and newer versions. C# 9 is supported only on .NET 5 and newer versions. C# 8.0 is supported only on .NET Core 3.x and newer versions.

@stephentoub
Copy link
Member Author

We don't block them as we use them ourselves on downlevel TFMs

Yes, "everyone" does. We go out of our way to keep them working unless there's a strong reason we can't. Adding a caching strategy that we'd want to be available even on the latest platform in order to support types that don't work with CreateSpan is not a significant burden.

@jaredpar
Copy link
Member

However, as I indicated above, the ask here is really just that there be "some way" to help guarantee some of the implications that high perf scenarios are actively relying on.

The compiler cannot be the bottleneck for relying on every detail of every aspect of C# programming. This is a large reason for why we added analyzers. It enables developers to add guarantees specific for their domain that don't fit into what the language either desires or has time to formally guarantee.

"have an analyzer to guarantee they're only using the ROS as fixed"

That would be completely non viable in eg. TerraFX.Interop.Windows unfortunatley (not to mention it'd be much slower).

Analyzers are used in a large number of projects to guarantee a huge variety of scenarios. Categorically declaring analyzers to be "completely non viable" with out detail does not feel like good response here.

@tannergooding
Copy link
Member

tannergooding commented Jun 28, 2022

I agree and an analyzer is often a good choice. But, there is also only so much that an analyzer can check and or guarantee.

Would it be possible for Roslyn (not C#) to documented something like ReadOnlySpan<T> x => { cns, cns, cns }; (for C# vX+) will always emit a "data declaration" for byte/sbyte and that it will do the same for other primitive types when CreateSpan is available?

That would allow devs to then independently rely on the implementation detail and base their analyzers on something "sound".

It would also still leave the implementation fairly flexible for non-constant data and for user-defined types.

@Sergio0694
Copy link

"Categorically declaring analyzers to be "completely non viable" with out detail does not feel like good response here."

Sorry, I was on my phone and didn't elaborate properly on that, let me try again 😅
Tanner pretty much summarized that already, what I meant is that an analyzer wouldn't be viable there for two reasons:

  • An analyzer is only able to give developers suggestions on eg. what code to write or not to write, potential mistakes they made, etc. The ask here is about having a given guarantee from either the language spec or Roslyn, that a given pattern (in this case { ... } for byte and sbyte) be lowered to a data declaration and therefore guaranteed to be pinned. This is not something that an analyzer could help with, as it'd just be something Roslyn handles when lowering, is what I meant.
  • Even assuming that you wanted to go the other way, as in, adding an analyzer that warned whenever eg. Unsafe.AsPointer was used on an RVA span to nudge developers to not rely on those spans being data declarations and pinned, then that'd still not be viable, as it would (1) cause a significant performance regression (eg. getting a GUID from a type is supposed to be virtually free), and (2) it would make writing code much more verbose, as you'd be forced to introduce a fixed block every single time you needed to access a span with data. This is something that's done eg. incredibly often when using Win32 APIs, eg. when passing a CLSID and RIID to activation APIs, when doing QueryInterface calls, etc. You wouldn't want to have to introduce (potentially multiple) fixed blocks every single time you had to pass a Guid* to a method. That code is already complicated enough 😄

I guess the point here is, as we mentioned: this optimization is something that Roslyn is already doing, that many customers (and the runtime itself) have taken a dependency on, that is (as far as I can tell) not really likely to ever be changed (because why should it), and that honestly just makes sense in the first place (if you have an RVA span of byte, why would you ever not want to lower it to a data segment?). So like Tanner said, we're really just asking if this could be included in some spec update so that the data being pinned would be a guarantee and no longer technically undefined behavior, is all 🙂

@333fred
Copy link
Member

333fred commented Oct 9, 2023

Discussed in https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-10-09.md#readonlyspan-initialization-from-static-data. As this has mostly been subsumed by C# 12 features, we are closing this out.

@333fred 333fred closed this as not planned Won't fix, can't repro, duplicate, stale Oct 9, 2023
@PavelCibulka
Copy link

PavelCibulka commented Nov 15, 2024

This already works inside functions. for exaple:

static float x1(int a){
            a = a & 3;
            ReadOnlySpan<float> numbers = [1f, 0.7f, 0.3f, 0.5f];
            return numbers[a];
        } 
static float x2(int a){
            a = a & 3;
            ReadOnlySpan<float> numbers = [1f, 0.7f, 0.3f, 0.5f];
            return numbers[a];
}

Get compiled into:

Program:x1(int):float (FullOpts):
       and      edi, 3
       mov      eax, edi
       mov      rcx, 0x71F73122CC20      ; static handle
       vmovss   xmm0, dword ptr [rcx+4*rax]
       ret      

Program:x2(int):float (FullOpts):
       and      edi, 3
       mov      eax, edi
       mov      rcx, 0x71F73122CC20      ; static handle
       vmovss   xmm0, dword ptr [rcx+4*rax]
       ret  

ReadOnlySpan is practically const. Both functions use same memory address. But it is not very nice for programer. It would be better if we could do obvious thing.

const ReadOnlySpan<float> numbers = [1f, 0.7f, 0.3f, 0.5f];

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests