[Proposal]: ReadOnlySpan initialization from static data #5295
Comments
As specified in https://github.com/dotnet/csharplang/blob/3c8559f186d4c5df5d1299b0eaa4e139ae130ab6/spec/arrays.md#array-creation the syntax So I think the optimization should be done regardless how the array initialization is written syntactically. Currently we have these forms AFAIK:
To force this const-array-behaviour we could maybe introduce a new intrinsic Function like BTW: it would be nice if we could write: static ReadOnlySpan<byte> Data => { 1, 2, 3 }; or static ReadOnlySpan<byte> Data { return { 1, 2, 3 }; } |
As I noted, the compiler would be free to do so... it already does so. But one of the key aspects here is the compiler preventing you from shooting yourself in the foot, guaranteeing that the syntax is non-allocating (at least beyond any initialization required, e.g. in the case of a big endian machine reading the assembly little endian data), and it would be a breaking change if the existing usage that allocates stopped compiling. |
This feels like the best approach to me. Essentially we could standardize on
The one part that still bothers me is arrays. That syntax works for arrays today but has none of the safeguards. I've thought a bit about adding a disabled warning here that could be enabled by developers who hit this a lot but I'm having trouble convincing myself that meets the bar. Given that this only works on fields and locals today, maybe just knowing it doesn't work on arrays is enough for developers.
That is asking for If we took this then I'd feel a bit more strongly about finding a way to extend this warning to cases where it crossed into arrays. Take for example the code |
-- Beaten by @stephentoub :) |
Not exactly the same but related discussion: #955, more along the lines of const ReadOnlySpan<byte> data = new byte[] { const, values }; |
I particularly like the idea of putting I feel like this could be more generally extended in the future to handle things like
|
I'd thought a bit about Firstly if we allowed One of the other suggestions we are looking at though is expanding this optimization to types which are not Secondly That's not to say I don't want |
@jaredpar What if the new ReadOnlySpan<byte> data = const byte[] { 1, 2, 3 }; (This may be what @tannergooding meant by "putting I think this is nice, because:
It's also problematic, because it would be very confusing if |
There isn't inherently anything wrong with this. But it's introducing a new expression form which is going to have a higher bar than re-using existing expression form in more places.
It would be really weird if it didn't though. For example it would be weird if I could say
This doesn't bother me. It's close enough to use for both concepts. Or said differently it's not different enough that I would warrant a new keyword for it. |
It doesn't need to be supported for all constant expressions, only array initializer expressions:
ReadOnlySpan<byte> span = new byte[] { 1, 2, 3 }; // creates span from System.Array (unless optimizable)
ReadOnlySpan<byte> span = stackalloc byte[] { 1, 2, 3 }; // creates span from stack array
ReadOnlySpan<byte> span = const byte[] { 1, 2, 3 }; // creates span from assembly data section |
That seems very counterintuitive. If Yes I agree that I can't see introducing |
Championing this. I too fell into:
It would be nice to have a form that was less magical about how it was working with the compiler to get this result. |
Yup. And, afaict, within roslyn, there would be no complexity to parse this at all. This would already fit directly into our syntax model. So this would just be something on the semantic side to enable. I don't know the compiler impl here well, but i have a feeling it could be done with minimal effort. Specifically:
|
@jaredpar I'm interested in taking a stab at this once we have space in the schedule. Let's chat about when/if you think this is possible to slot in. Definitely low pri, but also seems nice to have, hopefully cheap, and well received by the community here. |
Yes, that's a problem for the runtime. One of the primary reason for ref-structs to exist is that they cannot appear on the GC heap. |
@jkotas Can we relax this in some fashion? Conceptually in this case there should be no problem that i can see (def correct me if i'm wrong). However, i'm not sure if this would be just something about ROS, or something that could extend to ref-structs in general. And, to be clear, i solely mean the use of them in static-readonly locations, nowhere else. Thanks! -- Note: i suppose a possible (though unpalatable to me) possibility would be to allow someone to write: static readonly ReadOnlySpan<byte> X = { 1, 2, 3 }; and have that actually translate to a property under the covers. conceptually that would then work as far as the runtime and everything else was concerned. But it would def be a bit wonky as that syntax really implies this is a field through and through. |
This is hard trade-off for the runtime to make. If we relax this, we will end up with increased GC latencies that translate to worse P99s that is something that top tier services care about a lot. |
Can you clarify where the increased GC latency comes from? Conceptually, a user can (and already did) write:
Wrapping that with ROS to make the array non-mutable doesn't seem like it should have a GC impact. |
The |
Why does the GC have to find the containing object for them? |
So that it can mark the object as reachable. |
Why does it need to mark the containing object as reachable? (aside, this back and forth seems to be really inefficient). Would it be possible for you to break down the entirety of what's going on here so each message doesn't need a further 'why' to get a sense of the very next step in the algorithm? Thanks. |
I feel that I am trying to explain the basic internal working of the GC. There are books and talk series dedicated to explaining the full chain of why things are done in a certain way and the trade-offs involved in the GC design. I agree that it is not useful to replay those here. It would be certainly technically possible (though a ton of work) to allow byrefs on the heap. The problem is the performance trade-offs that would come from that decision. We are not convinced that these trade-offs would be an overall win. The short story is that marking byrefs is expensive. Marking byrefs has to find a containing object. It is ~100x more expensive than a simple pointer dereference that is used to mark object references. Limiting byrefs to the active stack only makes this cost manageable (the typical code has only so many byrefs on live stacks) and keeps the core marking algorithm simple. There are many possible trade-offs in this area. For example, the classic Java VMs do not have byrefs like .NET. Instead, they explicitly store and pass around the containing object of the byref. It means that the code has slower throughput, but the GC is simpler and can do less work, since it does not need to support lookup of the containing object from an arbitrary pointer. |
This doesn't feel basic to me. It's unclear to me why the containing type is relevant at all for a static ROS field. Perhaps you can go the other direction and show an example of why it would matter? |
Consider |
I mean it's an implementation detail of the compiler today. No C#, CLR, ECMA, etc. specification would be violated if that were changed. No formal process beyond normal PR review was used to vet the IL the compiler emits for this as part of the PR that implemented the optimization. Etc. |
So, we've been chatting about several issues regarding constant data these past few days with @tannergooding, @AaronRobinsonMSFT, @jkoritzinsky and other folks. There were two main aspects that have been brought up:
Point 2 was brought up in dotnet/runtime#71268 and I'd consider that as being pretty much resolved, as you can safely do this by implementing a custom Point 1 is still an open issue, and we've been brainstorming ideas on how to potentially resolve this. For context, the spans being lowered by Roslyn into the PE section and therefore being pinned is something that is used in production. For instance, Tanner has thousands of uses of this (I mean Note: of course folks are currently testing this (eg. Tanner has plenty of unit tests for all supported TFMs in his CI, I'm doing the same for my projects, etc), but this is still not ideal at all, plus it would not help catch scenarios where a customer might bump their TFM without updating the dependency at the same time, and that could just break completely. There's two different ways I could see us tackling this:
To me it seems we could do both: making the new Wanted to get some additional thoughts before opening a proposal, especially because the one about making |
This sounds like overengineering to me. It should be fine to say in |
Well locals do not always have a fixed address in runtime, for example in async methods. |
That seems perfect to me, absolutely. No new attributes needed, and for existing code it'd mostly just be making what is currently just an implementation detail, well-defined behavior. All I care about here really is just making this something that is technically correct to rely on in production scenarios, so if it was possible to just add this to the spec, then that'd be just great 😄 Note: should we also get C# to make lowering constant data into the PE data section well-defined behavior? |
They do in IL that ECMA-335 describes. My comment was referring to IL / ECMA-335 notion of a local that is different from C# notion of a local. |
Noting that this was a suggestion for a possible way the runtime could provide support and came with the annotation that The runtime simply guaranteeing that "data declarations" (which is the ECMA-335 terminology for this type of data) must be pinned even for AOT scenarios/etc would also satisfy most of this. The runtime guaranteeing that the The remaining issue would then be what Roslyn wants to guarantee. I, personally, don't see a lot of benefit to caching in a But, if that's something Roslyn wants to allow it would be up to discussing with them and the runtime team to provide a good alternative. One such way would be an attribute that they use to understand how it should be emitted. |
I'm not sure what you mean as it being "strictly more expensive". If a developer writes: public ReadOnlySpan<int> Data => { 1, 2, 3, 4, 5 }; having the compiler lower that to caching it in a field is in no way strictly more expensive than allocating a new array on each invocation. It is more expensive than using CreateSpan, but CreateSpan may not always exist. A developer should be able to write the above code and get reasonable performance regardless of what platform is being targeted, e.g. we want to be able to write the above in a file that's built in to an assembly targeting .NET 8 and netstandard2.1. Also enabling such lowering removes complexity... regardless of the type in question, a developer could then always write the above, whether the data has no endianness issues, whether CreateSpan is available, and whether CreateSpan even works with the type in question... the average developer no longer needs to know or think as much about the costs involved, whether they're going to fall off a cliff, etc... the syntax "just works" and does the best possible thing. |
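The caching fallback described in the comment above can be approximated by hand today. The following is a minimal sketch, not the compiler's actual lowering: the class name `Tables` and the field name `s_data` are made up for illustration, and the brace-only initializer from the comment is shown only inside a comment because it is not valid C#.

```csharp
using System;

static class Tables
{
    // Form discussed in the thread (does not compile, shown for reference only):
    //     public static ReadOnlySpan<int> Data => { 1, 2, 3, 4, 5 };

    // Hand-written stand-in for a caching fallback (hypothetical names):
    // the array is allocated once, when the type is initialized.
    private static readonly int[] s_data = { 1, 2, 3, 4, 5 };

    // Every call wraps the same cached array; no allocation per invocation.
    // The read-only span prevents callers from mutating it through this API.
    public static ReadOnlySpan<int> Data => s_data;
}
```

Under these assumptions, repeated reads of `Tables.Data` return spans over the same memory, so the cost compared with a span over assembly data is the one-time array allocation plus a static field load.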
As a starting point, would it be reasonable to have the C# spec be updated to indicate that when I feel like this might be a first bit that could be guaranteed regardless of what we decided for the other scenarios 🤔 |
Caching
Such a new feature is, strictly speaking, .NET 8/C# 12+ exclusive at this point and using such features on .NET Standard 2.1 will be officially unsupported. There are plenty of features that do not work downstream and I don't think it's worth making such a feature be "worse" just to support such a scenario. Devs who need to target the other have alternatives, such as continuing to use the
The average developer won't be impacted by such a guarantee because the entire feature will only be available if they target .NET 8/C# 12+ and in such a scenario they get the feature with any guarantees we want to put in place. |
No one is arguing against that.
It wouldn't be worse. If CreateSpan is available and usable for a given type, it would use it. If it's not, it would use a caching field. I don't understand why that's contentious. It enables this language syntax for every type, regardless of whether it's able to be optimized even further on the target platform. It enables us to use such language syntax in files/libraries that multitarget. It removes complexity while still providing the desired perf when possible and without falling off a cliff when not. |
Yes, they will, as CreateSpan only works with a subset of types. And average devs work in multitargeted projects (even if they didn't configure the build) and will for years to come. And even for more experienced developers, we shouldn't force them to ifdef all use of this syntax when multitargeting, e.g. roslyn itself should be able to use the syntax. |
This pattern of "use X if available otherwise fall back to Y" is pretty common in the C# compiler. It allows the compiler to generate better code when possible but keep features working, albeit in a less efficient way, when they're not. The decisions here are very much in line with how the compiler tends to approach these problems (for pretty much all the usability reasons @stephentoub outlined above). The use of the data as a fixed item feels pretty domain specific. It's not a general purpose usage. As such my recommendation would be such domains should have an analyzer to guarantee they're only using the |
I guess one could make the argument that the fact it could not guarantee the underlying data was pinned would make it objectively worse, as it would give advanced users that would want to leverage that a harder time. That is, they'd have to either just essentially YOLO it, which is what TerraFX, ComputeSharp etc. are currently doing (technically relying on an implementation detail), or come up with some other (likely less efficient) alternative. Would it be doable to have C# guarantee the lowering to the data section when the .NET 8+ TFM is used? Or, when
That would be completely non viable in eg. TerraFX.Interop.Windows unfortunately (not to mention it'd be much slower). Related - bumping in case it was missed, would this be doable at first?
|
It's somewhat contentious because for "other types" there is an existing syntax that works and that is fairly well understood on what it emits and what the various implications are for perf-oriented scenarios. This issue is proposing a syntax that helps enforce that the data doesn't allocate on every invocation and since there are various places in the BCL and throughout the broader ecosystem that are currently relying on certain additional implications for perf reasons it's only natural that there are asks to help codify the implications more concretely. However, as I indicated above, the ask here is really just that there be "some way" to help guarantee some of the implications that high perf scenarios are actively relying on. This could be that there is some syntax like It's just requesting that an important detail devs and core code are relying on have some way of being codified.
Yes, but such projects won't have access to People will certainly do it regardless, but there are still many language features that are only enabled on newer runtimes and so it doesn't seem "that odd" to have another one where there are additional guarantees that could be made. |
Just noting I'm not personally concerned with my own libraries here. I'm only targeting .NET LTS + STS and can update it all to be validated/correct, even in the face of undefined or implementation defined behavior. I'm just trying to raise some of the concerns and asks for codification that exist more broadly as I do think that having some way to more broadly guarantee the data is "definitively pinned" (even if runtime specific) would be net goodness. This doesn't have to come in the form of unique syntax, it could come in the form of a spec guarantee, an attribute, or something else. |
Yes, they will. That's not how C# works. If you believe it's truly unsupported then please submit the PRs to runtime, Azure SDK, Roslyn, aspnetcore, and so on to stop using newer LangVers when targeting netstandard and the like. I'm sorry, I find this line of argument very frustrating. Very, very, very few devs have the same scenarios being microoptimized for here; many more would benefit from simplified syntax that avoids allocation. You get 95+% of the perf benefits by avoiding the allocation via caching it; we should enable the simple syntax that "just works" and does the best possible thing based on the target, rather than failing to compile or falling off a cliff, in either case increasing learning curve and complexity. If we also want to make additional spec guarantees for some types/uses, that's fine. |
The docs explicitly state that new language versions are not supported on older target frameworks. We don't block them as we use them ourselves on downlevel TFMs (barring features that rely on runtime support), but we explicitly tell customers that they are not supported. Having downlevel TFM support with the new LangVersion is reasonable, but we should consider the cost of implementation with the fact that it's an explicitly not supported scenario. https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/configure-language-version
|
Yes, "everyone" does. We go out of our way to keep them working unless there's a strong reason we can't. Adding a caching strategy that we'd want to be available even on the latest platform in order to support types that don't work with CreateSpan is not a significant burden. |
The compiler cannot be the bottleneck for relying on every detail of every aspect of C# programming. This is a large reason for why we added analyzers. It enables developers to add guarantees specific for their domain that don't fit into what the language either desires or has time to formally guarantee.
Analyzers are used in a large number of projects to guarantee a huge variety of scenarios. Categorically declaring analyzers to be "completely non viable" without detail does not feel like a good response here. |
I agree and an analyzer is often a good choice. But, there is also only so much that an analyzer can check and or guarantee. Would it be possible for Roslyn (not C#) to document something like That would allow devs to then independently rely on the implementation detail and base their analyzers on something "sound". It would also still leave the implementation fairly flexible for non-constant data and for user-defined types. |
Sorry, I was on my phone and didn't elaborate properly on that, let me try again 😅
I guess the point here is, as we mentioned: this optimization is something that Roslyn is already doing, that many customers (and the runtime itself) have taken a dependency on, that is (as far as I can tell) not really likely to ever be changed (because why should it), and that honestly just makes sense in the first place (if you have an RVA span of |
Discussed in https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-10-09.md#readonlyspan-initialization-from-static-data. As this has mostly been subsumed by C# 12 features, we are closing this out. |
This already works inside functions. For example:
Gets compiled into:
ReadOnlySpan is practically const. Both functions use the same memory address. But it is not very nice for the programmer. It would be better if we could do the obvious thing.
|
ReadOnlySpan initialization from static data
Summary
Provide a syntax for initializing a ReadOnlySpan<T> from constant data and with guaranteed zero allocation.
Motivation
dotnet/roslyn#24621 added compiler support that translates:
into non-allocating code that blits the binary data into the assembly data section and creates a span that points directly to that data. The same optimization is done for:
We now rely on this all over the place in dotnet/runtime and elsewhere, as it provides a very efficient means for accessing a collection of constant values with minimal overhead and in a way the JIT is able to optimize consumption of very well.
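As an illustration of the kind of code this paragraph refers to (not the snippet from the original issue, which is missing from this copy), here is a minimal sketch of a constant lookup table exposed as a span. The names `Primes`, `Small` and `IsSmallPrime` are hypothetical. Whether the array is actually placed in the assembly data section is an implementation detail of the compiler being used, as discussed in the comments.

```csharp
using System;

static class Primes
{
    // An array-creation expression with constant byte elements, converted
    // directly to ReadOnlySpan<byte>. Current Roslyn is understood to emit
    // this as a span over data stored in the assembly instead of allocating
    // a new byte[] on every call; the language itself does not promise that.
    public static ReadOnlySpan<byte> Small => new byte[] { 2, 3, 5, 7, 11, 13 };

    // Typical consumption: a search over the constant table.
    public static bool IsSmallPrime(byte value) => Small.IndexOf(value) >= 0;
}
```

Writing the same data as a `byte[]`-typed property would allocate on every access, which is the pitfall the proposal wants dedicated syntax to rule out.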
However, there are multiple problems with this:
to
are often met with confusion and misinformation.
Detailed design
Add dedicated syntax for creating spans without allocating that:
As the following syntax fails to compile today:
and
they could be co-opted for this purpose.
Opening up this syntax via the removal of new T[] doesn't prevent the optimization from being applied by the compiler when new T[] is used, but it would guarantee a non-allocating implementation when the new T[] isn't used.
This could also tie in with params Span<T>: the syntax for the local variant could blit the data into the assembly if possible, or else fall back to the same implementation it would use for a params Span<T> method argument (assuming that the params syntax doesn't itself fall back to heap allocation).
Implementation-wise, the compiler would use RVA statics whenever possible and fall back to static readonly arrays otherwise.
Drawbacks
TBD
Alternatives
TBD
Unresolved questions
Related to this, we have prototyped in both the runtime and the C# compiler support for extending this optimization to more than just byte-sized primitives. The difficulty with other primitives is endianness, and it can be addressed in a manner similar to array initialization: the runtime exposes a helper that either returns the original pointer, or sets up a cache of a copy of the data reversed based on the current endianness. There are also multiple fallback code generation options available for if that API isn't available. Such improvements to the compiler are related but separate from the improvements to the language syntax for this existing optimization.
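The endianness handling described above can be modelled in user code. This is a conceptual sketch only, not the runtime helper the paragraph mentions: the class `EndianAwareData`, the property `Values` and the field `s_decoded` are hypothetical names, and the raw bytes are assumed to be stored little-endian.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

static class EndianAwareData
{
    // Little-endian byte image of the int values { 1, 2, 3 }.
    private static ReadOnlySpan<byte> Raw => new byte[]
    {
        1, 0, 0, 0,
        2, 0, 0, 0,
        3, 0, 0, 0,
    };

    // Cache used only when the machine's byte order differs from the data's.
    private static int[] s_decoded;

    public static ReadOnlySpan<int> Values
    {
        get
        {
            if (BitConverter.IsLittleEndian)
            {
                // Same byte order: reinterpret the stored bytes in place.
                // Assumption: the data is suitably aligned for int access;
                // a real implementation would have to guarantee that.
                return MemoryMarshal.Cast<byte, int>(Raw);
            }

            // Different byte order: decode once and reuse the copy.
            // A race here is benign because every thread computes the same array contents.
            return s_decoded ??= Decode(Raw);
        }
    }

    private static int[] Decode(ReadOnlySpan<byte> raw)
    {
        var result = new int[raw.Length / sizeof(int)];
        for (int i = 0; i < result.Length; i++)
        {
            // Reads four bytes as a little-endian int regardless of host order.
            result[i] = BinaryPrimitives.ReadInt32LittleEndian(raw.Slice(i * sizeof(int)));
        }
        return result;
    }
}
```

On a little-endian host no copy is made; on a big-endian host the cost is a single decode, which mirrors the "return the original pointer or set up a cached copy" behaviour described for the runtime helper.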
Design meetings
cc: @jaredpar