Proposal: Slicing #120
Comments
How do you make |
@mikedn, in this proposal, slices would support operating over any region of memory, whether it was from an array or a native pointer or the char* to data in a string. Its implementation would require interacting with internals in the runtime, rather than operating over a publicly-exposed abstraction like |
Wow! Roslyn starts yielding its fruits! |
I'd much prefer that existing BCL classes that 'only take a There's a sort-of tangential issue around being able to treat unmanaged memory as |
@Porges, I think it wouldn't provide efficiency of raw arrays. |
This is one of those things that I'd really prefer could be handled by the runtime itself (with C# support in conjunction, of course). By that I mean have the ability directly in the runtime to define an array that is a range within another array where the runtime would manage the appropriate offset and bound checking. I know that To keep within the same syntax:
byte[] b1 = new byte[500];
byte[] b2 = b1[10:10];
b2[0] = 123;
Debug.Assert(b1[10] == 123);
b1[11] = 234;
Debug.Assert(b2[1] == 234);
b2[-1] = 123; // throws IndexOutOfRangeException();
b2[10] = 123; // throws IndexOutOfRangeException();
A similar mechanism would be useful for substrings, where instead of actually copying the portion of the original string into a new string the substring would retain a reference to the original string with an offset and length:
string s1 = "Hello World!";
string s2 = s1[6:5];
Debug.Assert(s2 == "World");
The one disadvantage to both being that it keeps a root reference to the original array or string around for the lifetime of the slice. |
Having done some work already on Array Slices (https://github.com/Codealike/arrayslice) I will share some of the gotchas I had to deal with... If slices are implemented in C# as a native construct understood by the compiler, you will make math-oriented programmers like myself pretty happy, and we are probably the ones who are seriously interested in having such a construct for performance reasons. IEnumerable, IList, etc. have such a big performance impact that they are provided for convenience and/or interop with application code only (see the performance impact at the arrayslice link). While the implementation details (structs or classes, readonly or not) are very important at the language design level, the biggest issue is behind the language surface. Today as it stands I know of 3 ways to handle this:
The first has a very important drawback: if your code doesn't support Slice you are screwed. Implicitly converting to an array would not work... you have to pass the whole array (defeating the purpose of the Slice) or copy the array, a no-no for the audience that would really use it... The second is clear: performance... again a no-no for the intended audience. The third: AFAIK there is no support at the runtime level to actually create an array with "shared" memory. If there is IL to be able to do that, I am more than interested to know how :) ... therefore unless we can allow that at the runtime level, slices will be useless or clash with code already written. Needless to say, where this is going is that there is a serious need for generic numeric constraints too (both fixed and floating) to really make C# shine for performance math code. Traits, if implemented properly, would work for that. I look forward to having an experience akin to what you have in Matlab or Python in terms of flexibility (not in syntax :D) Federico |
In what way is a slice different from an array? Couldn't the type simply be Also I believe I would be more clear if you used the range operator (
|
@Miista AFAIK arrays in the CLR are not just a bunch of memory; the GC has to track them, so there should be a descriptor somewhere, etc. Therefore, the slice part of that bunch of memory is not exactly an array. The CLR/Roslyn guys surely could give a more detailed answer, as I am interested in knowing that too. :) |
Awesome feature. Been waiting for this since the inception of C# itself 🎱 @stephentoub - While its uses are numerous, I'm curious how this is going to be used practically in the case of strings. In your example, you used the simplest case of switch,
string helloWorld = "hello, world";
ReadOnlySlice<char> hello = helloWorld[:5];
ReadOnlySlice<char> world = helloWorld[7:];
switch(hello) { // no allocation necessary to switch on a ReadOnlySlice<T>
case "hello": Hello(); break;
case "world": World(); break;
}
Debug.Assert(hello + world == "helloworld");
I'm curious here as to how a string is compared to a That being aside, for any practical advantage in efficiency while dealing with strings, the compiler needs to support allocation free representation of strings, since you almost always have to recreate a string from the Unless, a Now, considering, a String.FromSlice, or a Slice.ToSourceFormat, or anything of that nature is provided, can this not be directly simplified to directly providing the type itself, with controlled mutability, rather than a new type called Slice? Example,
string helloWorld = "hello, world";
// Internally built from the memory representation
// i.e, only the 'string' type is allocated (which acts a wrapper itself to the chars),
// but simply representing the same area of memory
// Conceptual pseudo: (String.FromSlice(String.Slice(helloWorld, 0, 5))
// But can be efficiently done without the middle conversions directly.
string hello = helloWorld[:5];
string world = helloWorld[7:]; // Internally built from the memory representation again
switch(hello) {
case "hello": Hello(); break;
case "world": World(); break;
}
Now, my point being, instead of creating a new Type called Slice, or ReadOnlySlice, since this will anyway require a reverse conversion at some point reducing the potential gain of efficiency, why not directly return the arrays, and simply provide a direct way to create an array from an existing representation of another underlying memory of the array of the same type?
int[] x = {1, 2, 3, 4, 5};
// It returns a new array that internally maps directly to the array of x.
// Again, the returned int array type implicitly represents the same area of memory.
int[] slice = x[1:4];
// Alternatively,
int[] slice = Array.Slice(x, 1, 4);
// Readonly version: (Reuse existing types)
ImmutableArray[] slice = Array.ImmutableSlice(x, 1, 4);
This ensures compatibility with all existing APIs, and no requirement for the API to be dealing with slices differently. IMO, an API shouldn't have to think about whether it's a slice or an array. As far as they are concerned, they are getting a unit of data to operate on. The sender can decide whether it's a slice that operates directly (conceptually similar to refs), or a copy. Does this not make sense, as I really see no practical benefit, and use case in separating it as a brand new type - only more potential decisions to deal with, polluting the APIs with another set of overloads. |
F# already has slices for both arrays and strings. Sadly, they deep copy which makes them too slow for many applications (I only use them in code golf). Aliasing is definitely the way to go. Provided the slice supports stride it could also help when hoisting bounds checks. I wish .NET provided overloads for functions like System.Double.Parse that accepted string, start index and length rather than just string. I often find my parsing code is much slower than necessary because this API design incurs huge allocation rates from unnecessary objects. |
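To make the allocation concern above concrete, here is a minimal sketch that parses numbers out of a larger string without creating a substring per field. It is not the proposed slicing syntax: it assumes a runtime (.NET Core 2.1 or later) where `ReadOnlySpan<char>` and a span-accepting `double.Parse` overload are available, and the `SliceParsing` class and `SumCsv` method are hypothetical names chosen for the illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class SliceParsing
{
    // Sums comma-separated numbers without allocating a string per field.
    public static double SumCsv(string line)
    {
        ReadOnlySpan<char> rest = line.AsSpan(); // aliases the string, no copy
        double total = 0;
        while (!rest.IsEmpty)
        {
            int comma = rest.IndexOf(',');
            // The field is a view into the original characters.
            ReadOnlySpan<char> field = comma < 0 ? rest : rest.Slice(0, comma);
            total += double.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture);
            rest = comma < 0 ? ReadOnlySpan<char>.Empty : rest.Slice(comma + 1);
        }
        return total;
    }
}
```

Each `Slice` call here only adjusts an offset and a length, which is the aliasing behaviour the comment argues for.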
Is there any chance we could get string slices at the same time? Not just |
@jnm2 Yes, that would be part of the point. |
@gafter: I don't think you would want to break backward compatibility as Java has had some trouble with string slices keeping large strings reachable too long, i.e. memory leaks. |
@gafter Sounds like there is possibly movement on allowing a Would the same be possible with arrays? |
If this comes with proper GC integration, then there will be no need for
|
If the GC could pull off being able to collect a large string from which at least one slice was taken then that would be great and does allay our concerns. My concern is that the slices would be treated as having references back to the parent string and thus keep it from being eligible for collection. Another (tiny) reason to have a separate method is that we could establish a convention through which any type can be sliced. If a slice operation could function against any type that had a resolvable |
Yeah, such open convention would be really great. And I think such API need
|
If slice is added, will it work with IList? IList is much more commonly used than raw arrays. A nice syntax for getting ranges of data should work with the most commonly used data structure. |
Ideally such a slice would not require allocation (i.e. it would be encoded in a struct that can be passed by ref/copy). If not, it would result in two allocations and two indirections most of the time (and an increased number of objects for the GC). Rust is doing something similar already: https://doc.rust-lang.org/std/slice/ Of course, if the runtime can be modified, other options might be possible too. |
I don't understand why there are many comments about slicing in I think the focus here should be only on arrays. If array are accomplished the right way, ILists can easily be extended, by perhaps another interface, that allows access to IList's source array, which in turn can be sliced. |
TL;DR sorry that comment ended up very long. Most interesting idea is probably point 3 at the end. You raised some good points that got me into thinking. Let's talk about strings (arbitrarily, I think everything applies equally to arrays). Why I still believe that efficient interop with In particular it creates two worlds: the (few) functions that have optimized versions for people who needs perf, and the rest of the world who uses One example and then I move on to implementation ideas: Can string/slice compatibility even work? Here are all the solutions that I can think of:
One big issue that remains is that if you can return a "slice" string, it can live longer than the underlying buffer, which might cause trouble to GC (keeping a huge buffer alive for a tiny substring). |
I am going against your proposal because we already have ArraySegment and ReadOnlyCollection and also IEnumerable. The thing we really lack is the functionality of working with it like the actual array Which is, instead, the feature return by ref. So we should put your slicer back into ArraySegment instead |
It would be nice if we could capture slices within array patterns,
switch(array) {
case { var first, int[:] slice , var last }: ...
// or perhaps
case { var first, var slice.. , var last }: ...
}
or something like that. |
@alrz I think I would prefer a third syntax, similar to your second: |
@jods4 I agree the first one is ambiguous, but what do you mean by "more consistent with other languages"? |
@alrz I was thinking about destructuring arrays in other languages, which is quite similar to pattern matching (albeit unconditionally). So... forget about that comment! Although I still like ES6/TS syntax ;) |
Having a count is far more common than having the index of the end element so it would be more convenient to programmers if the syntax Or an alternative is that current proposal could be kept intact but an implicit variable
foo[a + b:$i + n]; // $i == a + b
foo[a:/* $i's scope begins here */ $i + 2 /* $i's scope ends here */];
|
I haven't followed through the entire discussion, but what's the current idea on having "array views" / slices that have a different type than the original array they share data with, i.e. CoreClr issue 1015? Will this be possible within reasonable constraints? General idea:
byte[] someRawData = /*...*/;
// Create a different view on the same data without copying
// (for performance and library communication reasons)
MyStruct[] interpretedBlittableData = Array.CreateView<MyStruct>(someRawData, /*...*/);
// (The above throws an exception if MyStruct isn't considered safe for this)
Could probably be considered a slicing addon. |
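As a rough illustration of the "typed view over raw bytes" idea, the sketch below uses `MemoryMarshal.Cast`, which exists in later .NET releases (assumed here: .NET Core 2.1 or later). It is a swapped-in technique, not the `Array.CreateView` API imagined in the comment: the result is a `Span<int>` rather than a real array, and `ArrayViews` and `AsInt32View` are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only.
public static class ArrayViews
{
    // Reinterprets the bytes as 32-bit integers in place; no data is copied.
    // MemoryMarshal.Cast rejects element types that contain object references.
    public static Span<int> AsInt32View(byte[] raw)
    {
        return MemoryMarshal.Cast<byte, int>(raw.AsSpan());
    }
}
```

A short usage example, showing that a write through the view lands in the original buffer:

```csharp
byte[] raw = new byte[8];
Span<int> ints = ArrayViews.AsInt32View(raw); // two ints over eight bytes
ints[1] = -1;                                  // sets raw[4] through raw[7] to 0xFF
```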
I see a slice as an array of indexes, not as a subset of the original array-like object. Regardless of how you support this concept, I believe that this is key to usability. |
My programming seems to reveal that all methods which take a plain array must also be able to take IList. It seems easy to comprehend an offset and length parameter being automatically implemented to an overload of the method if the parameter is IList. Furthermore I think you can skip the boxing Slice requires by creating an 'Adapter' method on IList combined with the above. The semantic of readonly is already enforced by IList which also solves a majority of the other issues with immutable data. |
Couple of things that triggered me
The easiest solution is to deepcopy everywhere and make everything mutable and let CLR/RyuJIT deal with it, but essentially those are what kills performance. What gives performance is constant/immutable references and copy-as-needed per mutation. String is already immutable which should make it easy to slice. I'm neither aliasing nor GC expert, so I can't comment on the problem of 'small ref holding onto large superset'. I'm hoping this is a problem experts have previously tackled and have mitigation strategies. |
That sounds like it wouldn't provide the same performance as accessing a regular array, which would defeat the purpose of one of the use cases for slicing: Providing high-performance view access to an array / memory block without copying.
If you're going to mention performance, I believe that argument is in favor of mutability, rather than immutability. How is it more performant if you have to copy all the data on every mutation? Slices being mutable allows lightweight, high-performance array / memory views for both read access and mutation. If they're immutable, you'll get only half the use cases and have to use costly copy operations for the other half. You can still decide to copy a mutable array / slice if you want to - or pass it around as an |
+1 against having slice as an array of indices. It would be better to learn from frameworks/libraries that got it right, for example Python's NumPy. Slices should represent views of the original array and be interpretable by ordinary functions just like ordinary arrays. It should be totally transparent for called functions whether they are processing an int[] or an int[5:10] or however a slice should be defined. To be honest, I couldn't completely understand from the above discussion why touching the compiler is sometimes seen as something to be avoided. In my view, this is a critical feature for #10378 that cannot be left half-baked (for example, by having only a pure BCL solution). Also, deprecating ArraySegment and re-implementing it in terms of array slices should be considered as an option. It is not as if there haven't been any breaking changes since .NET 1.0. Array slices (or more generally, safe memory views) are absolutely necessary for the success of C# as a language for high-performance computing. Right now, Python is taken way more seriously for high-performance computing than C#, and this really shouldn't have been the case (Python is a fine language, but it was C# that initially proposed the non-compromise solution of handling unsafe contexts for more performant code, for example; as such, the fact that we are not able to fulfill one of the first premises of the language might be a sign that even large or possibly breaking changes should be considered at this point). |
Why do you need a 'Slice' or Any method which takes an The key thing to take away from that example would be using Finally and in closing the GC and JIT changes on their own will definitely be enough to seriously consider C# for a high performance solution (if for some reason it's not already...), Mono or otherwise; I don't see this If you look at the Reference Source you will see that there is an Furthermore in later versions of the framework it very well could be possible to allow such methods which take plain There are plenty of other things besides 'Slicing' which can help performance such as reference type stack allocations, SIMD, inter alia, which should be looked into way before time is wasted on this. |
@cesarsouza Python's Achilles heel is the GIL, and Python devs have the penchant for single-threaded performance over asynchronous, multithreaded operations. It is fine as a scripting language for synchronous tasks.
@ilexp I'm going to go ahead and say "It depends on the context" and whichever choice roslyn gets means backend will have to adjust their heuristics to catch the other case. Immutability is a very important property for compiler optimizations. |
Arrays are mutable... making an immutable slice doesn't magically change that. |
This proposal is now tracked at dotnet/csharplang#185 |
(Note: this proposal was briefly discussed in #98, the C# design notes for Jan 21, 2015. It has not been updated based on the discussion that's already occurred on that thread.)
Background
Arrays are extremely prevalent in C# code, as they are in most programming languages, and it’s very common to hand arrays around from one method to another.
Problem
However, it’s also very common to only want to share a portion of an array. This is typically achieved either by copying that portion out into its own array, or by passing around the array along with range indicators for which portion of the array is intended to be used. The former can lead to inefficiencies due to unnecessary copies of non-trivial amounts of data, and the latter can lead both to more complicated code and to a lack of trust that the intended subset is the only subset that’s actually going to be used.
Solution: Slice<T>
To address this common need, .NET and C# should support "slices." A slice, represented by the Slice<T> value type, is a subset of an array or other contiguous region of memory, including both unmanaged memory and other slices. The act of creating such a slice is referred to as "slicing," and beyond the support on the Slice<T> type, the C# language would include language syntax for declaring slices, slicing off pieces of arrays or other slices, and reading from and writing to them.
An array is represented using array brackets:
Similarly, a slice would be represented using square brackets that contain a colon between them:
The presence of the colon maps to the syntax for creating slices, which would use an inclusive 'from' index before the colon and an exclusive 'to' index after the colon to indicate the range that should be sliced (omission of either index would simply imply the start of the array or the end of the array, respectively, and omission of both would mean the entire array):
Arrays could also be implicitly converted to slices (via an implicit conversion operator on the slice type), with the resulting slice representing the entire array, as if both 'from' and 'to' indices had been omitted from the slicing operation:
A slice could also be used in a similar manner to arrays, reading from and writing to them via indexing:
As demonstrated in this code example, slicing wouldn’t make a copy of the original data; rather, it would simply create an alias for a particular region of the larger range. This allows for efficient referencing and handing around of a sub-portion of an array without necessitating inefficient copying of data. However, if a copy is required, the ToArray method of Slice<T> could be used to forcibly introduce such a copy, which could then be stored as either an array or as a slice (since arrays implicitly convert to slices):
This gives developers the flexibility as to whether they want the recipient of the slice to be working with the original array or not, minimizing unnecessary copies and ensuring that only the appropriate areas of the larger region are used (by design, there would be no way through the public surface area of Slice<T> nor through the C# language syntax to get back from a slice to the larger entity from which it was sliced).
As creating slices would be very efficient, methods that would otherwise be defined to take an array, an offset, and a count can then be defined to just take a slice.
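The semantics described in this section (inclusive 'from', exclusive 'to', aliasing rather than copying, `ToArray` for an explicit copy, implicit conversion from arrays) can be approximated in plain C# as a rough sketch. This is not the proposed implementation: it covers arrays only, whereas the proposal also targets unmanaged memory and strings, it has none of the proposed `[from:to]` syntax, and the `Sub` method name is an assumption made for the illustration.

```csharp
using System;

// Hypothetical, array-backed stand-in for the proposed Slice<T>.
public readonly struct Slice<T>
{
    private readonly T[] _array;
    private readonly int _offset;

    public int Length { get; }

    // 'from' is inclusive and 'to' is exclusive, as in the proposed [from:to].
    public Slice(T[] array, int from, int to)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        if (from < 0 || to < from || to > array.Length)
            throw new ArgumentOutOfRangeException();
        _array = array;
        _offset = from;
        Length = to - from;
    }

    // Reads and writes go straight to the underlying array (no copy).
    public T this[int index]
    {
        get { Check(index); return _array[_offset + index]; }
        set { Check(index); _array[_offset + index] = value; }
    }

    // Slices a slice; indices are relative to this slice (hypothetical name).
    public Slice<T> Sub(int from, int to)
    {
        if (from < 0 || to < from || to > Length)
            throw new ArgumentOutOfRangeException();
        return new Slice<T>(_array, _offset + from, _offset + to);
    }

    // Forces an independent copy of the sliced region.
    public T[] ToArray()
    {
        if (Length == 0) return Array.Empty<T>();
        var copy = new T[Length];
        Array.Copy(_array, _offset, copy, 0, Length);
        return copy;
    }

    // An array converts to a slice covering the whole array.
    public static implicit operator Slice<T>(T[] array) =>
        new Slice<T>(array, 0, array.Length);

    private void Check(int index)
    {
        if ((uint)index >= (uint)Length) throw new IndexOutOfRangeException();
    }
}
```

A short usage example under the same assumptions:

```csharp
int[] data = { 0, 1, 2, 3, 4, 5 };
Slice<int> middle = new Slice<int>(data, 2, 5); // would be data[2:5] in the proposed syntax
middle[0] = 42;                                 // writes through to data[2]
int[] copy = middle.ToArray();                  // independent copy of three elements
```

Because the struct exposes no accessor for `_array` or `_offset`, a recipient cannot reach outside the sliced region, which mirrors the "no way back to the larger entity" design point.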
Solution: ReadOnlySlice<T>
In addition to Slice<T>, the .NET Framework could also include a ReadOnlySlice<T> type, which would be almost identical to Slice<T> except that it would not provide any way of writing to the slice. A Slice<T> would be implicitly convertible to a ReadOnlySlice<T>, but not the other way around.
As with slicing an array, creation of a ReadOnlySlice<T> wouldn’t copy data, but rather would create a read-only alias to the original data; this means that while you couldn’t change the contents of a ReadOnlySlice<T> through it, if you had a writable reference to the underlying data, you could still manipulate it:
While C# would not have special syntax to represent a ReadOnlySlice<T>, it could still have knowledge of the type. In particular, there is a very commonly-used type in C# that behaves like an array but that’s immutable: string. It’s very common for developers to want to slice off substrings from strings, and historically this has been a relatively expensive operation, as it involves allocating a new string object and copying the string data to it. With ReadOnlySlice<T>, the compiler could provide built-in support for slicing off substrings represented as ReadOnlySlice<char>. This could be done using the same slicing syntax as exists for arrays.
This would allow for substrings to be taken and handed around in a very efficient manner. In addition to new methods on String like Slice (a call to which is what the slicing syntax on strings would compile down to), String would also support an explicit conversion from a ReadOnlySlice<char> back to a string. This would enable developers to work with substrings efficiently, and then only create a copy as a string when actually needed.
Further, just as the C# compiler today has support for concatenating strings and switching on strings, it could also have support for concatenating ReadOnlySlice<char> and switching on ReadOnlySlice<char>:
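The read-only aliasing and substring behaviour described in this section can be approximated with `ReadOnlySpan<T>`, a type available in later .NET releases (assumed here: .NET Core 3.0 or later, for the span-based `string.Concat` overload). This is a sketch under that assumption, not the proposed `ReadOnlySlice<T>` or its syntax; the class and method names are hypothetical, and comparison is done with `SequenceEqual` instead of the compiler-supported `switch` the proposal envisions.

```csharp
using System;

// Hypothetical helpers for illustration only.
public static class ReadOnlySliceSketch
{
    // A read-only view still observes writes made through the array itself.
    public static int ObserveWriteThrough(int[] data)
    {
        ReadOnlySpan<int> view = data.AsSpan(0, 1); // Span<int> converts implicitly
        // view[0] = 1;  // would not compile: the view offers no way to write
        data[0] = 99;    // allowed through the writable reference
        return view[0];  // the view aliases data, so it sees the new value
    }

    // Compares the first characters of a string without allocating a substring.
    public static string Classify(string text)
    {
        ReadOnlySpan<char> head = text.AsSpan(0, Math.Min(5, text.Length));
        if (head.SequenceEqual("hello".AsSpan())) return "greeting";
        if (head.SequenceEqual("world".AsSpan())) return "planet";
        return "unknown";
    }

    // Concatenates two views of one string; only the result is allocated.
    public static string JoinEnds(string text, int headLength, int tailStart)
    {
        ReadOnlySpan<char> head = text.AsSpan(0, headLength);
        ReadOnlySpan<char> tail = text.AsSpan(tailStart);
        return string.Concat(head, tail);
    }
}
```

With the string `"hello, world"`, `JoinEnds(text, 5, 7)` corresponds to the proposal's `helloWorld[:5]` and `helloWorld[7:]` being concatenated, with a string created only at the end.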