-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Add string support to ReadOnlyMemory<char> #14906
Conversation
ThrowHelper.ThrowArgumentOutOfRangeException(); | ||
{ | ||
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.start); | ||
} | ||
|
||
if (_index < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just do return new ReadOnlyMemory<T>(_object, _index + start, _length - start)
here for all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Will simplify.
public override int GetHashCode() | ||
{ | ||
return CombineHashCodes(_arrayOrOwnedMemory.GetHashCode(), (_index & RemoveOwnedFlagBitMask).GetHashCode(), _length.GetHashCode()); | ||
return CombineHashCodes(_object.GetHashCode(), (_index & RemoveOwnedFlagBitMask).GetHashCode(), _length.GetHashCode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The masking off the RemoveOwnedFlagBitMask bit is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
return ((OwnedMemory<T>)_arrayOrOwnedMemory).Span.Slice(_index & RemoveOwnedFlagBitMask, _length); | ||
return new ReadOnlySpan<T>((T[])_arrayOrOwnedMemory, _index, _length); | ||
return | ||
_index < 0 ? ((OwnedMemory<T>)_object).Span.Slice(_index & RemoveOwnedFlagBitMask, _length) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is difficult to parse. can u split the conditions into if blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
// The highest order bit of _index is used to discern whether _object is an array/string or an owned memory | ||
// if (_index >> 31) == 1, _object is an OwnedMemory<T> | ||
// else, _object is a T[] or string | ||
private readonly object _object; | ||
private readonly int _index; | ||
private readonly int _length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized we could use the sign of the _length to tell if the _object is a string, i.e. we do have one more bit to spare. Would we want to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to have the option, but it's not clear to me that's faster; a type check for string is quite fast, and we'll need to do the cast from object anyway if it is a string. Let's start with the type check, and we can always change the implementation detail later.
/// <returns></returns> | ||
public static bool TryGetString(this ReadOnlyMemory<char> readOnlyMemory, out string text, out int start, out int length) | ||
{ | ||
if (readOnlyMemory.GetObjectStartLength(out int offset, out int count) is string s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
if (readOnlyMemory.GetObjectStartLength(out start, out length) is string s)
{
text = s;
return true;
}
text = null;
start = 0;
length = 0;
return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but I was avoiding that so that the caller wouldn't potentially witness the temporary values written to start and length in the case where the memory was wrapping something other than a string.
/// <param name="start">The starting location in <paramref name="text"/>.</param> | ||
/// <param name="length">The number of items in <paramref name="text"/>.</param> | ||
/// <returns></returns> | ||
public static bool TryGetString(this ReadOnlyMemory<char> readOnlyMemory, out string text, out int start, out int length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the parameter name inconsistency discussed during review? Edit: It wasn't.
Should it be out int index
instead of out int start
to be consistent with the other APIs here?
Edit2: nvm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the parameter name inconsistency discussed during review?
It wasn't. Which inconsistency? These match the names used with Slice and the ctors. (That said, as you know I don't like the names, so it pained me to write these 😄)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, index
would be inconsistent, and start
is fine :)
Ignore my comment.
if (_index < 0) | ||
return new ReadOnlyMemory<T>((OwnedMemory<T>)_arrayOrOwnedMemory, (_index & RemoveOwnedFlagBitMask) + start, _length - start); | ||
return new ReadOnlyMemory<T>((T[])_arrayOrOwnedMemory, _index + start, _length - start); | ||
return new ReadOnlyMemory<T>(_object, _index + start, _length - start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same optimizations should be done for regular Memory<T>
slicing as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, some of them can be. Let's do it as a separate change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'll just take care of it as part of this change, it's easy enough.
_index = index | (1 << 31); // Before using _index, check if _index < 0, then 'and' it with RemoveOwnedFlagBitMask | ||
// No validation performed | ||
_object = obj; | ||
_index = start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without _index = index | (1 << 31);
for OwnedMemory, how would we be able to distinguish the object field between T[] and OwnedMemory?
Should we update https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Buffers/OwnedMemory.cs#L24?
And if so, we probably still needs to validate the length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OwnedMemory is never used to create a ReadOnlyMemory, only Memory. And Memory creates a ReadOnlyMemory via a reinterpret cast, so that code wasn't used. Have I misunderstood? If it's a problem, we have a test hole, as all tests are passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, good point.
Although, I recall we had less than 100% branch coverage for tests, specifically for the internal constructor.
From dotnet/corefx#23701 (comment):
https://user-images.githubusercontent.com/6527137/29992954-49b3bce8-8f5d-11e7-8871-f26a77451a7f.png
// The highest order bit of _index is used to discern whether _object is an array/string or an owned memory | ||
// if (_index >> 31) == 1, _object is an OwnedMemory<T> | ||
// else, _object is a T[] or string | ||
private readonly object _object; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change introduces another difference between ReadOnlyMemory and Memory, semantically, since Memory cannot wrap a string but ReadOnlyMemory can. Can you please explain why string has to be wrapped in a ReadOnlyMemory<char> only rather than both? Is it because string is immutable?
Both Memory<T> and ReadOnlyMemory<T> are readonly, immutable structs with readonly fields. Why is one suitable but not the other. Similarly with Span, which only has the difference in its indexer which returns a ref T vs. a readonly ref T.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because string is immutable. Exposing the ability to mutate the string via the Memory
's Span
's writable ref indexer would be really bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is Memory<T> MemoryMarshal.AsMemory<char>(ReadOnlyMemory<T>)
going to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that actually last night.
One option would simply be that this is more important, and ditch AsMemory. Another would be making AsMemory instead be just for Memory over bytes. Another would be throwing from AsMemory if it wrapped a string, or making it a Try method. We could also make it work, but then we hit the problem of if it wraps a string, Memory.Span will give you a writable span for a string... maybe that's ok since you went through Marshal and all bets are off?
f47bd4a
to
12a55f9
Compare
/// <summary>Creates a new <see cref="ReadOnlyMemory{T}"/> over the portion of the target string.</summary> | ||
/// <param name="text">The target string.</param> | ||
/// <exception cref="System.ArgumentNullException">Thrown when <paramref name="text"/> is a null reference (Nothing in Visual Basic).</exception> | ||
public static ReadOnlyMemory<char> AsReadOnlyMemory(this string text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need these extension methods here if the implementation is the exact same as what is in corefx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation in corefx wouldn't have access to the internals these are using, i.e. the internal ctor and the internal GetObjectStartLength method.
@dotnet-bot test Ubuntu armlb Cross Debug Innerloop Build please ("Caused by: com.fasterxml.jackson.core.JsonParseException: Numeric value (4303844442) out of range of int") |
12a55f9
to
24f9204
Compare
@jkotas, @ahsonkhan, how are you guys feeling about this change now? |
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test please |
/// Provides a collection of methods for interoperating with <see cref="Memory{T}"/>, <see cref="ReadOnlyMemory{T}"/>, | ||
/// <see cref="Span{T}"/>, and <see cref="ReadOnlySpan{T}"/>. | ||
/// </summary> | ||
public static class MemoryMarshal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like MemoryMarshal.cs and most of Memory.cs/ReadOnlyMemory.cs share their implementation between corelib and System.Memory. Is there a better way to keep the sources in sync or is a comment stating "apply changes to both places" good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only a few places where there are meaningful differences. It might be helpful to explore factoring those pieces out into helper methods in another partial class file, such that the main file with most of the logic can be sync'd and kept identical between them, with only the differences in files unique to the repos.
In the case of MemoryMarshal, the AsMemory method I've added here is identical, but I expect there will be other methods on MemoryMarshal that are not, and that will need internals access to the Memory and Span implementations.
Anyway, as a first step, once our active PRs go through, I'd suggest someone compares the files in the two repos and gets them to be as close as possible. I noticed there are various discrepancies between them that make diff'ing harder, like differences in XML comments, differences in the ordering of members, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest someone compares the files in the two repos and gets them to be as close as possible.
I wanted to do exactly that. If you would like to do so, let me know. Otherwise, I can essentially fix the ordering/comments so they match.
I noticed there are various discrepancies between them that make diff'ing harder, like differences in XML comments, differences in the ordering of members, etc.
Yep, noticed that myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have option of mirroring directories between repos. We have the mirror for shared CoreLib directory (between CoreCLR and CoreRT), and for System.Diagnostics.Tracing directory (between CoreCLR and CoreFX - I have noticed @brianrob set it up recently - you can talk to him about what it involved). We can use the same tech for more parts, like System.Memory. It would get rid of the forwarders from SpanExtension to Span as a side-effect that I am not particularly happy about.
It might be helpful to explore factoring those pieces out into helper methods in another partial class file
Ifdefs work great for small differences as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're interested in mirroring, I worked with @safern to get this done. We have some improvements to make, because the code gets mirrored to multiple repos, but if you don't need your code mirrored to multiple repos then it should be fairly straight forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test |
https://ci2.dot.net/job/dotnet_coreclr/job/master/job/perf-pipeline_prtest/ The |
cc: @adiaaida |
@adiaaida may know more. I think we are just low on the machines required. The legs that run were changed recently and I think we run a lot more |
These are a static pool of machines, not scalable VMs |
@jorive, did we finish expanding the pool of perf machines? |
…tring Add string support to ReadOnlyMemory<char> Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
…tring Add string support to ReadOnlyMemory<char> Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@brianrob There is an open ticket to increase the Linux pool, and there are already 30 machines on the Windows pool. There is no enough capacity to run the required matrix, so we are cutting it down. |
…tring Add string support to ReadOnlyMemory<char> Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
…tring Add string support to ReadOnlyMemory<char> Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
…tring Add string support to ReadOnlyMemory<char> Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
…tring Add string support to ReadOnlyMemory<char> Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
cc: @ahsonkhan, @jkotas, @KrzysztofCwalina
(I have a separate PR for corefx that updates the ref and adds tests, as well as updates the implementation there, but I'll hold off on putting that up until I address any feedback on the implementation here, as it's almost identical there.)
Contributes to https://github.com/dotnet/corefx/issues/25085