Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Add string support to ReadOnlyMemory<char> #14906

Merged
merged 2 commits into from
Nov 9, 2017

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Nov 7, 2017

cc: @ahsonkhan, @jkotas, @KrzysztofCwalina

(I have a separate PR for corefx that updates the ref and adds tests, as well as updates the implementation there, but I'll hold off on putting that up until I address any feedback on the implementation here, as it's almost identical there.)

Contributes to https://github.com/dotnet/corefx/issues/25085

ThrowHelper.ThrowArgumentOutOfRangeException();
{
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.start);
}

if (_index < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just do return new ReadOnlyMemory<T>(_object, _index + start, _length - start) here for all cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will simplify.

public override int GetHashCode()
{
return CombineHashCodes(_arrayOrOwnedMemory.GetHashCode(), (_index & RemoveOwnedFlagBitMask).GetHashCode(), _length.GetHashCode());
return CombineHashCodes(_object.GetHashCode(), (_index & RemoveOwnedFlagBitMask).GetHashCode(), _length.GetHashCode());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The masking off the RemoveOwnedFlagBitMask bit is unnecessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

return ((OwnedMemory<T>)_arrayOrOwnedMemory).Span.Slice(_index & RemoveOwnedFlagBitMask, _length);
return new ReadOnlySpan<T>((T[])_arrayOrOwnedMemory, _index, _length);
return
_index < 0 ? ((OwnedMemory<T>)_object).Span.Slice(_index & RemoveOwnedFlagBitMask, _length) :

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is difficult to parse. can u split the conditions into if blocks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

// The highest order bit of _index is used to discern whether _object is an array/string or an owned memory
// if (_index >> 31) == 1, _object is an OwnedMemory<T>
// else, _object is a T[] or string
private readonly object _object;
private readonly int _index;
private readonly int _length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized we could use the sign of the _length to tell if the _object is a string, i.e. we do have one more bit to spare. Would we want to do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to have the option, but it's not clear to me that's faster; a type check for string is quite fast, and we'll need to do the cast from object anyway if it is a string. Let's start with the type check, and we can always change the implementation detail later.

/// <returns></returns>
public static bool TryGetString(this ReadOnlyMemory<char> readOnlyMemory, out string text, out int start, out int length)
{
if (readOnlyMemory.GetObjectStartLength(out int offset, out int count) is string s)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

if (readOnlyMemory.GetObjectStartLength(out start, out length) is string s) 
{
   text = s;
   return true;
}

text = null;
start = 0;
length = 0;
return false;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but I was avoiding that so that the caller wouldn't potentially witness the temporary values written to start and length in the case where the memory was wrapping something other than a string.

/// <param name="start">The starting location in <paramref name="text"/>.</param>
/// <param name="length">The number of items in <paramref name="text"/>.</param>
/// <returns></returns>
public static bool TryGetString(this ReadOnlyMemory<char> readOnlyMemory, out string text, out int start, out int length)
Copy link

@ahsonkhan ahsonkhan Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the parameter name inconsistency discussed during review? Edit: It wasn't.

Should it be out int index instead of out int start to be consistent with the other APIs here?

Edit2: nvm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the parameter name inconsistency discussed during review?

It wasn't. Which inconsistency? These match the names used with Slice and the ctors. (That said, as you know I don't like the names, so it pained me to write these 😄)

Copy link

@ahsonkhan ahsonkhan Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, index would be inconsistent, and start is fine :)

Ignore my comment.

if (_index < 0)
return new ReadOnlyMemory<T>((OwnedMemory<T>)_arrayOrOwnedMemory, (_index & RemoveOwnedFlagBitMask) + start, _length - start);
return new ReadOnlyMemory<T>((T[])_arrayOrOwnedMemory, _index + start, _length - start);
return new ReadOnlyMemory<T>(_object, _index + start, _length - start);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same optimizations should be done for regular Memory<T> slicing as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, some of them can be. Let's do it as a separate change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'll just take care of it as part of this change, it's easy enough.

_index = index | (1 << 31); // Before using _index, check if _index < 0, then 'and' it with RemoveOwnedFlagBitMask
// No validation performed
_object = obj;
_index = start;
Copy link

@ahsonkhan ahsonkhan Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without _index = index | (1 << 31); for OwnedMemory, how would we be able to distinguish the object field between T[] and OwnedMemory?

Should we update https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Buffers/OwnedMemory.cs#L24?

And if so, we probably still needs to validate the length.

Copy link
Member Author

@stephentoub stephentoub Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OwnedMemory is never used to create a ReadOnlyMemory, only Memory. And Memory creates a ReadOnlyMemory via a reinterpret cast, so that code wasn't used. Have I misunderstood? If it's a problem, we have a test hole, as all tests are passing.

Copy link

@ahsonkhan ahsonkhan Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, good point.

Although, I recall we had less than 100% branch coverage for tests, specifically for the internal constructor.
From dotnet/corefx#23701 (comment):
https://user-images.githubusercontent.com/6527137/29992954-49b3bce8-8f5d-11e7-8871-f26a77451a7f.png

// The highest order bit of _index is used to discern whether _object is an array/string or an owned memory
// if (_index >> 31) == 1, _object is an OwnedMemory<T>
// else, _object is a T[] or string
private readonly object _object;
Copy link

@ahsonkhan ahsonkhan Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces another difference between ReadOnlyMemory and Memory, semantically, since Memory cannot wrap a string but ReadOnlyMemory can. Can you please explain why string has to be wrapped in a ReadOnlyMemory<char> only rather than both? Is it because string is immutable?

Both Memory<T> and ReadOnlyMemory<T> are readonly, immutable structs with readonly fields. Why is one suitable but not the other. Similarly with Span, which only has the difference in its indexer which returns a ref T vs. a readonly ref T.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because string is immutable. Exposing the ability to mutate the string via the Memory's Span's writable ref indexer would be really bad.

Copy link
Member

@jkotas jkotas Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is Memory<T> MemoryMarshal.AsMemory<char>(ReadOnlyMemory<T>) going to work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that actually last night.
One option would simply be that this is more important, and ditch AsMemory. Another would be making AsMemory instead be just for Memory over bytes. Another would be throwing from AsMemory if it wrapped a string, or making it a Try method. We could also make it work, but then we hit the problem of if it wraps a string, Memory.Span will give you a writable span for a string... maybe that's ok since you went through Marshal and all bets are off?

/// <summary>Creates a new <see cref="ReadOnlyMemory{T}"/> over the portion of the target string.</summary>
/// <param name="text">The target string.</param>
/// <exception cref="System.ArgumentNullException">Thrown when <paramref name="text"/> is a null reference (Nothing in Visual Basic).</exception>
public static ReadOnlyMemory<char> AsReadOnlyMemory(this string text)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need these extension methods here if the implementation is the exact same as what is in corefx.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation in corefx wouldn't have access to the internals these are using, i.e. the internal ctor and the internal GetObjectStartLength method.

@stephentoub
Copy link
Member Author

@dotnet-bot test Ubuntu armlb Cross Debug Innerloop Build please ("Caused by: com.fasterxml.jackson.core.JsonParseException: Numeric value (4303844442) out of range of int")

@stephentoub
Copy link
Member Author

@jkotas, @ahsonkhan, how are you guys feeling about this change now?

@stephentoub
Copy link
Member Author

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test please

/// Provides a collection of methods for interoperating with <see cref="Memory{T}"/>, <see cref="ReadOnlyMemory{T}"/>,
/// <see cref="Span{T}"/>, and <see cref="ReadOnlySpan{T}"/>.
/// </summary>
public static class MemoryMarshal

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like MemoryMarshal.cs and most of Memory.cs/ReadOnlyMemory.cs share their implementation between corelib and System.Memory. Is there a better way to keep the sources in sync or is a comment stating "apply changes to both places" good enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only a few places where there are meaningful differences. It might be helpful to explore factoring those pieces out into helper methods in another partial class file, such that the main file with most of the logic can be sync'd and kept identical between them, with only the differences in files unique to the repos.

In the case of MemoryMarshal, the AsMemory method I've added here is identical, but I expect there will be other methods on MemoryMarshal that are not, and that will need internals access to the Memory and Span implementations.

Anyway, as a first step, once our active PRs go through, I'd suggest someone compares the files in the two repos and gets them to be as close as possible. I noticed there are various discrepancies between them that make diff'ing harder, like differences in XML comments, differences in the ordering of members, etc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest someone compares the files in the two repos and gets them to be as close as possible.

I wanted to do exactly that. If you would like to do so, let me know. Otherwise, I can essentially fix the ordering/comments so they match.

I noticed there are various discrepancies between them that make diff'ing harder, like differences in XML comments, differences in the ordering of members, etc.

Yep, noticed that myself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have option of mirroring directories between repos. We have the mirror for shared CoreLib directory (between CoreCLR and CoreRT), and for System.Diagnostics.Tracing directory (between CoreCLR and CoreFX - I have noticed @brianrob set it up recently - you can talk to him about what it involved). We can use the same tech for more parts, like System.Memory. It would get rid of the forwarders from SpanExtension to Span as a side-effect that I am not particularly happy about.

It might be helpful to explore factoring those pieces out into helper methods in another partial class file

Ifdefs work great for small differences as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're interested in mirroring, I worked with @safern to get this done. We have some improvements to make, because the code gets mirrored to multiple repos, but if you don't need your code mirrored to multiple repos then it should be fairly straight forward.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahsonkhan
Copy link

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test
@dotnet-bot test Perf Build and Test

@ahsonkhan
Copy link

ahsonkhan commented Nov 8, 2017

https://ci2.dot.net/job/dotnet_coreclr/job/master/job/perf-pipeline_prtest/

The Perf Build and Test leg is taking 6+ hours to complete.

@stephentoub
Copy link
Member Author

The Perf Build and Test leg is taking 6+ hours to complete.

cc: @adiaaida

@ahsonkhan
Copy link

cc @dotnet/dnceng

The Perf Build and Test leg is taking 6+ hours to complete.

13 hour ETA?
image

@mmitche
Copy link
Member

mmitche commented Nov 9, 2017

@adiaaida may know more.

I think we are just low on the machines required. The legs that run were changed recently and I think we run a lot more

@mmitche
Copy link
Member

mmitche commented Nov 9, 2017

These are a static pool of machines, not scalable VMs

@brianrob
Copy link
Member

brianrob commented Nov 9, 2017

@jorive, did we finish expanding the pool of perf machines?

@stephentoub stephentoub merged commit ada9618 into dotnet:master Nov 9, 2017
@stephentoub stephentoub deleted the add_memory_string branch November 9, 2017 04:19
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Nov 9, 2017
…tring

Add string support to ReadOnlyMemory<char>

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
stephentoub added a commit to dotnet/corert that referenced this pull request Nov 9, 2017
…tring

Add string support to ReadOnlyMemory<char>

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@jorive
Copy link
Member

jorive commented Nov 9, 2017

@brianrob There is an open ticket to increase the Linux pool, and there are already 30 machines on the Windows pool. There is no enough capacity to run the required matrix, so we are cutting it down.

dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jan 13, 2018
…tring

Add string support to ReadOnlyMemory<char>

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jan 13, 2018
…tring

Add string support to ReadOnlyMemory<char>

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
safern pushed a commit to dotnet/corefx that referenced this pull request Jan 16, 2018
…tring

Add string support to ReadOnlyMemory<char>

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
safern pushed a commit to dotnet/corefx that referenced this pull request Jan 16, 2018
…tring

Add string support to ReadOnlyMemory<char>

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants