
Support IList<ReadOnlyMemory<byte>> SocketAsyncEventArgs #49941

Open
davidfowl opened this issue Mar 20, 2021 · 49 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Net.Sockets tenet-performance Performance related issue
Milestone

Comments

@davidfowl (Member) commented Mar 20, 2021

Background and Motivation

Doing some profiling writing large payloads via Kestrel, about 1.2% of the overall time is spent converting from ReadOnlySequence<byte> to a List<ArraySegment<byte>> for the eventual call to the underlying socket API that takes an array of pointers to buffers. Most of the time goes to calling MemoryMarshal.TryGetArray to get an ArraySegment<byte> from each Memory<byte>, which is mostly a pointless conversion (and largely defeats the purpose of us passing a pre-pinned memory handle to the networking stack).

Proposed API

namespace System.Net.Sockets
{
    public class SocketAsyncEventArgs
    {
        public IList<ReadOnlyMemory<byte>>? MemoryList { get; set; }
    }
}

Usage Examples

private void SetBufferList(in ReadOnlySequence<byte> buffer)
{
    if (_bufferList == null)
    {
        _bufferList = new List<ReadOnlyMemory<byte>>();
    }

    foreach (ReadOnlyMemory<byte> b in buffer)
    {
        _bufferList.Add(b);
    }

    // Setting this list sets the buffers in the internal buffer list
    MemoryList = _bufferList;
}

Risks

More ways of doing the same thing. When to choose MemoryList vs BufferList. Do they overwrite each other? etc.

@davidfowl davidfowl added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 20, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Memory untriaged New issue has not been triaged by the area owner labels Mar 20, 2021
@ghost commented Mar 20, 2021

Tagging subscribers to this area: @GrabYourPitchforks
See info in area-owners.md if you want to be subscribed.


@ghost commented Mar 23, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


@geoffkizer (Contributor)

I agree, this would be a good thing to do. Main difficulty is what you highlighted above:

More ways of doing the same thing. When to choose MemoryList vs BufferList. Do they overwrite each other? etc.

The other consideration here is that we want this to work seamlessly with NetworkStream, so whatever way we choose to expose this in SAEA and other socket APIs will effectively commit us to the same approach in Stream etc. So we need to design this in concert with #25344.

Also note, ReadOnlyMemory only covers send cases; presumably we want to support receive here as well. Which complicates things further.

@geoffkizer (Contributor)

All that said: Can we improve the cost of MemoryMarshal.TryGetArray? That seems really high for what should be a very common operation.
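For context, a minimal sketch (not Kestrel's actual code; names are illustrative) of the per-chunk conversion being discussed, where each Memory<byte> in the sequence is unwrapped back to an ArraySegment<byte> before the socket call:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Runtime.InteropServices;

static class Conversion
{
    public static List<ArraySegment<byte>> ToSegments(in ReadOnlySequence<byte> sequence)
    {
        var list = new List<ArraySegment<byte>>();
        foreach (ReadOnlyMemory<byte> memory in sequence)
        {
            // This per-chunk unwrap is the ~1.2% cost called out in the issue.
            if (!MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
                throw new InvalidOperationException("Buffer is not backed by an array.");
            list.Add(segment);
        }
        return list;
    }
}
```

With 4K chunks, a 1MB payload runs this loop 256 times per send.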

@davidfowl (Member, Author)

BTW this all came up because I'm looking at this dotnet/aspnetcore#31110

The other consideration here is, we want this to work seamlessly with NetworkStream, so whatever way we choose to expose this in SAEA and other socket APIs will effectively commit us to the same approach in Stream etc. So we need to design this in concert with #25344.

Are you saying we should use the same types on both? I agree. I wouldn't gate one on the other, but they can be designed together.

Also note, ReadOnlyMemory only covers send cases; presumably we want to support receive here as well. Which complicates things further.

Good point. I think multi-buffer sends are 90% and multi-buffer receives are 10% (I pulled those numbers out of thin air, btw).

All that said: Can we improve the cost of MemoryMarshal.TryGetArray? That seems really high for what should be a very common operation.

No idea. @GrabYourPitchforks? The other issue is that Kestrel uses 4K buffers, so we end up doing lots of these conversions on individual 4K chunks.

@davidfowl (Member, Author)

BTW there's a bunch of other overhead I hit as well. I filed the issue for the easy one so far.

@geoffkizer (Contributor)

Are you saying we should use the same types on both? I agree.

Yes. E.g. is it IList or IReadOnlyList? We need to use exactly the same types here across Socket/Stream or we could accidentally cause ourselves a lot of pain.

I wouldn't gate one of the other but they can be designed together.

Yeah, we could choose to implement this in parts in different ways, we just need to feel confident that the design works in all cases.

@geoffkizer (Contributor)

BTW there's a bunch of other overhead I hit as well.

Please file it all :)

@benaadams (Member)

Or ReadOnlySequence<byte> 🤔

@davidfowl (Member, Author)

The problem with ReadOnlySequence<byte> is that we'd end up forcing creation of a linked list.

@benaadams (Member)

The problem with ReadOnlySequence is that we'd end up forcing creation of a linked list.

Not sure why? Change _bufferListInternal from List<ArraySegment<byte>> to List<Memory<byte>>

// Copy the user-provided list into our internal buffer list,
// so that we are not affected by subsequent changes to the list.
// We reuse the existing list so that we can avoid reallocation when possible.
int bufferCount = value.Count;
if (_bufferListInternal == null)
{
    _bufferListInternal = new List<ArraySegment<byte>>(bufferCount);
}
else

Can then enumerate the IList<ArraySegment<byte>> into it and enumerate the ReadOnlySequence<byte> into it?

@davidfowl (Member, Author)

I meant to create the ReadOnlySequence in the first place.

@benaadams (Member) commented Mar 24, 2021

I meant to create the ReadOnlySequence in the first place.

True; but you might already have one, e.g. if you were forwarding upstream (like your usage):

public async Task InvokeAsync(HttpContext context)
{
    var reader = context.Request.BodyReader;
    var writer = _upstream.Output;
    while (true)
    {
        var readResult = await reader.ReadAsync(default);
        var buffer = readResult.Buffer;
        var isCompleted = readResult.IsCompleted;

        if (buffer.IsEmpty && isCompleted)
        {
            return;
        }

        await writer.WriteAsync(buffer);
    }
}

What I mean is your usage example could also be an API; then you don't need to create two Lists, since the first thing SocketAsyncEventArgs does is copy the List to another List, so it would save one List creation. And add a sequence overload?

private void SetBufferList(in ReadOnlySequence<byte> buffer)
{
    if (_bufferListInternal == null)
    {
        _bufferListInternal = new List<ReadOnlyMemory<byte>>();
    }
    else
    {
        _bufferListInternal.Clear();
    }

    foreach (ReadOnlyMemory<byte> b in buffer)
    {
        _bufferListInternal.Add(b);
    }
}

@davidfowl (Member, Author)

I wouldn't mind adding directly to the buffer list. The setter API is janky and requires me to manage my own list. That would be a decent middle ground.

@scalablecory (Contributor) commented Mar 24, 2021

We need a receive variant too, as well as Send() and Receive() overloads.

What I mean is your usage example could also be an API; then you don't need to create two Lists, since the first thing SocketAsyncEventArgs does is copy the List to another List, so it would save one List creation. And add a sequence overload?

@antonfirsov has ideas to make Send() and Receive() be the first-class APIs and have SAEA eventually wrap them. This will resolve this inefficiency by allowing direct translation to WSABUF with no in-between.

@scalablecory (Contributor)

which mostly defeats the purpose of us passing a pre-pinned memory handle to the networking stack

This is interesting, because I'd expect working with ArraySegment to actually be the most efficient here.

What is the overhead of creating a GCHandle to a pre-pinned array? I'd expect it to be close to a no-op. Maybe this is something that can be optimized.

@davidfowl (Member, Author)

Allocating a GCHandle isn't a no-op, and we shouldn't be doing it. The stack should be rewritten on top of IList<Memory<byte>> instead of ArraySegment<byte>; I'm not sure why that would be contentious (I started messing with it in a local branch).

@scalablecory (Contributor)

I'm not sure why that would be contentious

I'm not proposing we use ArraySegment, I'm just surprised it would be slower than Memory.

@davidfowl (Member, Author)

The memory we create doesn't have a GC handle since it's using the pinned object heap. This code is allocating a GC handle for the underlying array when it doesn't need to. On top of that, we need to unpack the Memory back into an ArraySegment for each buffer in the linked list chain before setting the result on the SAEA.

There's all sorts of inefficiencies around multi-buffer sends, this is one of the lower hanging fruit 😄.

@stephentoub (Member) commented Mar 25, 2021

What is the overhead of creating a GCHandle to a pre-pinned array? I'd expect it to be close to a no-op.

Allocating a GCHandle isn't a no-op, and we shouldn't be doing it.

Yeah, it's not free, and it'll be more expensive the more contended it is. Creating a GCHandle entails synchronization (best case an interlocked operation, worst case a lock). It doesn't really matter what the type of the handle is, nor whether the array is already pinned or not; it's approximately the same operation and cost. Below is an example, all serialized so no contention.

| Method | type                  | Mean     | Error    | StdDev   |
|--------|-----------------------|----------|----------|----------|
| Normal | Normal                | 30.88 ns | 0.144 ns | 0.135 ns |
| Normal | Pinned                | 31.36 ns | 0.113 ns | 0.100 ns |
| Normal | Weak                  | 31.53 ns | 0.528 ns | 0.441 ns |
| Normal | WeakTrackResurrection | 33.69 ns | 0.151 ns | 0.118 ns |
| Pinned | Normal                | 30.80 ns | 0.071 ns | 0.059 ns |
| Pinned | Pinned                | 31.57 ns | 0.101 ns | 0.090 ns |
| Pinned | Weak                  | 32.15 ns | 0.111 ns | 0.093 ns |
| Pinned | WeakTrackResurrection | 33.50 ns | 0.121 ns | 0.107 ns |

private byte[] _normal = new byte[1024];
private byte[] _pinned = GC.AllocateArray<byte>(1024, pinned: true);

[Benchmark]
[Arguments(GCHandleType.Normal)]
[Arguments(GCHandleType.Pinned)]
[Arguments(GCHandleType.Weak)]
[Arguments(GCHandleType.WeakTrackResurrection)]
public void Normal(GCHandleType type) => GCHandle.Alloc(_normal, type).Free();

[Benchmark]
[Arguments(GCHandleType.Normal)]
[Arguments(GCHandleType.Pinned)]
[Arguments(GCHandleType.Weak)]
[Arguments(GCHandleType.WeakTrackResurrection)]
public void Pinned(GCHandleType type) => GCHandle.Alloc(_pinned, type).Free();

@davidfowl (Member, Author)

Right, we create the Memory with CreateFromPinnedArray to avoid GCHandle overhead and, as it turns out, we only avoid it for single-buffer writes.

PS I'm discovering these inefficiencies because I've been looking at this dotnet/aspnetcore#31110 and some other scenarios around reducing memory.
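For readers following along, a minimal sketch (illustrative names, not Kestrel source) of the pattern being described: an array on the pinned object heap wrapped via MemoryMarshal.CreateFromPinnedArray, so pinning the resulting Memory<byte> does not allocate a GCHandle.

```csharp
using System;
using System.Runtime.InteropServices;

class PinnedBlock
{
    // Array allocated on the pinned object heap; the GC will never move it.
    private readonly byte[] _array = GC.AllocateArray<byte>(4096, pinned: true);

    // Memory created this way records that the array is pre-pinned, so callers
    // that need a pointer can pin it without a per-operation GCHandle.
    public Memory<byte> Memory => MemoryMarshal.CreateFromPinnedArray(_array, 0, _array.Length);
}
```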

@antonfirsov (Member) commented Mar 25, 2021

@antonfirsov has ideas to make SendAsync() and Receive() be the first-class APIs and have SAEA eventually wrap them.

I didn't mean to wrap the ValueTask APIs with SAEA APIs. The idea is that the (cancellable) ValueTask variants should be close-to-zero-overhead wrappers around the SAEA machinery, making them the primary choice for every developer who needs async, including Kestrel devs.

I proposed those overloads in #48477 (comment), copying them here for better visibility:

public class Socket
{
    public ValueTask SendAsync(IReadOnlyList<ReadOnlyMemory<byte>> buffers, CancellationToken cancellationToken = default);
    public ValueTask<int> ReceiveAsync(IReadOnlyList<Memory<byte>> buffers, CancellationToken cancellationToken = default);
}

@geoffkizer had a valid concern in a comment above:

Also note, ReadOnlyMemory only covers send cases; presumably we want to support receive here as well. Which complicates things further.

With the ValueTask overloads we can avoid exploding the SAEA's public API further. (We can manage the lists as implementation details of Socket's internal SAEA.)

To provide a better experience for callers with a ReadOnlySequence<byte>, we may also add a convenience overload which wraps the IList<ReadOnlyMemory<byte>> variant by populating an internally cached list. This seems very easy to implement, and the overhead seems acceptable to me:

    public ValueTask SendAsync(ReadOnlySequence<byte> buffers, CancellationToken cancellationToken = default);

@davidfowl would you be able to consume this stuff instead of SAEA? This would have some (but hopefully low) overhead, I think definitely much lower than the TryGetArray overhead you are mentioning, if we do it right.
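A hedged sketch of how that convenience overload could flatten the sequence into a cached list; note that the list-based SendAsync is the *proposed* API from this comment, not an existing one, and the helper names here are illustrative.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

static class SequenceSend
{
    [ThreadStatic]
    private static List<ReadOnlyMemory<byte>>? t_cachedList;

    public static List<ReadOnlyMemory<byte>> Flatten(in ReadOnlySequence<byte> buffers)
    {
        // Reuse a cached list so the convenience overload doesn't allocate per call.
        List<ReadOnlyMemory<byte>> list = t_cachedList ??= new List<ReadOnlyMemory<byte>>();
        list.Clear();
        foreach (ReadOnlyMemory<byte> segment in buffers)
        {
            list.Add(segment);
        }
        // A real implementation would hand `list` to the proposed
        // SendAsync(IReadOnlyList<ReadOnlyMemory<byte>>, CancellationToken)
        // overload and clear it when the operation completes.
        return list;
    }
}
```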

@geoffkizer (Contributor)

That's nice but where do IOVectors and WSABUF structs get stored? You can only stack alloc so many, so in our case I'd like to figure out how to reuse a cached set of these structs allocated on the heap once we exceed that limit.

We can stack alloc a decent amount; I think we stack alloc up to 8 today -- @tmds would know for sure. And we could increase this (somewhat) if it makes a difference for real scenarios.

But regardless, we would still be using a cached SAEA instance under the covers, and that would reuse cached WSABUFs etc. It's just that we wouldn't expose Memory support on SAEA publicly, it would be an internal-only thing.

In other words I don't think there's an issue here.

@davidfowl (Member, Author)

But regardless, we would still be using a cached SAEA instance under the covers, and that would reuse cached WSABUFs etc. It's just that we wouldn't expose Memory support on SAEA publicly, it would be an internal-only thing.

OK then I prefer this approach as well. I thought the goal was to avoid SAEA (which is something I want to do for reads BTW).

@geoffkizer (Contributor)

I thought the goal was to avoid SAEA (which is something I want to do for reads BTW).

Why is that?

@geoffkizer (Contributor)

BTW, how many buffers would you be using here typically?

I would be careful about approaches that often require using a lot of buffers for scatter/gather -- e.g. >4 to be conservative, or maybe >8. I'm not sure the kernels here do a great job in this case.

@stephentoub (Member) commented Mar 25, 2021

I thought the goal was to avoid SAEA

SAEA is an implementation detail for the socket methods, a detail we can change over time. Right now it stores the kitchen sink: if we cache one for use in single-buffer ReadAsync calls, we're still paying for all the state associated with every other operation it can support. My ideal over time is that we evolve to have some smaller internal data structure(s) per operation on top of which we can build both the socket operations and SAEA, which at that point would be purely legacy / for compat. Waving my hands.

@davidfowl (Member, Author)

BTW, how many buffers would you be using here typically?
I would be careful about approaches that often require using a lot of buffers for scatter/gather -- e.g. >4 to be conservative, or maybe >8. I'm not sure the kernels here do a great job in this case.

We use 4K buffers today, so 8 buffers would be 32K. Large responses are sometimes more than this, so it can get up there in the number of buffers (256 for 1MB). I'm playing around with different buffer sizes and dynamic buffer sizes, but that's not likely to be something we commit to anytime soon. Right now, I'm trying to reduce the overhead at all of the layers as much as possible.

SAEA is an implementation detail for the socket methods, a detail we can change over time. Right now it stores the kitchen sink: if we cache one for use in single-buffer ReadAsync calls, we're still paying for all the state associated with every other operation it can support. My ideal over time is that we evolve to have some smaller internal data structure(s) per operation on top of which we can build both the socket operations and SAEA, which at that point would be purely legacy / for compat. Waving my hands.

Yes, I know this, but we don't use the Task methods; they add overhead we don't need (and no, I haven't measured 😄), and I don't see the value in adding overhead, especially at the networking layer. We'll do kinda insane things here to avoid it. Right now, for ReadAsync we end up holding our derived SocketAsyncEventArgs in the state machine in a pending zero-byte read, and that's ~350 bytes per connection. I'd like that to be as minimal as possible so I can pool the SAEA for reads as well, instead of having one per connection.

@geoffkizer (Contributor)

I don't see the value in adding overhead especially at the networking layer.

While it's true that the Task APIs are basically just wrappers over public SAEA today, that will likely change over time, as @stephentoub pointed out. The goal would be to reduce memory usage etc in the Task APIs in a way that is difficult to do via public SAEA.

In other words, in the future you'll probably want to use Task APIs anyway.

@davidfowl (Member, Author)

It's fine to change them, but right now the Task APIs don't let us control the scheduling, which we care about (actually I'd prefer that directly in the socket APIs themselves). But I'm less concerned about sends than receives. We've already changed the send side to pool SAEAs, so we have a very small number of those now, but receives mean we need one per connection.

I'd like to see how we achieve lower memory usage with the Task APIs directly, it'd need to pool something across instances to get the same low footprint.

@stephentoub (Member) commented Mar 25, 2021

right now the Task APIs don't let us control the scheduling which we care about

What scheduling does SAEA support that the Socket APIs don't?

@tmds (Member) commented Mar 25, 2021

On Unix we stackalloc up to 8 for this reason:

// IovStackThreshold matches Linux's UIO_FASTIOV, which is the number of 'struct iovec'
// that get stackalloced in the Linux kernel.
private const int IovStackThreshold = 8;
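A hedged sketch of the stackalloc-or-pool idiom behind that threshold (illustrative helper, not the runtime's source): small scatter/gather calls use the stack, larger ones rent from ArrayPool so nothing is allocated per send.

```csharp
using System;
using System.Buffers;

static class GatherSend
{
    private const int IovStackThreshold = 8; // matches Linux's UIO_FASTIOV

    public static int TotalBytes(ReadOnlySpan<ReadOnlyMemory<byte>> buffers)
    {
        // Stack for small buffer counts, pooled array otherwise, so large
        // gathers don't allocate a fresh scratch array on every call.
        int[]? rented = null;
        Span<int> lengths = buffers.Length <= IovStackThreshold
            ? stackalloc int[IovStackThreshold]
            : (rented = ArrayPool<int>.Shared.Rent(buffers.Length));

        int total = 0;
        for (int i = 0; i < buffers.Length; i++)
        {
            lengths[i] = buffers[i].Length; // stand-in for filling an iovec/WSABUF
            total += lengths[i];
        }

        if (rented != null) ArrayPool<int>.Shared.Return(rented);
        return total;
    }
}
```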

@davidfowl (Member, Author) commented Mar 25, 2021

What scheduling does SAEA support that the Socket APIs don't?

I misspoke, it isn't the scheduling, it's just the unnecessary overhead in our scenario:

  • The Task-based APIs try to support concurrent operations and have lots of code to handle those cases (overlapping reads and writes).
  • We want to directly schedule our continuation, but with the Task-based APIs we need to wait for the await to yield before we thread-hop again.

@davidfowl (Member, Author) commented Mar 26, 2021

To get this issue back on topic, because I'd like to see a proposal that we all agree with:

  • I'd like an API that doesn't regress the pooling we do today.
  • I'd like an API that doesn't have hidden allocations in the name of usability.

The problem I have with the SendAsync API is that it will inevitably have to pool WSABUFs/IOVectors over a certain size (8?). If those heap-allocated arrays were rented from the array pool, I think that would be a big improvement. The next thing would be to pool PreallocatedOverlapped on Windows across different socket instances.

I think if we did those, I would feel better about using the SendAsync overloads.

@geoffkizer (Contributor)

I wonder if it would make more sense to just allow you to reuse Socket instances, which would mean you'd reuse the cached SAEAs and PreallocatedOverlapped and other things too.

@antonfirsov (Member) commented Mar 29, 2021

I wonder if it would make more sense to just allow you to reuse Socket instances, which would mean you'd reuse the cached SAEAs and PreallocatedOverlapped and other things too.

I guess this would mean designing new Dispose / Close APIs which allow avoiding the disposal of cached SAEAs, so socket.AcceptAsync(System.Net.Sockets.Socket? acceptSocket) can reuse them? This looks like exposing implementation details; I think extending SocketAsyncEventArgs is still a (much) lesser evil.

@karelz (Member) commented Apr 20, 2021

Triage: We should keep discussing final design for post-6.0. It is reasonable for .NET 7.

@karelz karelz added tenet-performance Performance related issue and removed untriaged New issue has not been triaged by the area owner labels Apr 20, 2021
@karelz karelz added this to the Future milestone Apr 20, 2021
@pepone (Contributor) commented Jun 18, 2021

Would you consider using the same API already used for QuicStream?

public ValueTask WriteAsync(ReadOnlyMemory<ReadOnlyMemory<byte>> buffers, CancellationToken cancellationToken = default) => _provider.WriteAsync(buffers, cancellationToken);

This would make it simpler to support TCP and QUIC with the same code.

@davidfowl (Member, Author)

Now that we have inline arrays, I wonder if these should be span-based APIs.

@stephentoub (Member)

Now that we have inline arrays, I wonder if these should be span-based APIs.

Which specifically? Async and span don't mix well.

@davidfowl (Member, Author)

It wouldn't work for SocketAsyncEventArgs, but IIRC the underlying methods don't need to pin the array passed in; they are usually copied.

I know the lowest API we have is SAEA so this approach doesn’t flow through Socket.SendAsync in a clean way…

@stephentoub (Member) commented Jul 30, 2023

It wouldn't work for SocketAsyncEventArgs, but IIRC the underlying methods don't need to pin the array passed in; they are usually copied.

I know the lowest API we have is SAEA so this approach doesn’t flow through Socket.SendAsync in a clean way…

I'm still not understanding. Can you sketch what it is you're envisioning? If the operation can't complete immediately and fully, such that the operation needs to pend and try again later, the relevant data or buffers still needs to be available.

Do you just mean an API like SendAsync(ReadOnlySpan<ReadOnlyMemory<byte>>) and it would copy that span to its own possibly-pooled heap object?

@davidfowl (Member, Author)

I was thinking about how we end up calling these APIs:

I'm still not understanding. Can you sketch what it is you're envisioning? If the operation can't complete immediately and fully, such that the operation needs to pend and try again later, the relevant data or buffers still needs to be available.

The buffers themselves are pinned for the lifetime of the async operation but the holder array just needs to be pinned for the length of the synchronous call.

Do you just mean an API like SendAsync(ReadOnlySpan<ReadOnlyMemory<byte>>) and it would copy that span to its own possibly-pooled heap object?

Yes exactly. We would flow this span all the way down to the OS API calls then translate it into the relevant Span<WSABuffer/IOVector>.

This doesn't gel with the current SAEA BufferList property, and I haven't thought through how to flow it from the user code down to the OS API.

Contrived example:

using System.Net;
using System.Net.Sockets;
using System.Runtime.CompilerServices;

var socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
socket.Bind(new IPEndPoint(IPAddress.Loopback, 5000));
socket.Listen();

var client = socket.Accept();

var buffers = new Buffers();
buffers[0] = GetHeaders();
buffers[1] = GetBody();

await client.SendAsync(buffers);

ReadOnlyMemory<byte> GetHeaders() => default;
ReadOnlyMemory<byte> GetBody() => default;

[InlineArray(2)]
struct Buffers
{
    private ReadOnlyMemory<byte> _buffers;
}

static class Extensions
{
    public static ValueTask<int> SendAsync(this Socket socket, ReadOnlySpan<ReadOnlyMemory<byte>> buffers) => default;
}

@stephentoub (Member)

Do you just mean an API like SendAsync(ReadOnlySpan<ReadOnlyMemory<byte>>) and it would copy that span to its own possibly-pooled heap object?

Yes exactly. We would flow this span all the way down to the OS API calls then translate it into the relevant Span<WSABuffer/IOVector>.

Just to be extra clear, though, it's not just about pinning. The list of buffers needs to be available asynchronously from the current stack if the operation doesn't complete synchronously, which means copying it somewhere.

@davidfowl (Member, Author)

Just to be extra clear, though, it's not just about pinning. The list of buffers needs to be available asynchronously from the current stack if the operation doesn't complete synchronously, which means copying it somewhere.

Are you talking about the Linux case where we queue the operation? Or do you mean in both cases?

@stephentoub (Member)

Are you talking about the Linux case where we queue the operation?

I'm talking about the non-Windows case where if the native call on the non-blocking socket returns EAGAIN / EWOULDBLOCK, we wait on an epoll / kqueue and try again when notified that we can make forward progress with the operation.

@davidfowl (Member, Author)

I see, yes that makes sense. It would force us to copy the array in that case.
