Integrate with the new Windows IO manager #364

Open
eborden opened this issue Jan 8, 2019 · 14 comments

Comments

@eborden
Collaborator

eborden commented Jan 8, 2019

This is a stub issue for discussing the new Windows IO manager.

Discussion kicked off in: #357

So far:

@Mistuke

Just FYI, I'm finishing up the final bits of a new I/O manager for Windows in GHC, which will require completely new code in network using the native async APIs. I'll create a ticket soon to start discussing how this should all look.

@eborden

@Mistuke Oh baby, that sounds like a whopper. Will that require any API changes? That will force us to bump the lower bound of base and drop support for many GHC versions. We should probably open an issue specifically for discussing that.

@Mistuke:

@eborden Yeah, it's years of work, the GHC patch itself clocks in at around 7k lines of code atm, without comments.. lol. On the bright side, it will fix almost all of the warts/quirks on Windows.

I haven't thought about the API in network yet. My hope is that, since Winsock was originally based on BSD sockets anyway, the native interface is close enough to maintain the same interface for most things, at least for IOCP. I still need to look into RIO support.

Yes, I'll create a new ticket to discuss it through before I start any work on it. In terms of base, GHC will (at least for a period) support both I/O managers and provide hooks and helpers to switch between the two for library code (the same helpers base uses). The new I/O manager won't be the default for a while, at least not until core libraries catch up.

My intention for this is to use Win32 to abstract that functionality, such that you will have to bump the minimum version of Win32 but not base. Win32 still maintains backwards compatibility back to GHC 7.6, which is reasonably far back.

@hvr

@eborden

That will force us to bump the lower bound of base and drop support for many GHC versions.

...but only if os(windows), no?

@kazu-yamamoto

@Mistuke Sounds exciting!

What does RIO mean?

@kazu-yamamoto

@winterland1989 @Mistuke I would like to know whether or not Mio, the libuv-based IO manager and the new I/O manager for Windows can coexist if we create a proper layer in GHC.

@winterland1989

I'm not sure if an IOCP-based IO manager can be easily combined with the current event-based interface, namely threadWaitRead/threadWaitWrite. I gave a talk on this at the Haskell Symposium 2018 and I don't think it's an easy job, but maybe @Mistuke can tell us more about this.

On the other hand, the libuv-based IO manager is a one-stop solution, since libuv already takes care of system call encapsulation; it's not only an IO manager but also a substitute for all the system packages, e.g. network, directory, etc. It should be able to coexist with whatever IO solution base provides, as a third-party package.

@Mistuke
Collaborator

Mistuke commented Jan 8, 2019

@kazu-yamamoto Registered I/O Networking Extensions (RIO) is a "new" API added to Winsock to support high-speed networking with lower latency and jitter. (It's been out for a few years now.) Essentially, in traditional Winsock code, when you e.g. send data, the user-mode buffer is copied and locked into physical memory by the kernel; once the request completes, the buffer is unlocked and the data is copied back down to the user-mode buffer. This operation is expensive, so RIO offers a way to lock a pre-allocated buffer into physical memory so that the application and the kernel can read and write to it directly, greatly reducing CPU overhead and latency. To get the most out of this we would likely need a new API targeted at applications requiring high throughput. There are similar technologies on Linux, so it wouldn't necessarily be Windows only. Anyway, there is a more detailed summary here [1] and an interesting read on experimentation done on ASP.NET and Azure [2].

[1] https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/hh997032(v=ws.11)
[2] https://www.ageofascent.com/2015/07/25/azure-cloud-high-speed-networking/

@winterland1989 The new I/O manager supports the event-based system just fine. That's one of the reasons we did it inside base, as this allows us to integrate with the scheduler. threadWaitRead and threadWaitWrite can be efficiently supported for sockets with the "new" Winsock 2 APIs in Windows 8 and up, such as WSAEventSelect, and directly integrated into the I/O manager's event system. By default, however, the I/O manager will just use the completion status to generate events, e.g. using WSAGetOverlappedResult. In any case, I don't foresee any issues with the events.

Because of this integration it can also support things such as setting NUMA affinity based on what the RTS wants, using e.g. SOCKET_PROCESSOR_AFFINITY. Anyway, lots can be done and supported, but the first step is stability and we will go from there.

@winterland1989

WSAEventSelect is a very limited interface, while a completion port will only notify you when the actual I/O activity is done, e.g. after recv/send has been performed. This is different from the epoll/kqueue model: in the latter scenario, you have to perform the recv/send after being notified. So to make both models work, the original threadWaitRead/threadWaitWrite API needs to be revised.
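To make the difference concrete, here is a minimal, portable sketch of the two models. Nothing in it is a real network or base API: `readinessRead`, `completionRead` and `demoModels` are hypothetical names, the "wait" and "recv" actions are passed in as plain IO actions, and the kernel is simulated with a forked thread.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Readiness model (epoll/kqueue style): block until the descriptor is
-- readable, then perform the read ourselves.
readinessRead :: IO () -> IO a -> IO a
readinessRead waitReadable doRecv = do
  waitReadable   -- plays the role of threadWaitRead
  doRecv         -- the caller still has to issue the recv

-- Completion model (IOCP style): submit the whole request up front and
-- receive the finished result in the notification.
completionRead :: ((a -> IO ()) -> IO ()) -> IO a
completionRead submit = do
  done <- newEmptyMVar
  submit (putMVar done)   -- the simulated kernel calls this once data is in place
  takeMVar done

-- Both models deliver the same data; only who performs the read differs.
demoModels :: IO (String, String)
demoModels = do
  a <- readinessRead (return ()) (return "hello")
  b <- completionRead (\notify -> () <$ forkIO (notify "hello"))
  return (a, b)
```

In the readiness model there is a separate "wait" step that threadWaitRead can stand for; in the completion model there is no such step, which is why a wait-only API is hard to map onto it.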

@Mistuke
Collaborator

Mistuke commented Jan 8, 2019

WSAEventSelect is a very limited interface,

Or you just associate multiple sockets with each event, then loop through them and query the individual sockets when an event is triggered. Which sounds bad but in practice works quite well. This is how some torrent clients (like uTorrent) handle their connections. This assumes a "correct" distribution of work.

while completion port will only notify you when the actual I/O activity is done, e.g. after recv/send has been performed.

Read should work fine, it's write that may be an issue.

@Mistuke
Collaborator

Mistuke commented Jan 8, 2019

And to be clear, I'm only talking about ways to maintain a working, reasonably performant interface with what network has, not the only interface. I'd want a different interface for high-performance I/O anyway, for RIO.

@kazu-yamamoto
Collaborator

@Mistuke If you want to revise the threadWaitRead/threadWaitWrite APIs, I can help you on the Linux and macOS side.

@kazu-yamamoto
Collaborator

@Mistuke @winterland1989 Thank you for your explanations. Many things are now clear to me.

eborden added this to the 2020 Q1 Release milestone Jun 20, 2019
@Mistuke
Collaborator

Mistuke commented May 25, 2020

@eborden @kazu-yamamoto

It's time to start discussing this more. The first version of the I/O manager will be in GHC 8.12 if all goes well: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1224.

That version will be feature complete enough to start work on network (it will need some tweaks to support multiple worker threads; not sure I will get that in for 8.12, but 8.14 certainly).

So, a couple of things: GHC will no longer contain any network-related code or care about it, i.e. a handle is a handle; whatever it points to, GHC doesn't care. GHC only cares about things that use readFile and writeFile.

So to support WINIO, everything is moved to the network library directly, as it should be. There are a couple of things I would like for network. I think, as you mentioned @kazu-yamamoto, it's best to just revise the entire library, remove the need for threadWaitRead/threadWaitWrite, and make things async on all platforms.

The new I/O manager can be detected at runtime and compile time by checking the CPP macro __IO_MANAGER_WINIO__.
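A minimal sketch of how library code might combine the two checks. The CPP macro is the one named above; the run-time half assumes that base on WinIO-capable GHCs exposes `isWindowsNativeIO` from `GHC.IO.SubSystem`, which is my understanding but is not stated in this thread. `winioAvailable`, `winioActive` and `describeIoManager` are hypothetical names.

```haskell
{-# LANGUAGE CPP #-}

#if defined(__IO_MANAGER_WINIO__)
-- Assumption: this module and name exist in the base shipped with
-- WinIO-capable GHCs.
import GHC.IO.SubSystem (isWindowsNativeIO)
#endif

-- | Compile time: does this compiler/platform know about WinIO at all?
winioAvailable :: Bool
#if defined(__IO_MANAGER_WINIO__)
winioAvailable = True
#else
winioAvailable = False
#endif

-- | Run time: is the running program actually using WinIO?
winioActive :: Bool
#if defined(__IO_MANAGER_WINIO__)
winioActive = isWindowsNativeIO
#else
winioActive = False
#endif

-- | Human-readable summary, e.g. for a startup log line.
describeIoManager :: String
describeIoManager
  | winioActive    = "WinIO (native) I/O manager"
  | winioAvailable = "WinIO available, but the legacy I/O manager is active"
  | otherwise      = "no WinIO support in this build"
```

Guarding the import with the macro keeps the module compiling on older GHCs and on non-Windows platforms, where both values are simply `False`.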

Concretely for Windows I'd like to accomplish the following if possible

  1. Remove the need for configure. Anything we need to detect on Windows should be done at runtime, not compile time. Maybe @hvr can advise on how to structure it so we don't have configure on Windows but keep it for everything else. Maybe a different config-network package?

  2. Windows now uses completion ports and should only use the WSA interfaces. Newer things I have been adding all use this interface instead of e.g. fread etc.

In short, the new I/O manager has these requirements:
2a. You must at all times avoid the FD type on Windows and always use the GHC Handle or Win32 HANDLE. Using FD will lead to a segfault.
2b. Any new HANDLE you create must be attached to the I/O manager before it's used, with a call to associateHandle' or associateHandle.
2c. Any request must be done asynchronously. This is done by using withOverlapped or withOverlappedEx. These functions take an offset and two callbacks and return the result of the I/O operation. The first callback is the one called to start your request. The second callback is the one called when your request finishes.

As an example, the lowest-level file operation hwndRead does this:

-- For this to actually block, the file handle must have
-- been created with FILE_FLAG_OVERLAPPED not set. As an implementation note I
-- am choosing never to let this block. But this can be easily accomplished by
-- a getOverlappedResult call with True
hwndRead :: Io NativeHandle -> Ptr Word8 -> Word64 -> Int -> IO Int
hwndRead hwnd ptr offset bytes
  = fmap fromIntegral $ Mgr.withException "hwndRead" $
      withOverlapped "hwndRead" (toHANDLE hwnd) offset (startCB ptr) completionCB
  where
    startCB outBuf lpOverlapped = do
      debugIO ":: hwndRead"
      -- See Note [ReadFile/WriteFile].
      ret <- c_ReadFile (toHANDLE hwnd) (castPtr outBuf)
                        (fromIntegral bytes) nullPtr lpOverlapped
      return $ Mgr.CbNone ret

    completionCB err dwBytes
      | err == #{const ERROR_SUCCESS}      = Mgr.ioSuccess $ fromIntegral dwBytes
      | err == #{const ERROR_HANDLE_EOF}   = Mgr.ioSuccess 0
      | err == #{const STATUS_END_OF_FILE} = Mgr.ioSuccess 0
      | err == #{const ERROR_BROKEN_PIPE}  = Mgr.ioSuccess 0
      | err == #{const STATUS_PIPE_BROKEN} = Mgr.ioSuccess 0
      | err == #{const ERROR_MORE_DATA}    = Mgr.ioSuccess $ fromIntegral dwBytes
      | otherwise                          = Mgr.ioFailed err

The completionCB tells it what the result of the call means. withOverlappedEx will block internally using a new IOPort primitive. A guiding principle in WINIO is that we never block in a state where the RTS has no idea what's going on, i.e. we never block in an FFI call!!

If the blocking isn't desired then network doesn't have to use this method at all. It's free to (and probably should) do its own thing for efficiency reasons. The callback routine completionCB should never block, as it's run on an OS worker thread.
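For readers without a Windows build at hand, the start/completion protocol can be modelled portably. This is only a sketch of the shape described above, not the real withOverlapped: `overlappedModel`, `IOResult` and `readCompletion` are hypothetical, the worker thread is simulated with forkIO, and an MVar stands in for the IOPort primitive.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

type ErrCode = Int   -- stand-in for a Win32 error code

data IOResult a = IOSuccess a | IOFailed ErrCode
  deriving (Eq, Show)

-- Hypothetical model of the protocol: the start callback issues the request
-- and is handed a notifier; the completion callback interprets the outcome.
overlappedModel
  :: ((ErrCode -> Int -> IO ()) -> IO ())  -- start callback
  -> (ErrCode -> Int -> IOResult a)        -- completion callback
  -> IO (IOResult a)
overlappedModel start completion = do
  box <- newEmptyMVar
  start (\err n -> putMVar box (completion err n))
  takeMVar box   -- only the calling Haskell thread waits, never an FFI call

-- Win32 numeric values: ERROR_SUCCESS = 0, ERROR_HANDLE_EOF = 38.
errSuccess, errHandleEof :: ErrCode
errSuccess   = 0
errHandleEof = 38

-- Mirrors the spirit of completionCB: end of file counts as a 0-byte read.
readCompletion :: ErrCode -> Int -> IOResult Int
readCompletion err n
  | err == errSuccess   = IOSuccess n
  | err == errHandleEof = IOSuccess 0
  | otherwise           = IOFailed err

-- Three simulated requests: a 512-byte read, end of file, and error code 5.
demoOverlapped :: IO [IOResult Int]
demoOverlapped = mapM run [(errSuccess, 512), (errHandleEof, 0), (5, 0)]
  where
    run (err, n) =
      overlappedModel (\notify -> () <$ forkIO (notify err n)) readCompletion
```

Making the completion callback a pure function in this model is a deliberate choice: a pure function cannot block, which matches the rule that completionCB must never block on the worker thread. `demoOverlapped` yields `[IOSuccess 512,IOSuccess 0,IOFailed 5]`.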

  3. We use a new interface introduced in Windows Vista SP1. This means this is the lowest supported Windows OS.

  4. I would also like to add RIO support https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/hh997032(v=ws.11). On Linux an equivalent would be io_uring.

This would greatly increase the scalability of network on Windows while dramatically lowering CPU usage.

  5. Any I/O operation should be cancel-able. On Windows this can now be done from any thread and no longer needs to run on the thread that made the request.
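On the Haskell side, cancel-ability could look roughly like the sketch below: the thread waiting for a completion runs a cancel action if it is interrupted by an asynchronous exception sent from any other thread. This is an assumption about one possible shape, not the WINIO implementation; `waitCancellable` and `demoCancel` are hypothetical, and the cancel action here only sets a flag where real code would cancel the outstanding request.

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (mask_, onException)

-- | Wait for a completion; if the wait is interrupted by an exception,
-- run the supplied cancel action before the exception propagates.
waitCancellable :: IO () -> MVar a -> IO a
waitCancellable cancelRequest done =
  takeMVar done `onException` cancelRequest

-- | A request that never completes is cancelled from another thread.
demoCancel :: IO Bool
demoCancel = do
  done      <- newEmptyMVar :: IO (MVar Int)
  cancelled <- newEmptyMVar
  -- Fork with exceptions masked so the kill can only be delivered while the
  -- thread is blocked in takeMVar, i.e. inside the onException scope.
  tid <- mask_ $ forkIO $ do
    _ <- waitCancellable (putMVar cancelled True) done
    return ()
  killThread tid       -- issued by a different thread than the requester
  takeMVar cancelled   -- True once the cancel action has run
```

Forking under `mask_` closes the window in which the thread could be killed before the cancel handler is installed, which is the kind of reliability the requirement asks for.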

/cc @angerman @bgamari @hvr @coot

@Mistuke
Collaborator

Mistuke commented May 25, 2020

I have cc'd @coot, who has been using an I/O library based on IOCP on Windows at https://github.com/input-output-hk/ouroboros-network/tree/5c253f7533901d2daf528388def065a54947a122/Win32-network. We should ensure we cover their uses, as they provide real-world use cases.

@kazu-yamamoto
Collaborator

@Mistuke Good news! What can I do for you?

@Mistuke
Collaborator

Mistuke commented May 27, 2020

@Mistuke Good news! What can I do for you?

I'd like to take you up on the offer of creating a new API that doesn't use threadWaitRead/threadWaitWrite as we can't efficiently support those :)

@Mistuke
Collaborator

Mistuke commented May 27, 2020

Ideally also something that fits with https://github.com/simonmar/async

@coot
Contributor

coot commented May 28, 2020

@Mistuke I am glad to see the changes are moving forward. The requirements for us were

  • reliable cancellation of threads that are blocked on reads
  • running concurrent reads and writes (it turned out that in the current state of affairs this is deadlocking on Windows).

We have tests that ensure those for our implementation (plus real users, and no complaints :)). Since we use the same high-level API as the network package, it will probably be very easy to re-use our tests.

@Mistuke
Collaborator

Mistuke commented Jul 17, 2020

WINIO has officially been merged in GHC 8.12. Now the fun begins...

@Mistuke
Collaborator

Mistuke commented Jul 19, 2020

@kazu-yamamoto @eborden So the main thing I need some help with is what an asynchronous interface should look like.

The synchronous interface we won't change, of course. But what happens in scenarios where you are handling thousands of connections? In these cases the blocking interface requires thousands of Haskell threads, but the bigger issue is the thousands of locks.

In such cases I think a callback interface would be more efficient.
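As a rough illustration of the callback idea (an assumption about one possible shape, not a proposed network API), a single loop can drain a queue of completions and invoke the callback attached to each one, so no per-connection thread has to sit blocked on its own lock. `Completion`, `serveCompletions` and `demoCallbacks` are hypothetical names, and the queue is a plain Chan rather than a completion port.

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | A finished request together with the callback registered for it.
data Completion = Completion
  { bytesTransferred :: Int
  , onComplete       :: Int -> IO ()
  }

-- | Service n completions from one thread by dispatching to callbacks.
serveCompletions :: Chan Completion -> Int -> IO ()
serveCompletions queue n = replicateM_ n $ do
  c <- readChan queue
  onComplete c (bytesTransferred c)

-- | Simulate 1000 connections, each completing one request of i bytes,
-- and total the bytes seen by the callbacks.
demoCallbacks :: IO Int
demoCallbacks = do
  queue <- newChan
  total <- newIORef 0
  let connections = 1000
  forM_ [1 .. connections] $ \i ->
    writeChan queue (Completion i (\n -> modifyIORef' total (+ n)))
  serveCompletions queue connections
  readIORef total
```

The callbacks run on the dispatching thread, so in a design like this they would have to stay short and non-blocking. `demoCallbacks` returns 500500, the sum of 1 to 1000.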

Also, one thing I really want to get rid of is needing configure on Windows. @phadej perhaps you have an idea how to do that while keeping it for non-Windows? AFAIK you can't change package types on sub-packages?
