Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract platform-specifics of Fiber, Thread and EventLoop #6955

Conversation

straight-shoota
Copy link
Member

This is a step to prepare Fiber for Windows support.

src/crystal/system/fiber.cr Outdated Show resolved Hide resolved
Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's actually abstracted is mmap, which has more usages than allocating a stack for fibers. For example I use them in both my GC and my MT scheduler.

I'd prefer an Crystal::System::MemoryMap struct, and adding Windows support right away (File Mapping) to validate the abstraction model.

@RX14
Copy link
Contributor

RX14 commented Oct 18, 2018

It doesn't abstract mmap, it abstracts a certain usage of mmap, particularly, allocating stacks.

I don't think that should be the same abstraction as other mmap usages.

@straight-shoota
Copy link
Member Author

@ysbaddaden Sure, it can be renamed to be more general. What about main_fiber_stack, though? It needs to be platform specific but it's doesn't really belong to memory mapping?

Adding the windows implementation right away is not an issue. I already have that code. It compiles but I can't validate it, because event handling hasn't been ported. I don't think it makes much sense to add dedicated specs for this?

@straight-shoota straight-shoota force-pushed the jm/feature/fiber-prepare-port branch 3 times, most recently from f4e7ced to 451c5ba Compare October 19, 2018 08:53
@straight-shoota
Copy link
Member Author

I added the win32 implementation as far as I could get. I can't figure out the assembly for actually swapping the context on x86_64-win32. This is really not my area of expertise.

I'd be happy if somebody can take over this part. You guys now much better what to do.

@straight-shoota straight-shoota force-pushed the jm/feature/fiber-prepare-port branch 2 times, most recently from 4c8b870 to 226c330 Compare October 19, 2018 10:01
@straight-shoota straight-shoota force-pushed the jm/feature/fiber-prepare-port branch from 226c330 to c63ace5 Compare October 19, 2018 22:34
@ysbaddaden
Copy link
Contributor

ysbaddaden commented Oct 20, 2018

Good job at adding Win32 support overall. I didn't review correctly, but that looks nice!

I stand to my original point: abstracting mmap as "anonymous memory mapping" and making it win32-compatible would be much more useful than moving a few methods to allocate stacks, then do the same for the GC, and again for other usages, for example:

What differs are subsequent calls, but the madvise NOHUGEPAGE on Linux looks like a good option to be applied by default, and mcollect can be abstracted to protect a part of a memory mapping (win32: VirtualProtect). All that without requiring anything specific in Crystal::System for (de)allocating stacks.

Also, detecting a thread's stack should be the responsibility of Thread, not Fiber, that could ask the thread or be passed by the Thread when it initializes its main Fiber —I missed that in the stack overflow detection pull request.

Last but not least: I didn't know what I was doing while meddling with the arch-specific assembly, but I searched for reference, experimented with gdb and registers, and eventually learned a lot. Hint: look at ldc coroutines (D compiler based on LLVM) and search for win32 callee saved registers. For example https://en.wikipedia.org/wiki/X86_calling_conventions

@RX14
Copy link
Contributor

RX14 commented Oct 20, 2018

Assuming that a mmap even exists on the target platform is a mistake. mmap is a hugely complex syscall and the flags you want to use even if all kernels supported the same flags would differ depending on kernel idiosyncrasies. "Allocate this amount of heap memory from the OS and return a pointer to the start" is a very different operation to "Allocate a stack" and there's no reason to put them through the same method. There's no need to expose madvise and mprotect instead of just exposing "give me a stack".

Crystal::System should be the lowest common denominator for platform dependencies, and mmap is the opposite of that. Please don't think in terms of just POSIX and windows. They're hardly the only porting targets out there.

@ysbaddaden
Copy link
Contributor

@RX14 Do you have an example of a target that doesn't implement anonymous memory mappings, that allocates pages when they're accessed?

I don't intend to have a complete mmap interface, but the minimum we'll need in core/stdlib. See for example: ysbaddaden@29547dc

@straight-shoota
Copy link
Member Author

@ysbaddaden Your example looks quite complicated. It adds a lot of specifics of the memory mapping API to the general Fiber implementation:

Crystal::System::MemoryMapping.new(STACK_SIZE, :READ_WRITE, :STACK).tap do |mapping|
  mapping.advise(:NOHUGEPAGE)
  mapping.protect(GUARD_SIZE, :NONE)
end

I'd prefer to keep the interface between Fiber and the platform-specific implementation as small as possible. It much simpler to have only Crystal::System::Fiber.allocate_stack(STACK_SIZE) (this could eventually be implemented as re-using Crystal::System::MemoryMapping).

@RX14
Copy link
Contributor

RX14 commented Oct 22, 2018

Do you have an example of a target that doesn't implement anonymous memory mappings, that allocates pages when they're accessed?

Windows doesn't allow memory overcommit, which limits the number of fibers to be ((memory + pagefile - other programs) / 8MiB). So windows may not actually allocate pages when they're accessed, but it does memory accounting as if all pages were allocated immediately.

So to answer what I think you're implying: no, I don't have any idea how to properly implement fibers on windows apart from go-style growable stacks.

@RX14
Copy link
Contributor

RX14 commented Oct 22, 2018

Crystal::System should be as inflexible as possible, because interfaces being inflexible keeps them simple and easy to implement on new platforms.

Allocating a fiber stack is an operation which you can easily stub, or make special platform-specific alterations for. mmap is a complex unix interface with a lot of complexity and a lot of assumptions.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Oct 23, 2018

I'd still like a Stack struct (pointer + size) instead of a raw pointer, with a few methods (system specific):

  1. new() to create the allocation;
  2. protect() to declare guard page(s) explicitly, to avoid burying this as "implementation detail" —it's not, we expect them in the stack overflow detection;
  3. free() to dispose of the allocation.

And ideally:

  1. reserve() to grow pages manually, since Win32 can't do it automatically.

This could allow fibers to have differently sized stacks —defined manually or better automatically thanks to some compiler analysis, to reduce memory pressure on Win32 and virtual memory on POSIX.

We could have both a Stack and a MemoryMapping structs, but they're just the same to me: a low-level Allocation. Fiber doesn't care about reserve()... until we find a way to grow stacks (who knows), and a GC won't care about protect()... unless it wants to secure bounds.

@RX14
Copy link
Contributor

RX14 commented Oct 24, 2018

@ysbaddaden I'd be fine with a Crystal::System::Memory.new_stack(size) and Crystal::System::Memory.allocate which both return the same struct. We already have to deal with MAP_STACK on openbsd so there obvious is some differences between stack and general-purpose allocations without MADV_NOHUGEPAGE

@rishavs
Copy link

rishavs commented Dec 13, 2018

@ysbaddaden any resolution on this topic? Its been a couple of months since the last comment and I was wondering if there were other discussions on this offline. Would be nice to move this PR along.

@straight-shoota
Copy link
Member Author

@rishavs Finding an agreement on the interface is not really an urging matter as it won't really make a noticeable change from a usage point of view. The important thing is to figure out the assembly code for a fiber switch on win32. That will have an impact as it brings us closer to running fibers on windows.

That being said, it would of course be great if we could settle on the interface and proceed from there.

@straight-shoota straight-shoota force-pushed the jm/feature/fiber-prepare-port branch from dadc1c0 to 07ab99b Compare July 24, 2019 12:40
@straight-shoota
Copy link
Member Author

I've rebased this PR on current master and also took another stab on the context swap implementation.

The major problem is actually just #7955. The context swap implementation in the feature/fiber-win32 branch has really been working as it was supposed to all the time. That explains my previous frustration with this 😆

Thank you so much @cfsamson for investigating and finding the culprit. Your book is really interesting. I'm really glad there is finally help from someone with expertise on windows 😍

Now I've updated the feature/fiber-win32 branch. Context switches are basically working now =) I also added the XMM and TIB registers as suggested by @cfsamson

When running the concurrent_spec, there is a failure trying to resume a non-resumable main fiber. This might be related to channels. I'm not sure right now.

But this is already a big step ahead 🎉

I'd suggest to continue with reviews on this PR and after it is merged, I'll open a new one adding the win32 implementation.

src/crystal/scheduler.cr Outdated Show resolved Hide resolved
src/kernel.cr Outdated Show resolved Hide resolved
@straight-shoota straight-shoota force-pushed the jm/feature/fiber-prepare-port branch 2 times, most recently from d388a7d to 1a70510 Compare July 25, 2019 17:24
@straight-shoota straight-shoota force-pushed the jm/feature/fiber-prepare-port branch from 1a70510 to 972b7a8 Compare July 25, 2019 17:39
@straight-shoota straight-shoota force-pushed the jm/feature/fiber-prepare-port branch from 972b7a8 to 40e31c5 Compare July 25, 2019 17:44
@straight-shoota
Copy link
Member Author

I've reduced the scope of this PR to refactoring and extracting the platform-specific implementations.
Implementations for win32 will be added in a followup. This should make it easier to review and merge.

Meanwhile, I've added an EventLoop stub implenentation in the feature/fiber-win32 branch. concurrent_spec passes and channels work in general, but the channel spec seems to hang.

@straight-shoota straight-shoota changed the title Extract platform-specifics to Crystal::System::Fiber Extract platform-specifics of Fiber, Thread and EventLoop Jul 25, 2019
@ysbaddaden
Copy link
Contributor

Please close this PR or rebase it to match your last branch with the wIn32 context switch! It's much better than the preliminary work here, and I only have a few remarks. It won't delay the merge, and we'll have fibers working on win32! Huge feat!

In fact what I'm not really fond of are these PR commits. I'd prefer thread/pthread.cr, thread/win32.cr, thread/pthread/mutex.cr files, each fully implementing their "public" interface, instead of extracting generic Thread and Thread::Mutex classes. That would add duplication, but readability should improve.

@asterite
Copy link
Member

It's also the case that refactors like this will probably conflict much more against the current work on parallelism and so it's unlikely that they will get merged, or they will delay parallelism more.

@straight-shoota
Copy link
Member Author

@ysbaddaden
I'm not sure if the entire thing will be easier to review and merge. But sure, I can make a PR for the entire feature branch.

In fact what I'm not really fond of are these PR commits. I'd prefer thread/pthread.cr, thread/win32.cr, thread/pthread/mutex.cr files, each fully implementing their "public" interface, instead of extracting generic Thread and Thread::Mutex classes.

I'm not sure what you mean. There are no generic classes of either of them. src/thread.cr has some platform-independent stuff and the specific implementations just reopen this class.

@asterite
If we want concurrency support for win32 we need these refactors.

@ysbaddaden
Copy link
Contributor

This PR mostly brings stubs without the win32 implementation of Fiber stacks and x86_64 context switch, those are huge progresses! That helps counterbalance the changes and conflicts with other branches (MT) that this PR introduces.

About the rework: I meant that POSIX and Win32 don't need to share a base Thread class. The amount of duplicated code isn't large enough to justify the split. That would allow them to have their own internal details, so we treat them as "opaque classes".

I'd also like this rework to settle on where we should put specific classes, namely Thread, Thread::Mutex, Thread::ConditionVariable (pthread, win32), EventLoop (libevent, win32, then probably epoll, kqueue, ...) to avoid further conflicts.

Being system specific, I suppose they should be moved under src/crystal/system?

For threads, I guess they should be unix/pthread.cr, unix/pthread_mutex.cr, win32/thread.cr, and so on, then have src/thread.cr, src/thread/mutex.cr require the correct one?

For the event loop, maybe unix/libevent.cr and win32/iocp.cr then have src/crystal/event_loop.cr require the correct one?

Just a question: would it be hard to bring real win32 thread support, instead of bringing a stub? I know that's not needed to implement fibers, but threads aren't needed on POSIX either, and it would be awesome to have a basic support for them.

@cfsamson
Copy link
Contributor

May I ask how you plan to implement IOCP support? And where the epoll/kqueue/iocp based event loop plays into this?

Maybe I'm wrong but creating a new thread itself looks pretty easy using the CreateThread WinApi function, but I don't have the answer to what setup is required beyond that to fit it into Crystals thread support.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Jul 29, 2019

@cfsamson we use libevent for now, which leverages epoll/kqueue for us. I'd like to use on Win32 too, but it may be more complex to integrate libevent than using IOCP directly on Win32.

Basically, with we juste create resume events (sleep) and read/write ready events on file descriptors, that fit both File and Socket on POSIX, but don't quite map to windows sockets an file handles.

@straight-shoota
Copy link
Member Author

@ysbaddaden I've opened #7995 which has everything in feature/fiber-win32 included. I'll answer you there.

I think this PR can be closed.

@cfsamson
Copy link
Contributor

@ysbaddaden OK, thanks. Seems that libevent abstracts over IOCP also pretty effectively when using bufferevents. I guess it's better to discuss this later.

@RX14
Copy link
Contributor

RX14 commented Jul 29, 2019

I don't want to rely on libevent for windows. It will not work. We must implement our own event loop from scratch on windows for any kind of sane implementation.

@straight-shoota straight-shoota deleted the jm/feature/fiber-prepare-port branch November 18, 2021 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted This issue is generally accepted and needs someone to pick it up kind:refactor topic:stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants