Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merge] Improve conception #192

Closed
twittemb opened this issue Sep 5, 2022 · 9 comments
Closed

[Merge] Improve conception #192

twittemb opened this issue Sep 5, 2022 · 9 comments

Comments

@twittemb
Copy link
Contributor

twittemb commented Sep 5, 2022

Hi @phausler, I see an attempt of avoiding creating too much tasks and share Iterators between them in the merge algorithm #185

Although it might work well, the resulting code is pretty complex.

I feel we could leverage async channels as a way of merging elements from several tasks that would consume the upstreams. It would simplify a lot the code. I have a working implementation on a branch that also handles rethrowing. I will open a PR tomorrow to suggest my implementation if you are willing to take a look.

thanks.

@FranzBusch
Copy link
Member

Thanks for opening this. While I get your concern for simpler code the merge operator is a pretty complex piece.
The main motivation behind my PR is not only to reduce the number of tasks but also to remove the Sendable constraints on the iterator.

At the same time the new implementation is up to 3 times faster since it avoids spawning all these tasks.

Feel free to open a PR but I want to point out some of the requirements you need fulfill;

  1. The Iterator must not be Sendable
  2. Demand to the upstream must only be issued if the downstream requested an element
  3. Cancellation to the upstream must be propagated when the downstream is cancelled
  4. Rethrows must be supported

Happy to discuss more here or in the PR.

@twittemb
Copy link
Contributor Author

twittemb commented Sep 5, 2022

Hi @FranzBusch

thanks for those insights. From what I can tell, those requirements are fulfilled by the implementation I have in mind. I would appreciate if you could take a look at the PR I will publish tomorrow.

thanks a lot.

@twittemb
Copy link
Contributor Author

twittemb commented Sep 6, 2022

@FranzBusch here is the PR #193
It might need some tweaking and I will ensure the tests all pass repeatedly later today (they all pass individually).

The code is simpler for sure but I decided to reuse and compose AsyncMerge2Sequence in order to make the version with 3 params. I guess we also could do a specific Merge3StateMachine but not sure it worth it.

@phausler what do you think ?

@phausler
Copy link
Member

phausler commented Sep 6, 2022

I think merge and friends in the async-algorithms package are not per-se a goal to make them simple; instead we should focus on correctness and performance. Particularly the heap cost of AsyncChannel is probably going to be higher than the alternate approach. So in short I think that we should focus on first off correctness of behavior (which the Sendable requirement for merge is part of that), second we should then focus on task creation overhead (e.g. we should strive for eliminating per-element tasks). Then after those two are done we should focus on reducing heap overhead. And finally for perf we should strive to avoid lock contention.

I think both of these approaches at a high level analysis meet the concept of correctness; which is great! And they (again at a 10000 meter view) reduce the per element task creation down to a per side based task creation which is also a distinct improvement. But as we look at potential heap impact and lock contention impact #193 has a slight advantage (at the cost of code complexity).

I guess id rather us tackle the hard problems in the async-algos package so that users of this package don't need to.

@FranzBusch
Copy link
Member

I agree with @phausler here. While I think the idea of backing algorithms with other primitives is a nice idea it often times comes at a performance downside. In the case of backing merge with the AsyncChannel it introduces a lock, a separate buffer and a heap allocation per AsyncChannel. On another note, I find the implementation with the AsyncChannels way more harder to understand than a simple rolled state machine where you can see all the possible transitions.

Furthermore, I think @phausler hits it on the head here. We should solve the hard problems here once, in the most performant way possible. It is okay to duplicate some logic if that comes with a significant performance upside.

My plan is to expand the test coverage of the merge PR that I have open so that it tests all possible state transitions of its internal state machine.

@twittemb
Copy link
Contributor Author

twittemb commented Sep 6, 2022

Hi to both of you. My intent was not to find at all cost a simpler solution. But if we can have something that is simple to understand, and maintain, that is correct and performant enough, my point is that it is worth giving a shot, even if it means reusing established components.

you are right in the overhead induced by AsyncChannel and I’m working on a variation using AsyncStream that already seems to improve perf.

@FranzBusch from my point of view an AsyncStream is somehow the equivalent of all the mechanics you have to manage your internal buffer. I’m just trying to not reinvent it (or perhaps I misunderstood something?)

I will push an update tomorrow with the optimized code. If it still does meet our requirements then I will give up 😀.

Have a good one.

@FranzBusch
Copy link
Member

I would argue the code that I wrote is simple since it contains a pretty straight forward state machine and all transitions can be easily seen by just reading over the switch. The only really complicated part is the setup of the task group. However, I understand that reusing something existing looks simpler.

While AsyncStream might be more performant even then you can't achieve the same level. Since every AsyncStream comes with a lock and a separate heap allocation for the buffer inside the stream.

@twittemb
Copy link
Contributor Author

twittemb commented Sep 8, 2022

Hi there,

I've updated the PR and its description.

My work intends to reach several goals:

  • minimise the task creation (1 per upstream + 1 main for the group)
  • have conditional Sendable conformance for AsyncMergeSequences
  • make a MergeStateMachine that can be reused with any number of sequences (preparing the field for variadic params)
  • be able to use the same MergeStateMachine in AsyncChunksOfCountOrSignalSequence
  • improve the performances
  • reuse the existing locking mechanism

Some hints about the conception:

  • I've introduced the concept of Regulator. This is a wrapper around an AsyncSequence that can iterates over it and suspend before calling next until it is resumed by an external entity. I think this is something we can leverage also in AsyncCombineLatestSequence where we will also want to avoid creating too much tasks. The Regulator has its own state, not shared with any one else, there is no synchronisation point with the outside.
  • The MergeStateMachine is no more specialised with several AsyncSequence generic types but only by the Element type, which opens the door to variadic params. The MergeStateMachine has its own state also, independent from the upstreams.

@twittemb twittemb changed the title [Merge] simplify the code by using AsyncChannel [Merge] Improve conception Sep 8, 2022
@twittemb
Copy link
Contributor Author

twittemb commented Sep 9, 2022

The PR has been abandoned in favour of the better implementation of @FranzBusch performance wise. Thanks everyone for the nice discussions.

@twittemb twittemb closed this as completed Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants