Add broker-side job stream service API #11707

Closed
Tracked by #11231
npepinpe opened this issue Feb 16, 2023 · 2 comments · Fixed by #11981
Labels: component/broker, kind/toil, version:8.2.0

Comments

npepinpe commented Feb 16, 2023

Description

Job workers/streams will be managed in-memory by the broker. Since workers are always interested in all partitions, this can be a long-lived singleton instance on the broker side.

The external job activator will be able to query this service to activate jobs.

In the prototype, I used SWIM to register/unregister workers/streams. However, I wasn't differentiating between workers other than by type. Since we want to keep the same API and treat each worker instance differently based on their properties, we cannot do this here.

There will be a new API in the broker which can receive commands from the gateway (i.e. using the protocol), with only three RPCs:

  • AddWorker
  • RemoveWorker
  • RemoveWorkers

An AddWorker request should contain the job type, worker's activation properties, and the gateway/recipient member ID where a job would be pushed.

Workers should be uniquely identifiable by their gateway (i.e. push recipient member ID) and their activation properties. Two workers with the same recipient and the same properties are logically equivalent and do not need to be registered multiple times.
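
To illustrate the identity rule above, here is a minimal sketch, assuming Java 16+ records. The names `WorkerProperties`, `WorkerKey`, and `WorkerIdentityExample`, as well as the specific property fields, are hypothetical and not taken from the issue. Value-based equality on the record means two logically equivalent workers collapse into a single registration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical activation properties; the issue does not list the actual fields.
record WorkerProperties(String workerName, long timeoutMillis, Set<String> fetchVariables) {
  WorkerProperties {
    // Defensive immutable copy, so equality cannot change after registration.
    fetchVariables = Set.copyOf(fetchVariables);
  }
}

// Identity as described above: push recipient (gateway member ID) plus properties.
record WorkerKey(String recipientMemberId, WorkerProperties properties) {}

final class WorkerIdentityExample {
  // Returns how many distinct workers remain after de-duplication.
  static int countDistinct(WorkerKey... workers) {
    final Set<WorkerKey> registered = new HashSet<>();
    for (final WorkerKey worker : workers) {
      registered.add(worker); // a logically equivalent worker is not added twice
    }
    return registered.size();
  }
}
```

Using a record keeps `equals` and `hashCode` in sync with the fields, which is what makes the de-duplication work without extra code.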

Workers can be removed on demand via the RemoveWorker API, using their recipient and properties to identify them.

Note: As this can be somewhat verbose, we may want to introduce an ID or something to make it easier to manage.

A RemoveWorkers request is received from a gateway (with its member ID as param), and all workers associated with it are removed. If a gateway dies (as detected via SWIM), the same should happen.
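
The three operations described above could be sketched as an in-memory registry like the one below. This is only an illustration under stated assumptions: `InMemoryWorkerRegistry` and its method signatures are hypothetical, the activation properties are simplified to an opaque string, and the class is not thread-safe (it assumes a single thread or actor owns it). The same `removeWorkers` call would also be what a SWIM member-removed handler invokes.

```java
import java.util.HashSet;
import java.util.Set;

final class InMemoryWorkerRegistry {
  // Hypothetical entry; properties are an opaque string here for brevity.
  record Worker(String jobType, String recipientMemberId, String properties) {}

  private final Set<Worker> workers = new HashSet<>();

  // AddWorker: returns false if an equivalent worker is already registered.
  boolean addWorker(String jobType, String recipientMemberId, String properties) {
    return workers.add(new Worker(jobType, recipientMemberId, properties));
  }

  // RemoveWorker: identifies the worker by recipient and properties.
  boolean removeWorker(String recipientMemberId, String properties) {
    return workers.removeIf(
        w -> w.recipientMemberId().equals(recipientMemberId)
            && w.properties().equals(properties));
  }

  // RemoveWorkers: drops every worker of a gateway, e.g. on request or when it dies.
  int removeWorkers(String recipientMemberId) {
    final int before = workers.size();
    workers.removeIf(w -> w.recipientMemberId().equals(recipientMemberId));
    return before - workers.size();
  }

  int size() {
    return workers.size();
  }
}
```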

Note that we are not implementing pushing jobs here yet, just managing the lifecycle of workers. With this done, we can then provide a dummy external job activator implementation (unusable for production of course) which can activate jobs. We can then test that the jobs are activated with the right properties, and variables are collected, etc.

I would recommend further breaking down this issue, or at least completing it in multiple PRs.

npepinpe added the component/broker and kind/toil labels Feb 16, 2023
ghost pushed a commit that referenced this issue Mar 6, 2023
11945: Add StreamRegistry on broker side to keep track of open streams r=deepthidevaki a=deepthidevaki

## Description

Adds a StreamRegistry to keep track of open streams between gateway and broker.  This is in preparation for #11707.

## Related issues

Related #11707 


Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
ghost pushed a commit that referenced this issue Mar 6, 2023
11939: Adds broker job stream service r=npepinpe a=npepinpe

## Description

This PR adds a server endpoint for the job stream API, as well as the minimum communication protocol. The endpoint can receive requests to:

1. Add streams to the registry
2. Remove a specific stream from the registry
3. Remove all streams associated with a member from the registry

Additionally, when a member is removed from the cluster, all streams associated with it are removed. In the future we can think about handling `REACHABILITY_CHANGED` to temporarily disable certain streams without completely removing them.

The `LongPollingJobNotification` was replaced with scaffolding for the `JobGatewayStreamer`, but the functionality is largely the same. This will be expanded in a follow-up PR.

## Related issues

related to #11707 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Co-authored-by: Nicolas Pepin-Perreault <43373+npepinpe@users.noreply.github.com>

npepinpe commented Mar 7, 2023

Additionally:

  • Add registry metrics
  • Implement GatewayStreamer and GatewayStream, which push to a specific topic
  • Emit metric on push

npepinpe commented Mar 9, 2023

@deepthidevaki once you've merged the last PR, as discussed, please add a gauge for the registered stream count, trace logging of every job pushed, and debug logging of the stream request handler (adding, removing, etc.)

ghost pushed a commit that referenced this issue Mar 9, 2023
11961: Push jobs to registered streams r=npepinpe a=npepinpe

## Description

Implements a job specific `GatewayStreamer` and `GatewayStream`, which can push jobs out to specific streams once they've been activated by the engine.

The implementation is currently very simple:

1. Jobs are pushed out asynchronously
2. The target stream is picked randomly from the set of consumers
3. There is no retry; we try with one consumer only

The `GatewayStream` also expects the jobs (i.e. payload) to be immutable, and the `ErrorHandler` to be thread-safe.
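
The push behaviour listed above (asynchronous, random target, single attempt) could look roughly like the following sketch. It is not the actual `GatewayStreamer`/`GatewayStream` code: `StreamConsumer`, `PushErrorHandler`, and `RandomStreamPusher` are hypothetical names, and treating an empty consumer list as an error is an assumption made for the example.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical transport abstraction for one registered stream.
interface StreamConsumer<P> {
  void send(P payload) throws Exception;
}

// Called from the executor's thread, hence the thread-safety requirement.
interface PushErrorHandler<P> {
  void handleError(Throwable error, P payload);
}

final class RandomStreamPusher<P> {
  private final Executor executor;

  RandomStreamPusher(Executor executor) {
    this.executor = executor;
  }

  // Pushes the (immutable) payload to one randomly chosen consumer, without retry.
  CompletableFuture<Void> push(
      List<StreamConsumer<P>> consumers, P payload, PushErrorHandler<P> errorHandler) {
    if (consumers.isEmpty()) {
      errorHandler.handleError(new IllegalStateException("no consumers registered"), payload);
      return CompletableFuture.completedFuture(null);
    }

    final int index = ThreadLocalRandom.current().nextInt(consumers.size());
    final StreamConsumer<P> target = consumers.get(index);
    return CompletableFuture.runAsync(
        () -> {
          try {
            target.send(payload);
          } catch (final Exception e) {
            // Single attempt only: report the failure instead of trying another consumer.
            errorHandler.handleError(e, payload);
          }
        },
        executor);
  }
}
```

Because the payload is handed to another thread, it must not be mutated after `push` is called, which is why the payload is expected to be immutable.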

This PR does not close the issue, as we're still missing adding metrics and some logging (which will then close the issue).

## Related issues

related to #11707 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
ghost closed this as completed in fa4140b Mar 14, 2023
npepinpe added the version:8.2.0 label Apr 5, 2023
This issue was closed.