Skip to content
This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

The disk queue must assign an auto-incrementing ID to each element #27

Closed
Tracked by #118 ...
cmacknz opened this issue Apr 14, 2022 · 9 comments
Closed
Tracked by #118 ...

The disk queue must assign an auto-incrementing ID to each element #27

cmacknz opened this issue Apr 14, 2022 · 9 comments
Assignees
Labels

Comments

@cmacknz
Copy link
Member

cmacknz commented Apr 14, 2022

To support implementation of end to end acknowledgement and simplify the calculation of a queue lag metric, the queues must assign an auto-incrementing ID or sequence number to each element added to the queue. Both queue types must support this consistently:

See elastic/beats#32541 for the memory queue implementation.

@fearful-symmetry
Copy link
Contributor

Okay, so @cmacknz I'm assuming this is what we'll use the QueueID field in the protobuf spec for? I assume the idea is that the server-side PublishEvents will assign the queue ID based on some (global? Per-input?) counter, and that gets passed along to the queue, and returned to the input in the PublishReply in case it needs it for...something.

@cmacknz
Copy link
Member Author

cmacknz commented Apr 14, 2022

Yes. The delivery guarantees section of the design proposal explains this in a bit more detail.

If inputs writing to the shipper want an acknowledgement of when their message has been successfully written to the output, they need some identifier to use to track their message. Rather than force inputs to generate UUIDs in a possibly inconsistent way, we give them back the queue sequence number as it uniquely identifies the message in the pipeline. The flow is:

  1. Input publishes to queue, is given the queue ID of their message as a return value.
  2. The message is asynchronously acknowledged sometime later. The queue ID indicates which specific message was successfully written to the output.

There is an optional EventID in the protocol as well for anything that already has a natural unique identifier the input wants to use that for tracking instead (image a traceID or something similar that is already in some key-value store). Regardless adding sequence numbers to the queue gives us a nice way to calculate queue lag so we'd probably be doing it anyway.

@faec
Copy link
Contributor

faec commented Apr 14, 2022

Both the memory and disk queues have something resembling this internally, but getting them to handle this use case will take some work:

  • The disk queue assigns an ascending 64-bit id to everything that goes through it, but it's currently only assigned on output, when the input ACKing and such has already been completed. Assigning it at input instead and propagating it through serialization would probably be somewhat annoying but straightforward.
  • The memory queue assigns an ascending 32-bit id internally but it is linked to the producer (of which there can be many), not to the queue. Some plumbing can fix that, but reporting it usefully is still awkward in the queue's current form: the queue is very opinionated about storing everything as arrays of publisher.Event separate from its metadata. This is what I've run into with Make the memory queue work with types other than publisher.Event beats#31307 and it requires a pretty significant overhaul to be able to support more general types. The good news is that my work on that issue will prepare things nicely here, since I'm needing to explicitly pair up the queue entries with their metadata types, which will probably give us the plumbing we need for the IDs.

@fearful-symmetry
Copy link
Contributor

Now that elastic/beats#31307 is closed and most of the (currently concerning) tech debt is cleaned up, is there anything blocking this @faec ?

@cmacknz
Copy link
Member Author

cmacknz commented Jun 15, 2022

Switching the assignee to @faec here as she now owns the parent issue #9

@jlind23
Copy link
Contributor

jlind23 commented Aug 25, 2022

@cmacknz @faec is the 64 bit queue ID still something to do or can I close this issue?

@cmacknz
Copy link
Member Author

cmacknz commented Aug 29, 2022

The memory queue is done which is enough for 8.5 The disk queue can be handled in 8.6, I'll update the meta issues and the issue description.

@cmacknz cmacknz added v8.6.0 and removed v8.4.0 labels Aug 29, 2022
@cmacknz cmacknz changed the title The memory and disk queues must assign an auto-incrementing ID to each element The disk queue must assign an auto-incrementing ID to each element Aug 29, 2022
@cmacknz cmacknz closed this as completed Feb 23, 2023
@cmacknz cmacknz reopened this Feb 23, 2023
@faec
Copy link
Contributor

faec commented May 16, 2023

Assuming we're committed to the shipper-beat path (which seems to be the case now?) this issue is no longer required and can be closed.

@jlind23
Copy link
Contributor

jlind23 commented May 16, 2023

@faec Thanks Fae for noticing, closing this one as won't fix.

@jlind23 jlind23 closed this as not planned Won't fix, can't repro, duplicate, stale May 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants