Batched Content Mode Events Not Mentioned in Spec #645

Closed
grant opened this issue Jun 15, 2020 · 10 comments · Fixed by #672

Comments

@grant
Member

grant commented Jun 15, 2020

Why is batched mode not mentioned in spec.md?
https://github.com/cloudevents/spec/blob/master/spec.md#message

But is mentioned in the http-protocol-binding.md?
https://github.com/cloudevents/spec/blob/master/http-protocol-binding.md#33-batched-content-mode

And JSON format?
https://github.com/cloudevents/spec/blob/master/json-format.md#4-json-batch-format


Is this still valid? Is it an oversight not mentioning it in spec.md?
CC: @dazuma

@cneijenhuis
Contributor

It is intentional. The primer says:

Batching of multiple events into a single API call is natively supported by some protocols. To aid interoperability, it is left up to the protocols if and how batching is implemented. Details may be found in the protocol binding or in the protocol specification. A batch of CloudEvents carries no semantic meaning and is not ordered. An Intermediary can add or remove batching as well as assign events to different batches.

It is therefore up to the protocol (such as HTTP) to define it. The spec defines the semantic meaning of a (single) event only.
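To illustrate the primer's point that a batch carries no semantic meaning and is not ordered, here is a minimal Python sketch of what an intermediary might do. The `rebatch` helper and the dict-shaped events are hypothetical and only for illustration; they are not part of any CloudEvents SDK.

```python
from itertools import islice


def rebatch(batches, size):
    """Flatten incoming batches and regroup the events into batches of `size`.

    Hypothetical helper: batch boundaries are treated as a transport detail,
    so an intermediary is free to regroup events however it likes.
    """
    # One shared generator, so each islice call continues where the last stopped.
    events = (event for batch in batches for event in batch)
    regrouped = []
    while True:
        chunk = list(islice(events, size))
        if not chunk:
            return regrouped
        regrouped.append(chunk)


# Two incoming batches (3 events and 1 event) become two batches of 2.
incoming = [[{"id": "a"}, {"id": "b"}, {"id": "c"}], [{"id": "d"}]]
outgoing = rebatch(incoming, 2)
```

Since a consumer must not read meaning into which events share a batch, the regrouped output is equivalent to the input as far as the events themselves are concerned.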

@grant
Member Author

grant commented Jun 15, 2020

Shouldn't Message denote that there may be N "modes" for a message (depending on the protocol)?

By using the phrasing:

Message

Events are transported from a source to a destination via messages.

A "structured-mode message" is one where the event is fully encoded using a stand-alone event format and stored in the message body.

A "binary-mode message" is one where the event data is stored in the message body, and event attributes are stored as part of message meta-data.

It looks as though messages must follow one of the 2 following modes. Otherwise, why are we describing these modes in the first place?

Perhaps it should be noted that specific protocols may adopt additional modes.
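For reference, the two modes quoted above can be sketched for HTTP roughly as follows. This is a simplified illustration, not an SDK implementation: the event values are made up, the function names are hypothetical, and the percent-encoding of header values that the HTTP binding requires is omitted.

```python
import json

# Hypothetical event used only for illustration; all values are made up.
event = {
    "specversion": "1.0",
    "id": "1234",
    "source": "/example/source",
    "type": "com.example.created",
    "datacontenttype": "application/json",
    "data": {"hello": "world"},
}


def to_structured(event):
    """Structured mode: attributes and data are encoded together in the body."""
    headers = {"Content-Type": "application/cloudevents+json"}
    body = json.dumps(event).encode("utf-8")
    return headers, body


def to_binary(event):
    """Binary mode: data goes in the body, attributes go in message metadata."""
    headers = {}
    for name, value in event.items():
        if name == "data":
            continue
        if name == "datacontenttype":
            # The HTTP binding maps this attribute to the Content-Type header.
            headers["Content-Type"] = value
        else:
            # Other attributes become ce-prefixed headers (encoding omitted).
            headers["ce-" + name] = str(value)
    body = json.dumps(event["data"]).encode("utf-8")
    return headers, body
```

Both functions carry the same event; only the split between message body and metadata differs.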

@duglin
Collaborator

duglin commented Jun 15, 2020

My reading of that section is that the spec defines two modes but doesn't mandate only those two. Notice there is no "MUST" in that first section. I don't see any harm in adding a sentence such as: protocol bindings MAY define, and use, additional modes.

@grant
Member Author

grant commented Jun 15, 2020

Thanks for the reply. Still grokking why the primer mentions batch but the spec does not. I'm a little confused as to why the overarching spec has a Message section and then lists 2 types.

We should make it clear that these are 2 examples of modes, only relevant to certain protocol bindings. These two types are not used with the NATS protocol binding for example.


The HTTP Batch section might be missing a section. There are sections for Metadata Headers in Binary and Structured but not for Batched. I don't know if metadata headers can be used here, and if so, how.

I would assume that there'd be a section that talks about headers, like:

Implementations MAY NOT include the same HTTP headers as defined for the binary mode.


We also are just inconsistent in the spec with multiple names that I think mean the same thing:

  • binary mode
  • binary-mode
  • binary content mode

The dash is probably a typo, but I'm not sure which term is right.

@cneijenhuis
Contributor

Still grokking why the primer mentions batch, but not the spec.

Because we are aiming for interoperability. Many protocols (whether Kafka, GC PubSub or (I guess) NATS) already batch messages. Some do it a bit more explicitly, some try to hide batching/buffering from the developer.

I don't think we can specify anything useful about batching without breaking interoperability. That's why nothing should go into the spec. Still, it's an important topic. The general guidance on batching is in the primer, and, if necessary, in the transport.

If you look at Kafka or GC PubSub, you'll see that batching "just works" after you know how to map a CloudEvent onto a single message of the transport. How to batch multiple messages (CloudEvents or not) is already fully defined by Kafka/PubSub.


On modes: I'm not sure what different modes you could think of? And NATS is using the structured mode, isn't it?

Also, batching is usually implemented using the structured mode. When we say structured, we simply mean that the attributes and the data are encoded together. This ends up in the message body. There may be multiple messages in the request/response, which is fine, as long as each message contains both the attributes and the data of the CloudEvent together.
E.g. check the PubSub API for pulling messages: https://cloud.google.com/pubsub/docs/reference/rest/v1/projects.subscriptions/pull. You'll see that it can receive batches of messages. Each message can be a CloudEvent in structured mode.

Hope that helps?

@deissnerk
Contributor

While I generally agree with the explanation of @cneijenhuis, I think @grant has a point regarding batched as a content mode on its own. This is confusing.
Either we add it as a third content mode, or we define it as an event format that only makes sense for protocols that do not batch on their own.
@grant I don't think binary and structured should be regarded as examples of content modes. Their definition has been introduced intentionally and helped resolve some issues around datacontenttype and the CE type system. What other content mode in addition to structured, binary and maybe batched would you have in mind?

@cneijenhuis
Contributor

cneijenhuis commented Jun 16, 2020

I agree it is confusing. I think the source of confusion is that, if you turn a single event into a (single) message, and send that as a request, there is no meaningful difference between the message and the request. This is especially true for the binary mode. And we start to treat request and message interchangeably.

However, once you transport multiple events, we should clearly differentiate between them! A request and a message are now meaningfully different concepts. With a single request, you transport multiple messages. Each message still encodes a (single) event.

Therefore, IMO, there is one mode (structured/binary) to map an event to a message, and another mode (single/batched) to map message(s) to a request. We can see this quite nicely here: https://github.com/cloudevents/spec/blob/v1.0/json-format.md#4-json-batch-format. The batch format builds on top of the structured format.

I agree that calling it a content mode here: https://github.com/cloudevents/spec/blob/master/http-protocol-binding.md#33-batched-content-mode is wrong. It relates very closely to the HTTP content type, but the content type covers both how to map a request to message(s) and how to map a message to a CloudEvent.
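The two-step mapping described in this comment can be sketched in Python as below. The function names are hypothetical, and events are assumed to already be JSON-ready dicts. The media types are the ones defined by the JSON event format and its batch format.

```python
import json


def event_to_message(event):
    """Step 1 (event -> message): one structured-mode JSON object per event."""
    # Assumption: the event is already a JSON-ready dict of attributes and data.
    return dict(event)


def messages_to_request(messages, batched):
    """Step 2 (message(s) -> request): build a single or batched HTTP payload."""
    if batched:
        # The JSON batch format is a JSON array of structured-mode events.
        headers = {"Content-Type": "application/cloudevents-batch+json"}
        body = json.dumps(messages)
    else:
        if len(messages) != 1:
            raise ValueError("a non-batched request carries exactly one event")
        headers = {"Content-Type": "application/cloudevents+json"}
        body = json.dumps(messages[0])
    return headers, body.encode("utf-8")


# Hypothetical events, for illustration only.
events = [
    {"specversion": "1.0", "id": "1", "source": "/s", "type": "com.example.a"},
    {"specversion": "1.0", "id": "2", "source": "/s", "type": "com.example.a"},
]
headers, body = messages_to_request(
    [event_to_message(e) for e in events], batched=True
)
```

Keeping the two steps separate shows why the batch format can reuse the structured encoding unchanged: batching only affects how messages are packed into a request.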

@grant
Member Author

grant commented Jun 18, 2020

I'm just saying that it's confusing to see an additional message mode when it's not mentioned at all in the main spec, where we clearly define binary and structured. It's mentioned in the primer, but nowhere in the spec, not even in a small note.

The primer isn't really required reading when adhering to the spec, it's an additional doc.

Suggested action:
I think we should just add a note to the spec in the Message section to make it clear that there are not only binary and structured modes; there's also batched for HTTP. Or change the wording so that we don't suggest there are only 2 modes when, in reality, there are 3 for HTTP.

@duglin
Collaborator

duglin commented Jun 25, 2020

way back above I said:


I don't see any harm in adding a sentence such as: protocol bindings MAY define, and use, additional modes.


Would that cover your concern? We can PR this if so. I don't think it's a breaking change as this was clearly the intent since HTTP defined batching.

TBH, I'm still a little confused as to the reasons behind this issue. Is this just a wording concern about the spec not making it clear that there can be other modes, or is there an implementation/interop issue that you're running into?

@grant
Member Author

grant commented Jun 25, 2020

I don't see any harm in adding a sentence such as: protocol bindings MAY define, and use, additional modes.

That works for me.

TBH, I'm still a little confused as to the reasons behind this issue. Is this just a wording concern about the spec not making it clear that there can be other modes, or is there an implementation/interop issue that you're running into?

I received a feature request for using batched events, which would take a lot of time to implement across languages, and I'm not sure if it's a real use case we're missing. I'm just trying to make sure the libraries we're building are fully compliant.

Most online presentations don't talk about batched mode. The "Content Modes" section of the HTTP protocol binding doc says SHOULD for structured and binary but omits a "MAY support batched mode" for some reason.

4 participants