-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kafka transport binding v2 #337
Conversation
CI error is ok since it'll be automagically fixed once merged. But the DCO issue is real. |
kafka-transport-binding.md
Outdated
|
||
The receiver of the event can distinguish between the two content modes by | ||
inspecting the `cloudEvents_contentType` property of the Kafka message. If the | ||
value is prefixed with the CloudEvents media type `application/cloudevents`, indicating the use of a known [event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrap at 80
kafka-transport-binding.md
Outdated
|
||
#### 3.1.3. Metadata Headers | ||
|
||
All [CloudEvents][CE] attributes with exception of `data` MUST be individually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we include extensions in this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
@duglin - yep I will fix that tomorrow |
kafka-transport-binding.md
Outdated
|
||
##### 3.1.3.1 Property Names | ||
|
||
Cloud Event attributes are prefixed with "cloudEvents_" for use in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cloudEvents_
? instead of "cloudEvents_" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should use some sort of fqn: io.cloudevents.
to avoid potential clashes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using cloudEvents_ as it has been generally adopted by other transport bindings. I believe we also need to avoid '.' characters. We should be consistent across all bindings.
@duglin - your thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like consistency, but I think we need to be consistent with that users of the transport expect. I did a quick search and came across this: https://streamsets.com/documentation/controlhub/3.5.0/help/pdesigner/datacollector/UserGuide/Origins/KConsumer.html
and it talks about things like ssl.truststore.location
and com.sun.security.auth.module.Krb5LoginModule
. So Kafka does seem to be ok with '.'.
I don't know enough about Kafka to know if FQNs would be more "expected" or not, I'll leave that to others - I'd be ok with either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clemensv - what is your opinion on this (above)? Do we adopt language packaging/namespace semantics like io.cloudevents or cloudEvents_ It does raise the question about event propagation between transports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for the record, Apache Kafka never managed to decide on whether "." or "_" are preferred as separators. Configuration parameters are always "." separated (as the Streamset doc shows), but topics are often namespaced with "_" (for example, the internal "__consumer_offsets" topic). I've definitely seen both used in "the wild" (a fact that was sometimes inconvenient as the project had to deal with converting topic names to metric names...)
Either separator will be acceptable by the Apache Kafka community (or rather, either will be objected to, by different sets of people).
(duglin: I edited the comment to escape the underscore since it wasn't showing up and was being treated as the "turn on italics" symbol)
kafka-transport-binding.md
Outdated
|
||
* `eventTime` maps to `cloudEvents_eventTime` | ||
* `eventID` maps to `cloudEvents_eventID` | ||
* `cloudEventsVersion` maps to `cloudEvents_cloudEventsVersion` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on the general concept of this.
Our group has been doing exactly the same:
https://github.com/rh-event-flow/jcloudevents/blob/master/kafka/src/main/java/io/streamzi/cloudevents/kafka/util/KafkaHeaderUtil.java#L42
Ok, biggest diff is ours is on internal RecordHeader(s)
API, and no prefix.
Perhaps it should be based on the Header(s)
interface API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/cloudEventsVersion/specversion/
@bluemonk3y we have some POC code for this. and it would be nice to get this into the Cloudevents-sdk for java. See my proposal here: looking forward working with you on that |
|
||
------------------- key ---------------------- | ||
|
||
Key: mykey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Kafka message key is where things are still a bit blurry to me. How will its value be obtained from the CloudEvents event? Have you foreseen some extractor function or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about that one, perhaps a header key should be defined whose value, if present, will be propagated as key of the Kafka message, e.g. cloudEvents_messageKey
. Alternatively, the binding, wherever it's running, could be configured with some kind of path expression which gets applied to the data
value to extract the value, e.g. /customer/id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, there is a PR for this: #218
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, thanks. I've commented over there, too. Seems that'd need resolution first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conflict we have here is that the partition key is a transport specific concern (and the requirements may vary across different transport options) and that the publisher ought not to care about what transport the event route gets bound to. We may also have the situation that an event gets first published via MQTT from a little device and then gets put on Kafka by a device gateway.
I think this binding needs to define a rule by which a key is constructed from the event rather than expecting that the event brings it along.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that the natural way to get a key for a Kafka message is to use the source
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if source
is the right place - I'd expect all events produced by same application to have identical source, why partitioning typically applies to events produced by same source.
I like the proposal from @clemensv in #218 :
" instead of putting the burden of producing an event key on the client and then only having one, any transport that requires particular constructs such as keys such define a mechanism by which you can harvest/synthesize a key from an incoming CloudEvent as some sort of transform. The spec doesn't need to prescribe how -- it just needs to say that that's how the key materializes. A transform that just cooks up a random key might also be valid if that's what you want."
KafkaConnect already has "key extraction" transformation, exactly because external records require mapping to Kafka keys, and the logic for doing so varies between use-cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would hope a reasonable implementation of transport would allow plugging in the "key selection" logic, since we can't know in advance what it will be. (Although I'm still quite new to CloudEvents, so take my opinions with a bit of salt)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added a comment on this over at #218. The idea being that instead of having fully pluggable logic for retrieving the key instead there could be a mechanism to just select the key from a given event field when sending a cloud event to a Kafka Topic.
The header prefixes might be a bit generous. I think we'll end up collapsing them to "ce" across the board. |
+1 |
It seems this PR is a bit behind the 0.2 spec version ... |
ping @hschaffner - please review and add any comments w.r.t. why the partitionkey extension should not be needed. ping @clemensv - please add comments about the call-back mechanism you mentioned in PR #429 @bluemonk3y now that #429 is merged, can you rebase this one to deal with the merge conflicts? Also, you may need to edit some of the text to deal with recent spec changes - such as the names of the attributes being changed (or lower-cased). thanks everyone for your patience. |
@duglin - yep as soon as I get a chance. |
@bluemonk3y sorry to pester :-) but any chance of getting an update of this one? I get the sense we're nearing a pretty big milestone and this is one of the bigger outstanding items for us. |
@duglin - sure, I'm currently away but will pick it up on Monday. |
@duglin, I reckon you mean #218? Could you (or someone else) perhaps give a summary of the latest state of the discussion? The closed PR #218 adds the What @clemensv describes in #218 (comment) makes sense to me: make this a concern of the transport binding which would allow for a callback function for retrieving the partition/message key on an per-event basis. The spec-defined One open question to me is how the callback would be defined. What exactly is it? E.g. a (Java) class implementing a certain interface for extracting the key based on an incoming event? Or would at the spec level just be defined that there is a callback mechanism but it'd then be up to specific implementations of this to define the concrete mechanism? Thanks! |
@bluemonk3y the travis errors seem real. Also, can you sign your commits? |
Even though I didn't tag this as "v1.0" (it's "try-for-v1.0"), it's been lingering for a while so it'll be first on the PR review list this week - please look it over when you get a chance. |
kafka-transport-binding.md
Outdated
|
||
This example shows the *binary* mode mapping of an event into the | ||
Kafka message. All other CloudEvents attributes | ||
are mapped to Kafka Header fields with prefix `cloudEvents_`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ce_ might just be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
kafka-transport-binding.md
Outdated
|
||
Examples: | ||
|
||
* `eventTime` maps to `cloudEvents_eventTime` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/eventTime/time/
Example for the [JSON format][JSON-format]: | ||
|
||
``` text | ||
content-type: application/cloudevents+json; charset=UTF-8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shows no prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this missing a ce_
or is the sentence in the first paragraph of this section incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intro paragraph is wrong
kafka-transport-binding.md
Outdated
placed into the Kafka message value section | ||
using an [event format](#14-event-formats). | ||
|
||
In the *binary* content mode, the value of the event `data` attribute MUST be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to move this paragraph up so it's next to the one on line 52, that also talks about "binary" mode?
kafka-transport-binding.md
Outdated
Examples: | ||
|
||
* `eventTime` maps to `cloudEvents_eventTime` | ||
* `eventID` maps to `cloudEvents_eventID` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/eventID/id/
@hschaffner did you have any comments on this one? In particular around the |
@bluemonk3y rebase needed - and there are some outstanding comments |
kafka-transport-binding.md
Outdated
cloudEvents_eventTime: "2018-04-05T03:56:24Z" | ||
cloudEvents_eventID: "1234-1234-1234" | ||
cloudEvents_source: "/mycontext/subcontext" | ||
ce_specVersion: "0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/specVersion/specversion/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and "0.4-wip"
kafka-transport-binding.md
Outdated
------------------- value -------------------- | ||
|
||
{ | ||
"specVersion" : "0.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/specVersion/specversion/
s/0.1/0.4-wip/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I fixed the wonky rebase!
I think the rebase got a little wonky |
Kafka transport binding for CloudEvents, similar to the HTTP binding and proposed NATS, MQTT, AMQP bindings. Signed-off-by: Neil Avery <neil@confluent.io>
kafka-transport-binding.md
Outdated
### 2.1. data Attribute | ||
|
||
The `data` attribute is assumed to contain opaque application data that is | ||
encoded as declared by the `contentType` attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contentType or "datacontenttype" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I think I might be wrong, but I wanted to verify.
kafka-transport-binding.md
Outdated
here. | ||
|
||
The receiver of the event can distinguish between the two content modes by | ||
inspecting the `ce_contentType` [Header][Kafka-Message-Header] of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ce_contentType/ce_datacontenttype/ I think (but it's the "T" that jumped out at me)
kafka-transport-binding.md
Outdated
|
||
#### 3.3.1. Kafka Content-Type | ||
|
||
The [Kafka `ce_contentType`] property field MUST be set to the media |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"T" -> "t"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think the "ce_" here is incorrect since this isn't a CE property. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
kafka-transport-binding.md
Outdated
|
||
------------------ headers ------------------- | ||
|
||
ce_contentType: application/cloudevents+json; charset=UTF-8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thin the "ce_" here is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree - removed
proprietary-specs.md
Outdated
@@ -1,4 +1,4 @@ | |||
# CloudEvent Specs for Proprietary Protocols and Encodings | |||
### CloudEvent Specs for Proprietary Protocols and Encodings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change is invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
spec.md
Outdated
@@ -58,6 +58,14 @@ that does not support that feature will then silently ignore that part of the | |||
message. The sender needs to be prepared for the situation where a receiver | |||
ignores that feature. | |||
|
|||
For clarity, when a feature is marked as "OPTIONAL" this means that it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the edits in this doc are invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duglin - I believe I have applied the changes correctly ;) - would you mind reviewing?
Kafka transport binding for CloudEvents, similar to the HTTP binding and proposed NATS, MQTT, AMQP bindings. Signed-off-by: Neil Avery <neil@confluent.io>
@bluemonk3y looks good to me. Can I get one more LGTM and then I'll merge!! |
Approved on the 6/20 call with the minor rebase fixes |
Approved, again :-) , on the 6/28 call - but this time with the rebase fixes |
@bluemonk3y et al... thanks a ton for your patience and hard work on this one!!! |
Kafka transport binding for CloudEvents, similar to the HTTP binding and proposed NATS, MQTT, AMQP bindings.
Follow-on PR from #300
Signed-off-by: Neil Avery neil@confluent.io