Update AVRO mappings to match current event schema and type system. #497
Conversation
Signed-off-by: Evan Anderson <evan.k.anderson@gmail.com>
"null", | ||
"string" | ||
] | ||
["null", "string"] |
I can undo this JSON auto-formatting if preferred.
"int", | ||
"string", | ||
"bytes", | ||
"CloudEvent" |
Not familiar with Avro... is the "CloudEvent" above replaced by the stuff on line 95? (the data/bytes thingy)
The "CloudEvent" was a recursive allowed value of the same object, to support Map
values.
Without Map
, these structures don't need to be recursive in their definition, which makes me much more comfortable.
The actual diff here is +boolean -CloudEvent
Even though we dropped `Map` as a "portable" cross-encoding construct, JSON retains the ability to carry structured data, and Avro should, too. The recursive schema (which is required for the encoder to work) enabled that. The Avro schema is not a strict schema in the sense that it accurately tracks all constraints of CloudEvents; it just makes the encoder work. With this change you are limiting "data" in Avro to what "data_base64" does in JSON. In effect, I should be able to read a JSON event into an in-memory structure and then write that structure back out into Avro. By limiting "data" to binary, you're forcing me to pick an encoding for that structured data; if that encoding were Avro, I would have to make my own schema, and if it were JSON, I would wonder why I am using Avro to begin with. The "normal" case remains a self-contained event whose data contains structured data, and that needs to work in all encodings.
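The distinction being argued above can be sketched in a few lines. This is an illustrative model (the function and branch names are hypothetical, not code from the PR): a decoded JSON payload maps naturally onto a branch of a union like `["null", "boolean", "int", "string", "bytes", <recursive map>]`, but only if the schema keeps the recursive map branch; a bytes-only `data` field has no branch for structured values.

```python
import json

# Sketch (not the PR's actual code) of picking the Avro union branch for an
# in-memory JSON value. The branch names mirror a hypothetical
# ["null", "boolean", "int", "string", "bytes", map-of-self] union.
def union_branch(value):
    if value is None:
        return "null"
    if isinstance(value, bool):      # bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    if isinstance(value, bytes):
        return "bytes"
    if isinstance(value, dict):
        # Representable only if the schema is recursive; with "data" limited
        # to bytes, the producer must pre-encode this structure itself.
        return "map<recursive>"
    raise TypeError(f"no union branch for {type(value)}")

event_data = json.loads('{"orderId": 42, "paid": true}')
print(union_branch(event_data))   # structured payload needs the recursive branch
print(union_branch(b"\x00\x01"))  # opaque binary payload fits "bytes"
```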
I updated the … I'm also no Avro expert, so there may be a better way. Feel free to send a pull request with corrections against this branch if there are better ways to do this: https://github.com/evankanderson/spec/tree/hadoop-doop-doop
(Tested via Python and:

```python
import avro.schema
avro.schema.parse(open("spec.avsc", "rb").read())
```
)
Whoops, I left in my virtualenv... let me back that out.
…ed form (Signed-off-by: Evan Anderson <evan.k.anderson@gmail.com>)
Signed-off-by: Evan Anderson <evan.k.anderson@gmail.com>
Force-pushed from af5fb88 to 98862b9.
avro-format.md (outdated diff):

> ## 3 Data
>
> Before encoding, the AVRO serializer MUST first determine the runtime data type
> of the content. This may be determined by examining the data for invalid UTF-8
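The "examine the data" step the quoted text describes can be sketched as a UTF-8 sniff: try to decode the payload, and fall back to the binary branch on failure. This is a minimal illustration (the function name is hypothetical, not from the spec):

```python
# Sketch of choosing between Avro's "string" and "bytes" branches by
# attempting a UTF-8 decode, as the quoted spec text suggests.
def pick_data_branch(payload: bytes) -> str:
    try:
        payload.decode("utf-8")
        return "string"
    except UnicodeDecodeError:
        return "bytes"

print(pick_data_branch(b"hello"))         # decodes cleanly as UTF-8
print(pick_data_branch(b"\xff\xfe\x00"))  # 0xff is never valid in UTF-8
```

Note that this check is a heuristic: valid UTF-8 bytes that were *meant* as binary will still land in the "string" branch, which is part of why the spec hedges with "can" rather than "MUST ... by".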
s/may/might/ I think
I changed this to "can" rather than "might".
Signed-off-by: Evan Anderson <evan.k.anderson@gmail.com>
Force-pushed from 139162d to cef0a85.
Approved on 9/12 - just waited for the RFC 2119 keyword fix.
It looks like we didn't update the AVRO format to actually include a place to store `data` when we graduated the payload from a special attribute to a header + payload model.
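A fix along the lines described above might look roughly like this: a top-level record with a map of attributes plus an explicit `data` field. This is only an illustrative sketch of the shape being discussed, not the schema the PR actually merged (field names and union branches here are assumptions):

```python
import json

# Illustrative sketch of a CloudEvent Avro schema with an explicit "data"
# slot alongside the attribute map (names/branches are assumptions, not
# the merged schema).
cloud_event_schema = {
    "type": "record",
    "name": "CloudEvent",
    "fields": [
        {
            "name": "attribute",
            "type": {
                "type": "map",
                "values": ["null", "boolean", "int", "string", "bytes"],
            },
        },
        # The payload gets its own field instead of riding in the map.
        {"name": "data", "type": ["null", "string", "bytes"]},
    ],
}

print(json.dumps(cloud_event_schema, indent=2))
```

Splitting `attribute` and `data` mirrors the header + payload model the comment refers to: attributes stay a flat map of simple values, while the payload has a dedicated, separately-typed slot.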