Question: What's best practice to add data versioning? #778

Closed
cmenzi opened this issue Feb 12, 2019 · 19 comments

Comments


cmenzi commented Feb 12, 2019

Hi

I went through the latest CloudEvents docs and cannot find any examples of how to do data versioning.

I saw the following example for EventGrid and see that there is an eventTypeVersion in the header, but I think this is based on CloudEvents V1.

  • Is the SchemaUrl the right element to use?
  • Do I need to add CloudEventExtension?
  • Use GetAttributes() and just add it? (Smelly)

Or is there something I didn't see?

Cédric

Author

cmenzi commented Feb 19, 2019

@clemensv Any suggestions?


Rinsen commented Dec 26, 2019

@cmenzi
I would add them as an extension attribute, based on the documentation. The key phrase is "ensure proper routing and processing of the CloudEvent", from this passage:
"Extension attributes to the CloudEvent specification are meant to be additional metadata that needs to be included to help ensure proper routing and processing of the CloudEvent. Additional metadata for other purposes, that is related to the event itself and not needed in the transportation or processing of the CloudEvent, should instead be placed within the proper extensibility points of the event (data) itself."

https://github.com/cloudevents/spec/blob/master/primer.md

Contributor

jskeet commented Jun 17, 2020

(I realize this is a very old issue. I'm trying to go through and understand all the current feature requests etc.)

By "data versioning" do you mean breaking changes to the schema? If so, I'd recommend including a version number within the event type. Anyone subscribing to an event type "foo.v1" should be able to handle the data for any "foo.v1" (potentially discarding data it doesn't understand), but shouldn't expect to handle "foo.v2". It does mean that event producers may need to produce events for multiple event types for the same occurrence, but I think that's inherent in the problem space.
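A minimal sketch of that consumer-side rule, in Python with plain dictionaries rather than any CloudEvents SDK. The field names (`order_id`, `amount`) and the helper `handle` are hypothetical; only the "foo.v1" / "foo.v2" type names come from the comment above.

```python
# Hypothetical event type and field names, used only for illustration.
SUBSCRIBED_TYPE = "foo.v1"
KNOWN_FIELDS = {"order_id", "amount"}


def handle(event: dict):
    """Process a "foo.v1" event; return None for any other type."""
    if event.get("type") != SUBSCRIBED_TYPE:
        # For example "foo.v2": a different contract, not ours to interpret.
        return None
    data = event.get("data", {})
    # Tolerant reader: keep the fields we understand, discard the rest.
    return {key: value for key, value in data.items() if key in KNOWN_FIELDS}


v1_event = {
    "specversion": "1.0",
    "id": "1",
    "source": "/example",
    "type": "foo.v1",
    "data": {"order_id": 42, "amount": 9.5, "added_later": "ignored"},
}
v2_event = dict(v1_event, type="foo.v2")

print(handle(v1_event))  # {'order_id': 42, 'amount': 9.5}
print(handle(v2_event))  # None
```

The extra `added_later` field stands in for a non-breaking addition within v1: the consumer keeps working without knowing about it.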

The Primer mentions this approach, albeit applied to dataschema as an example rather than type:

For certain CloudEvents attributes, the entity or data model referenced by its value might change over time. For example, dataschema might reference one particular version of a schema document. Often these attribute values will then distinguish each variant by including some version-specific string as part of its value. For example, a version number (v1, v2), or a date (2018-01-01) might be used.

I don't think this is particularly something that the .NET CloudEvents SDK needs to have an opinion on though - might it be appropriate to raise this in the spec repository instead of here? @cmenzi on those grounds, would you mind me closing this issue?

Author

cmenzi commented Jun 18, 2020

Hi @jskeet

This is exactly how I did it. I'm now using it this way:

{
...
"type": "com.mydomain.myservice.events.mymessagetype",
"source": "com.mydomain.myservice",
"dataschema": "com.mydomain.myservice.events.mymessagetype:1.0"
...
}

With this approach, filtering in ServiceBus etc. is supported, and together with an anti-corruption layer that handles those breaking changes it works really well.
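One way such an anti-corruption layer might look, sketched in Python. This is an assumption about the shape of the idea, not the code actually in use: the version is read from the text after the last `:` in dataschema, and the payload fields (`name`, `first`, `last`) and the "2.0" version are invented for the example.

```python
def schema_version(dataschema: str) -> str:
    """Return the text after the last ':' in the dataschema value."""
    return dataschema.rpartition(":")[2]


# Hypothetical translators from each wire version to one internal model.
def from_v1(data: dict) -> dict:
    return {"full_name": data["name"]}


def from_v2(data: dict) -> dict:
    return {"full_name": f"{data['first']} {data['last']}"}


TRANSLATORS = {"1.0": from_v1, "2.0": from_v2}


def to_internal(event: dict) -> dict:
    """Translate an incoming event payload into the internal model."""
    version = schema_version(event["dataschema"])
    try:
        translate = TRANSLATORS[version]
    except KeyError:
        raise ValueError(f"unsupported schema version: {version}") from None
    return translate(event["data"])


event_v1 = {
    "type": "com.mydomain.myservice.events.mymessagetype",
    "source": "com.mydomain.myservice",
    "dataschema": "com.mydomain.myservice.events.mymessagetype:1.0",
    "data": {"name": "Ada Lovelace"},
}
event_v2 = dict(
    event_v1,
    dataschema="com.mydomain.myservice.events.mymessagetype:2.0",
    data={"first": "Ada", "last": "Lovelace"},
)

print(to_internal(event_v1))  # {'full_name': 'Ada Lovelace'}
print(to_internal(event_v2))  # {'full_name': 'Ada Lovelace'}
```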

I just thought that it would be helpful if the spec had a separate field, dataschemaversion, so that people are forced to think about versioning their contracts.

But you are right, the spec repository is a better place for this.

For me it's fine to close the issue.

Contributor

jskeet commented Jun 18, 2020

@cmenzi: That's not quite the same though - because you don't include the version in the type.

I envisage users subscribing to an event via just a type, and dataschema being effectively informational. If I subscribe to a particular event type, I don't expect the schema to start changing on me in breaking ways. If it does, even if I can detect that via dataschema, I can't do anything with it.

Instead, if the version is included within the type (e.g. com.mydomain.myservice.events.mymessagetype.v1) then an event producer that wants to create a breaking schema change can emit two events, one for v1 and one for v2, with different event types to distinguish them. Existing subscribers will receive the old schema, and new subscribers can choose to subscribe to v2. Eventually you'd want to turn off the v1 event type after an advertised deprecation period.
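A rough Python sketch of the producer side of this, using plain dictionaries instead of an SDK. The payload shapes (`name` versus `first`/`last`) and the names `ACTIVE_VERSIONS` and `events_for` are hypothetical; the type and source strings are the ones used earlier in this thread.

```python
import uuid

BASE_TYPE = "com.mydomain.myservice.events.mymessagetype"
SOURCE = "com.mydomain.myservice"


def v1_data(occurrence: dict) -> dict:
    # Hypothetical v1 contract: a single combined "name" field.
    return {"name": f"{occurrence['first']} {occurrence['last']}"}


def v2_data(occurrence: dict) -> dict:
    # Hypothetical v2 contract: the breaking change splits the name in two.
    return {"first": occurrence["first"], "last": occurrence["last"]}


# Drop the "v1" entry once the advertised deprecation period has ended.
ACTIVE_VERSIONS = {"v1": v1_data, "v2": v2_data}


def events_for(occurrence: dict) -> list:
    """Build one event per active version for a single occurrence."""
    return [
        {
            "specversion": "1.0",
            "id": str(uuid.uuid4()),
            "source": SOURCE,
            "type": f"{BASE_TYPE}.{version}",
            "data": build(occurrence),
        }
        for version, build in ACTIVE_VERSIONS.items()
    ]


events = events_for({"first": "Ada", "last": "Lovelace"})
for event in events:
    print(event["type"], event["data"])
```

Each version gets its own event with its own id, so existing v1 subscribers and new v2 subscribers never see each other's payloads.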

Author

cmenzi commented Jun 18, 2020

@jskeet OK, this makes sense! Then versioning should be on the type, which would also have a different dataschema.

V1

{
...
"type": "com.mydomain.myservice.events.mymessagetype.v1",
"source": "com.mydomain.myservice",
"dataschema": "com.mydomain.myservice.events.mymessagetype:v1"
...
}

V2

{
...
"type": "com.mydomain.myservice.events.mymessagetype.v2",
"source": "com.mydomain.myservice",
"dataschema": "com.mydomain.myservice.events.mymessagetype:v2"
...
}

Why do we need the dataschema then? Maybe to address minor (non-breaking) changes?

V2.1

{
...
"type": "com.mydomain.myservice.events.mymessagetype.v2",
"source": "com.mydomain.myservice",
"dataschema": "com.mydomain.myservice.events.mymessagetype:v2.1"
...
}

And it also implies that filtering on ServiceBus, etc. should be done on type as well, right?

Contributor

jskeet commented Jun 18, 2020

Caution: I'm relatively new to CloudEvents, but I have a fair amount of experience of the pain of versioning. This is all my personal opinion.

Then versioning should be on the type, which would also have a different dataschema.

Yes. Although I'd argue that it's possible for events of a single event type to advertise different dataschema based on different datacontenttype. For example, suppose you can subscribe to an event and say "I'd like the data to be in binary protobuf format" or "I'd like the data to be in JSON" - those would have different datacontenttype values, and it would make sense for the dataschema for the protobuf event to be a reference to a .proto file, and the dataschema for the JSON version to be a reference to a JSON schema file.
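To make that concrete, here is a small Python sketch in which one event type advertises a different dataschema per datacontenttype. The schema URLs and the helper `envelope` are made up for illustration, and "application/protobuf" is used here only as a plausible media type for binary protobuf.

```python
EVENT_TYPE = "com.mydomain.myservice.events.mymessagetype.v1"

# Hypothetical schema locations: one per wire format, same event type.
SCHEMA_BY_CONTENT_TYPE = {
    "application/json": "https://example.com/schemas/mymessagetype.v1.schema.json",
    "application/protobuf": "https://example.com/schemas/mymessagetype.v1.proto",
}


def envelope(content_type: str) -> dict:
    """Build the context attributes for the requested wire format."""
    if content_type not in SCHEMA_BY_CONTENT_TYPE:
        raise ValueError(f"unsupported datacontenttype: {content_type}")
    return {
        "specversion": "1.0",
        "type": EVENT_TYPE,
        "source": "com.mydomain.myservice",
        "datacontenttype": content_type,
        "dataschema": SCHEMA_BY_CONTENT_TYPE[content_type],
    }


json_event = envelope("application/json")
proto_event = envelope("application/protobuf")

print(json_event["type"] == proto_event["type"])  # True
print(json_event["dataschema"])
print(proto_event["dataschema"])
```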

Why do we need the dataschema then? Maybe to address minor (non-breaking) changes?

I view the dataschema as likely to be useful in two scenarios:

  • Building tooling which can consume any event, and "understand" the data based on the schema, a little like using reflection
  • As a way of effectively providing documentation: if you (a developer) see an event but don't otherwise know where the schema is

I would personally not expect most event consumers to need the event schema at all.
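The tooling scenario could look something like the following Python sketch: a generic inspector that looks up the schema named by dataschema and uses it to describe the payload. The in-memory `SCHEMAS` registry, its toy "field name to Python type" format, and the URL are all assumptions; real tooling would more likely fetch and apply a JSON Schema or .proto file.

```python
# Hypothetical registry: dataschema value -> expected fields and types.
SCHEMAS = {
    "https://example.com/schemas/mymessagetype.v2": {
        "first": str,
        "last": str,
        "age": int,
    },
}


def describe(event: dict) -> list:
    """Describe an event's data using the schema it advertises, if known."""
    schema = SCHEMAS.get(event.get("dataschema"))
    if schema is None:
        return ["(no known schema; data left uninterpreted)"]
    data = event.get("data", {})
    lines = []
    for field, expected in schema.items():
        if field not in data:
            lines.append(f"{field}: missing")
        elif isinstance(data[field], expected):
            lines.append(f"{field}: {expected.__name__} = {data[field]!r}")
        else:
            actual = type(data[field]).__name__
            lines.append(f"{field}: expected {expected.__name__}, got {actual}")
    return lines


event = {
    "type": "com.mydomain.myservice.events.mymessagetype.v2",
    "dataschema": "https://example.com/schemas/mymessagetype.v2",
    "data": {"first": "Ada", "last": "Lovelace"},
}

for line in describe(event):
    print(line)
```

The inspector never looks at the event type, which is what lets it handle events it was not written for.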

As for using it for non-breaking changes: that's certainly an option. I'm not sure whether it's worthwhile, but I haven't thought about it in detail.


Rinsen commented Jun 18, 2020

I ended up using only the type field, as @jskeet describes ("com.mydomain.myservice.events.mymessagetype.v1"), and left the optional fields datacontenttype, dataschema and subject empty. I have not found any issues in what I have been using these event formats for.

Events contain enough information without those optional fields to be able to route and process events in a way that makes sense. So I understand why they are marked as optional in the spec.
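As a sketch of that minimal style in Python: the event below carries only the four attributes CloudEvents 1.0 marks as required (id, source, specversion, type) plus data, and routing uses the versioned type alone. The helpers `missing_required` and `route`, the id value, and the handler are hypothetical.

```python
REQUIRED = ("id", "source", "specversion", "type")


def missing_required(event: dict) -> list:
    """Return the names of required attributes that are absent or empty."""
    return [name for name in REQUIRED if not event.get(name)]


def route(event: dict, handlers: dict):
    """Dispatch on the versioned type only; optional attributes are unused."""
    missing = missing_required(event)
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    handler = handlers.get(event["type"])
    return handler(event) if handler else None


minimal_event = {
    "specversion": "1.0",
    "id": "a1b2",
    "source": "com.mydomain.myservice",
    "type": "com.mydomain.myservice.events.mymessagetype.v1",
    "data": {"name": "Ada Lovelace"},
}

handlers = {
    "com.mydomain.myservice.events.mymessagetype.v1": lambda e: e["data"]["name"],
}

print(route(minimal_event, handlers))  # Ada Lovelace
```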

Contributor

jskeet commented Feb 12, 2021

I propose transferring this issue to the spec repo. It's not really specific to the C# CloudEvents SDK, and it would benefit from views across the community.

@cmenzi is that okay with you?

Author

cmenzi commented Feb 12, 2021

@jskeet Absolutely. Makes sense.

Contributor

jskeet commented Feb 12, 2021

Ah - looks like unfortunately I can't transfer it after all :(

Author

cmenzi commented Feb 13, 2021

Me neither 😕, even though I created it.

Author

cmenzi commented Feb 24, 2021

@dazuma, could you please move/transfer the issue to the https://github.com/cloudevents/spec repository?

Member

dazuma commented Feb 24, 2021

Summoning @duglin

@duglin duglin transferred this issue from cloudevents/sdk-csharp Mar 2, 2021
Collaborator

duglin commented Mar 2, 2021

Sorry for the delay - it's been transferred.

Just my 2 cents....

I'm not sure there is a single right answer.

I think it would be OK to keep the same ce-type value but then add/remove fields from the event, and change only the dataschema value (if you have one), if you're in the situation where you're only going to support exactly one version and the old one immediately goes away. I mean, adding a "v2" to your type is nice, but if everyone breaks when you switch, whether due to a mismatch on the "type" or due to the event body changing, then either way people need to adapt to it.

Now, if you want to support multiple types at the same time (either from your event producer, or due to your subscribers getting the same type of events from multiple event producers, some of which may be at v1 and others at v2), then adding a version string to your "type" feels more correct. But I'm not sure I would say it's wrong to keep the same "type" and vary the "dataschema" attribute instead - people just need to know what to expect and where to look, via docs I guess.

Collaborator

duglin commented Mar 10, 2021

On the March 4th call, @jskeet agreed to write up a PR for the primer.

Contributor

jskeet commented Mar 10, 2021

Unlikely to be anything this week, unfortunately. I'm hoping to carve out a bit of time next week.

jskeet added a commit to jskeet/spec that referenced this issue Mar 31, 2021
Addresses cloudevents#778

Signed-off-by: Jon Skeet <jonskeet@google.com>
jskeet added a commit to jskeet/spec that referenced this issue Apr 1, 2021
Addresses cloudevents#778

Signed-off-by: Jon Skeet <jonskeet@google.com>
jskeet added a commit to jskeet/spec that referenced this issue Apr 16, 2021
Addresses cloudevents#778

Signed-off-by: Jon Skeet <jonskeet@google.com>
Collaborator

duglin commented Jul 8, 2021

@jskeet we can close this one, right?

Contributor

jskeet commented Jul 8, 2021

Yup!

@duglin duglin closed this as completed Jul 8, 2021
5 participants