Mutability and fields (eg created/updated) #61

Open
bensheldon opened this issue Aug 28, 2015 · 7 comments

Comments

@bensheldon

The schema should define resource lifecycles and the accompanying fields. It should be strict about which modifications trigger a new updated timestamp (hopefully all of them, but it should declare that behavior).

This is important for ETL-ing and syncing data.

@mheadd
Member

mheadd commented Sep 11, 2015

@bensheldon Apologies for the late reply on this.

Can you expand on this with a quick example? Thanks!

@bensheldon
Author

As a user syncing data between my own database and a BLDS data set
When a permit is added to the BLDS data set (not necessarily when it is accepted by the city), it should have created_at and updated_at touched
And when the data row is changed within the BLDS data set, it should have updated_at touched

From my experience with Open311, where there is often both an underlying Ticket system and an intermediary/vendor/integrator serving the Open311 data, there is confusion about whether the timestamp fields describe the canonical Ticket system or the intermediary's data. This has led to situations like:

A public Open311 record was modified, but this was not reflected in "updated_at" because the modification resulted from a change to a secondary dataset that the intermediary had integrated, rather than from a change to the primary record. In this case, the intermediary interpreted "updated_at" as reflecting only changes to the primary record, not to any secondary records, even though the secondary change altered the data served by Open311.

An analogous situation here might be: a building permit was not changed, but a separate contractor form was amended, causing a change to the contractor1 field. IMO, this should trigger updated_at, and this behavior should be part of the specification.
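To make the proposed producer-side behavior concrete, here is a minimal Python sketch. It assumes a hypothetical publisher that stores each served row together with its timestamps; `PublishedDataset`, `publish` and the `status` field are illustrative names, not part of BLDS. The key choice is that the publisher compares the whole row it serves, so a change that originates in a secondary record (such as an amended contractor form) still touches updated_at.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class PublishedRecord:
    data: Dict[str, str]  # fields served to consumers, e.g. "contractor1"
    created_at: datetime
    updated_at: datetime


class PublishedDataset:
    """Hypothetical publisher-side store; names are illustrative, not from BLDS."""

    def __init__(self) -> None:
        self.records: Dict[str, PublishedRecord] = {}

    def publish(self, permit_id: str, data: Dict[str, str],
                now: Optional[datetime] = None) -> PublishedRecord:
        if now is None:
            now = datetime.now(timezone.utc)
        existing = self.records.get(permit_id)
        if existing is None:
            # Row added to the data set: touch both created_at and updated_at.
            record = PublishedRecord(dict(data), now, now)
            self.records[permit_id] = record
            return record
        if existing.data != data:
            # Any change to the served row touches updated_at, whether it came from
            # the permit itself or from a secondary source such as an amended
            # contractor form.
            existing.data = dict(data)
            existing.updated_at = now
        # Re-publishing an identical row leaves both timestamps alone.
        return existing


if __name__ == "__main__":
    day1 = datetime(2015, 9, 1, tzinfo=timezone.utc)
    day2 = datetime(2015, 9, 2, tzinfo=timezone.utc)
    day3 = datetime(2015, 9, 3, tzinfo=timezone.utc)
    ds = PublishedDataset()
    ds.publish("P-1", {"status": "Applied", "contractor1": "Acme"}, now=day1)
    ds.publish("P-1", {"status": "Applied", "contractor1": "Acme"}, now=day2)
    rec = ds.publish("P-1", {"status": "Applied", "contractor1": "Acme Builders"}, now=day3)
    print(rec.created_at.date(), rec.updated_at.date())  # 2015-09-01 2015-09-03
```

Comparing the served row, rather than tracking which upstream table changed, is what keeps updated_at honest when several source systems feed one published record.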

@mheadd
Member

mheadd commented Sep 11, 2015

Ah, that makes sense. Any thoughts on this @axtheset?

@mmartin78

This is very useful, but perhaps should be optional?

@bensheldon
Author

The benefit of this behavior is ensuring data integrity and improving the efficiency of data syncing between producers and consumers.

To speak again from my experience with the Open311 specification: I think that defining only a data schema (syntax), without defining the behaviors (semantics) of its fields, makes it very difficult to actually integrate systems that conform to the specification.
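On the consumer side, the efficiency gain comes from incremental sync: a consumer only pulls rows whose updated_at is at or after the last value it saw. The sketch below assumes rows are dicts with hypothetical `id` and `updated_at` keys and filters in memory to stay self-contained; a real producer would more likely apply the filter server-side. It is only correct if every change to a row touches updated_at, which is exactly the semantics being argued for here.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional, Tuple

Row = Dict[str, Any]


def incremental_sync(local: Dict[str, Row], source_rows: Iterable[Row],
                     watermark: Optional[datetime]) -> Tuple[int, Optional[datetime]]:
    """Upsert rows whose updated_at is at or after `watermark` into `local`.

    Returns (rows copied, new watermark). Correct only if the producer touches
    updated_at on every change to a served row; otherwise changes are missed.
    """
    changed = [row for row in source_rows
               if watermark is None or row["updated_at"] >= watermark]
    for row in changed:
        # Upserts keyed by id are idempotent, so re-copying a row is harmless.
        local[row["id"]] = dict(row)
    new_watermark = max((row["updated_at"] for row in changed), default=watermark)
    return len(changed), new_watermark


if __name__ == "__main__":
    def day(d: int) -> datetime:
        return datetime(2015, 9, d, tzinfo=timezone.utc)

    source = [{"id": "P-1", "updated_at": day(1)},
              {"id": "P-2", "updated_at": day(2)},
              {"id": "P-3", "updated_at": day(3)}]
    local: Dict[str, Row] = {}
    count, wm = incremental_sync(local, source, None)
    print(count, wm.date())           # 3 2015-09-03
    source[0]["updated_at"] = day(5)  # producer touches P-1 when it changes
    count, wm = incremental_sync(local, source, wm)
    print(count, wm.date())           # 2 2015-09-05 (P-1 plus P-3 at the boundary)
```

The >= comparison re-fetches rows stamped exactly at the previous watermark; because the upsert is idempotent this costs little and avoids missing rows written later with the same timestamp.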

@mmartin78

I agree with defining the semantics, just think it should be optional because I bet most agencies don't capture this data at all today.

@bensheldon
Author

Maybe we should expand the discussion to canonical vs. non-canonical fields. In asking for both the timestamps and defined semantics for them, I don't have a preference for whether they represent canonical data (e.g. a datetime stamped on the original form) or non-canonical data (the datetime stored in the intermediary database), other than to ask that the representation and semantics be defined as part of the spec.

3 participants