Upgrade docs: Interaction with release immutability and dates #849

Closed
jpmckinney opened this issue Mar 29, 2019 · 15 comments · Fixed by #1242
Assignees
Labels
Focus - Documentation Includes corrections, clarifications, new guidance, and UI/UX issues Focus - Merging Relating to the creating of compiled releases and versioned releases Semantics Relating to field and code descriptions
Milestone

Comments

@jpmckinney
Member

jpmckinney commented Mar 29, 2019

A publisher ought to be allowed to upgrade their data between OCDS versions. If they do so, we would presumably want the release id to remain the same; however, this would mean that there is an exception to a release's immutability.

Furthermore, per #848, they would need to either keep the release date values the same, or ensure that the releases have the same chronological order after an upgrade.

For dates, I propose:

  • date fields remain the same. We might need to clarify the description of the date field to make it clear that upgrading a release doesn't entail its re-publication. It's currently: "The date this information was first released, or published."

For immutability, we can either:

  1. Change the uniqueness of release id to be scoped by the ocid and the version, such that there can be two releases with the same id as long as they have different versions.
    • The version isn't presently declared on individual releases, but we might explore using $schema to indicate this in #426 (Add describedby field for the extended release schema).
    • The merge routine would have to specify that, if two releases have the same id, then they should be treated as duplicates and one should be ignored (presumably the one whose version matches the fewest other releases).
    • Similar to the uniqueness of the record in #847 (Unclear rules on uniqueness of record), it's very likely that publishers have archives of older data using older versions of OCDS. If there were a major upgrade of OCDS (2.0), publishers might also want to offer OCDS data in multiple versions.
    • Essentially, this option says that the version of OCDS doesn't matter with respect to the uniqueness of a release; each data file is just a different serialization of the abstract release.
  2. Allow releases to be mutated only if they are being upgraded.
    • This option would preserve the idea of there only being one data file per release. However, as mentioned above, this might not be realistic, or desirable in case of a major upgrade.
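The deduplication rule in option 1 could be sketched roughly as follows. This assumes each release carries a hypothetical "version" key (releases don't declare a version today; #426 discusses one way to add it), and that the list passed in is the set of releases to be merged. The function name, the sample values, and the tie-breaking behaviour are illustrative assumptions, not part of OCDS.

```python
from collections import Counter


def drop_cross_version_duplicates(releases):
    """Keep one release per (ocid, id), preferring the most common version.

    Assumes every release dict has "ocid", "id" and a hypothetical
    "version" key; OCDS releases do not declare a version today.
    """
    # Count how many of the input releases use each version.
    version_counts = Counter(release["version"] for release in releases)

    kept = {}
    for release in releases:
        key = (release["ocid"], release["id"])
        current = kept.get(key)
        # Replace the kept release only if this one's version is strictly
        # more common; on a tie, the first release seen is kept.
        if (
            current is None
            or version_counts[release["version"]] > version_counts[current["version"]]
        ):
            kept[key] = release
    return list(kept.values())


# Illustrative data: release "r1" exists in both a 1.0 and a 1.1 serialization.
releases = [
    {"ocid": "ocds-213czf-1", "id": "r1", "version": "1.0"},
    {"ocid": "ocds-213czf-1", "id": "r1", "version": "1.1"},
    {"ocid": "ocds-213czf-1", "id": "r2", "version": "1.1"},
]
deduplicated = drop_cross_version_duplicates(releases)
```

Here the 1.0 copy of "r1" is the one ignored, because 1.1 matches more of the other releases.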

I prefer something like option 1.

@jpmckinney jpmckinney added the Focus - Documentation Includes corrections, clarifications, new guidance, and UI/UX issues label Mar 29, 2019
@jpmckinney jpmckinney added this to the 1.1.5 milestone Mar 29, 2019
@jpmckinney jpmckinney changed the title from "Interaction between upgrades and release immutability and dates" to "Upgrade docs: Interaction with release immutability and dates" Mar 29, 2019
@jpmckinney jpmckinney modified the milestones: 1.1.5, 1.2.0 Jun 9, 2020
@jpmckinney
Member Author

Postponing to 1.2.0, so that we can have a broader discussion about the rules around upgrades, of which this forms a part.

@jpmckinney jpmckinney added Focus - Merging Relating to the creating of compiled releases and versioned releases Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.) labels Jun 9, 2020
@jpmckinney jpmckinney added Semantics Relating to field and code descriptions and removed Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.) labels Jul 17, 2020
@ColinMaudry
Member

ColinMaudry commented Sep 15, 2020

👍 for Option 1.

publishers might also want to offer OCDS data in multiple versions

That's right: OCDS 2.x will not be backward-compatible with OCDS 1.x, and as such can be considered a new format, faithful to the spirit of OCDS, but distinct. And we wouldn't advise against publishing procurement data in a different format. It is actually good practice (e.g. the French government publishes its data in a national format + OCDS).

However, publishing the same data in various flavours of the same format (say OCDS 1.1 and 1.2) should be advised against.

@ColinMaudry
Member

ColinMaudry commented Sep 15, 2020

each data file is just a different serialization of the abstract release

Indeed, and we must not forget that the OCDS release is distinct from what it describes. It's an opinionated abstraction of reality, built for specific and limited purposes, and other abstractions, serving other purposes, may live alongside.

@ColinMaudry
Member

ColinMaudry commented Jan 14, 2021

Since we lean toward implementing option 1, here is what we should update:

@jpmckinney
Member Author

jpmckinney commented Jan 14, 2021

For this issue, let's focus on specifying the rules. Let's leave checking for conformance with the aid of a $schema field to #426.

Release id is currently:

An identifier for this particular release of information. A release identifier must be unique within the scope of its related contracting process (defined by a common ocid). A release identifier must not contain the # character.

We can change it to something like:

The identifier of the release. The id value must be unique within the scope of the contracting process identified by the ocid value, for a given version of OCDS. In other words, a publisher may publish datasets for different versions of OCDS, and repeat releases within each dataset. The id value must not contain the # character.

For the merging routine, I think we can go even simpler, and simply specify that, "All releases must be upgraded to the same version of OCDS." (Merging releases from different versions has undefined behavior.)
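One way a tool could enforce this before merging is sketched below. It assumes the version is read from the package-level "version" field, and that a package without that field is treated as "1.0" (an assumption of this sketch, since the field was only introduced in 1.1). The function name is hypothetical.

```python
def releases_with_common_version(packages):
    """Return all releases from the packages, or raise if versions differ.

    Assumption: a package without a "version" field is treated as "1.0".
    """
    versions = {package.get("version", "1.0") for package in packages}
    if len(versions) > 1:
        raise ValueError(
            "Upgrade releases to one OCDS version before merging; "
            f"found {sorted(versions)}"
        )
    # Flatten the releases, preserving package order.
    return [
        release
        for package in packages
        for release in package.get("releases", [])
    ]
```

Raising an error, rather than guessing which releases to keep, matches the idea that merging across versions has undefined behavior.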

@ColinMaudry
Member

ColinMaudry commented Jan 19, 2021

I would specify "major version". Publishing different datasets for different minor versions of the same major version (e.g. 1.1 and 1.2 data) is not recommended, even in distinct datasets. Could we add a paragraph about that in the Build or Publish guidance?

Merging releases from different versions has undefined behavior.

Can releases with different minor versions be merged? If a field has been renamed between major versions, how can it be merged? Is the result valid and useful in any way? In other words, shouldn't it be advised against in both cases?

@jpmckinney
Member Author

jpmckinney commented Jan 20, 2021

To clarify, merging releases from different major or minor versions has undefined behavior (i.e. not advised). For example, in OCDS 1.0, organizations were in-lined. There was no parties array, and there were no organization references. Organizations had full details under buyer, awards/suppliers, etc. Merging 1.0 and 1.1 data would yield strange results. For example, if a buyer's address is updated in the parties array in the 1.1 data, the old address would still appear under buyer from the 1.0 data. Therefore, data must be upgraded to the same minor version.
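The stale-address problem can be reproduced with a toy example. The merge function below is a deliberately simplified recursive overlay, not the OCDS merge routine, and the organization names, identifiers and addresses are made up for illustration.

```python
def naive_merge(older, newer):
    """Recursively overlay `newer` on `older` (dicts only; other values are replaced).

    This is a deliberate simplification, not the OCDS merge routine.
    """
    merged = dict(older)
    for key, value in newer.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = naive_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# 1.0 style: the buyer's full details are in-lined under "buyer".
release_1_0 = {
    "buyer": {
        "name": "Example Agency",
        "address": {"streetAddress": "1 Old Street"},
    },
}
# 1.1 style: details live in "parties"; "buyer" is only a reference.
release_1_1 = {
    "parties": [
        {
            "id": "org-1",
            "name": "Example Agency",
            "address": {"streetAddress": "2 New Street"},
            "roles": ["buyer"],
        }
    ],
    "buyer": {"id": "org-1", "name": "Example Agency"},
}

compiled = naive_merge(release_1_0, release_1_1)
stale_address = compiled["buyer"]["address"]["streetAddress"]
current_address = compiled["parties"][0]["address"]["streetAddress"]
```

After the merge, the buyer object still carries the old street address from the 1.0 release, while the parties entry has the new one, so the compiled data contradicts itself.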

Publishing different datasets for different minor versions of the same major version (e.g. 1.1 and 1.2 data) is not recommended, even in distinct datasets.

Why is this a problem? Minor versions introduce changes that old software might not be able to handle (e.g. deprecating fields and moving organization details to the parties array, like in 1.1). It would be very frustrating to users if a publisher turned off 1.1 data the day they published 1.2 data.

@ColinMaudry
Member

Minor versions introduce changes that old software might not be able to handle (e.g. deprecating fields and moving organization details to the parties array, like in 1.1). It would be very frustrating to users if a publisher turned off 1.1 data the day they published 1.2 data.

Right! I mixed up the use cases.

Then there is the description of date. I think the current one could remain as-is since it's pretty clear that the date corresponds to the first time the information was released. I can't think of a way it could imply a version upgrade would affect the date of the release. However, we can make it crystal clear:

The date this information was first released, or published. Upgrading the release to a newer OCDS version should not affect the value of this field.

@jpmckinney
Member Author

I think #854 (PR #995) clarified date enough for this issue's purposes.

@ColinMaudry
Member

Ah I had missed it, thanks!

@ColinMaudry
Member

ColinMaudry commented Feb 3, 2021

In that case, I think we can proceed with this description for id:

The identifier of the release. The id value must be unique within the scope of the contracting process identified by the ocid value, for a given version of OCDS. In other words, a publisher may publish datasets for different versions of OCDS, and repeat releases within each dataset. The id value must not contain the # character.

@jpmckinney
Member Author

Sounds good. I might edit my earlier suggestion, to reference field titles instead of field names, and to avoid the use of Markdown where possible.

The identifier of the release. The release ID must be unique within the scope of the contracting process identified by the Open Contracting ID (ocid), for a given version of OCDS. In other words, a publisher may publish datasets for different versions of OCDS, and repeat releases within each dataset. The release ID must not contain the number sign (#).
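A rough sketch of how a checker might apply this description. It assumes the data is organized as one dataset per OCDS version, represented here as a mapping from a version string to a list of releases; that layout and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def release_id_problems(releases_by_version):
    """Return messages for release IDs that break the proposed rule.

    `releases_by_version` maps an OCDS version string to a list of releases,
    standing in for one dataset per version (an assumed layout).
    """
    problems = []
    for version, releases in releases_by_version.items():
        seen = defaultdict(set)  # ocid -> release IDs already seen in this version
        for release in releases:
            ocid, release_id = release["ocid"], release["id"]
            if "#" in release_id:
                problems.append(f"{version}: {ocid}: ID {release_id!r} contains '#'")
            if release_id in seen[ocid]:
                problems.append(f"{version}: {ocid}: ID {release_id!r} is repeated")
            seen[ocid].add(release_id)
    return problems
```

Under this check, the same release ID may appear once in the 1.0 dataset and once in the 1.1 dataset, but not twice for the same ocid within either dataset.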

@ColinMaudry
Member

ColinMaudry commented Mar 8, 2021

@jpmckinney Do we mention the possibility of publishing several datasets for different OCDS versions in existing guidance pages (identifiers, easy releases, etc.), or do we only add it to the new version upgrade guidance content in #1217?

@jpmckinney
Member Author

Let's just do it in #1217. It's going to be an edge case, so it's okay for the content to be on one page only.

@ColinMaudry
Member

ColinMaudry commented Mar 9, 2021

The merge routine would have to specify that, if two releases have the same id, then they should be treated as duplicates and one should be ignored (presumably the one whose version matches the fewest other releases). (#849 (comment))

This comment assumes that the OCDS version is indicated at release level (may be implemented in #426), but currently it's only available at package level.

The merge routine documentation doesn't mention packages or the OCDS version. It's only about merging a number of releases into a compiled or versioned release. We assume the releases come from one or more release packages (with potentially different OCDS versions), but it looks like this aspect is outside the scope of this chapter.

One change we could make is to add "with the same OCDS version" to this sentence (first section of the chapter):

Merging is the process of combining individual releases with the same OCDS version into a compiled or versioned release, described in more detail below. At a high level:
