Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Third draft for ZEP(Zarr Enhancement Proposal), a community feedback process for Zarr #16

Merged
merged 14 commits into from
May 4, 2022

Conversation

MSanKeys963
Copy link
Member

Hi everyone. πŸ™‹πŸ»β€β™‚οΈ

Following the discussions from #14 I've been working on ZEP(Zarr Enhancement Proposal), a community feedback process for the Zarr community.

The purpose of having something like ZEP in place is to have a well-organised and productive discussion around Zarr for adding features, introducing new processes, informing the community about a new change and other important things which concerns Zarr.

Please go through:

and let me know your thoughts!

I'd really appreciate constructive feedback on the first draft so that we can move forward with this.

The next steps would look something like this:

  • Taking into account the feedback from the community for the first draft and finalize the process as soon as possible
  • Creating a new repository mostly by the name: zep
  • Creating or importing infrastructure to process incoming new ZEPs so that they can be rendered and displayed on Zarr's website

Please let me know if there's anything that is not clear or doesn't make sense. Please forward/tag folks to this conversation if I missed someone.

Thanks!

Cc: @joshmoore @jakirkham @alimanfoo @rabernat @ryan-williams @martindurant @WardF @grlee77 @jbms @DennisHeimbigner @ivirshup

@MSanKeys963 MSanKeys963 changed the title First draft for ZEP(Zarr Enhancement Proposal) community feedback process for Zarr First draft for ZEP(Zarr Enhancement Proposal), a community feedback process for Zarr Mar 14, 2022
@jbms
Copy link

jbms commented Mar 14, 2022

I noticed that the instructions indicate that discussions should happen in the zarr-python repository --- is that correct, or should they instead happen in e.g. the zarr-specs repository or the zep repository? I imagine there are others like myself who would be interested in monitoring spec-related discussions but not all zarr-python-related discussions.

Also in regards to a reference implementation, perhaps the instructions should clarify that it should be in zarr-python if that is the intention.

@MSanKeys963
Copy link
Member Author

I noticed that the instructions indicate that discussions should happen in the zarr-python repository --- is that correct, or should they instead happen in e.g. the zarr-specs repository or the zep repository? I imagine there are others like myself who would be interested in monitoring spec-related discussions but not all zarr-python-related discussions.

That's a good catch and I think discussions should happen in the zep repository rather than zarr-python. I guess I had zarr-python in mind since @joshmoore recently enabled it.

Also in regards to a reference implementation, perhaps the instructions should clarify that it should be in zarr-python if that is the intention.

Yes, makes sense. I'll take care of this.

Thanks a lot for the feedback @jbms.

@jbms
Copy link

jbms commented Mar 14, 2022

I think the general content required by the instructions and process sounds great.

If a more lightweight process is desired, one thing to consider would be to make a PR to the zarr-specs repository the ZEP itself. Then discussion could happen on the PR itself, rather than a separate discussion thread, and it would be accepted by merging the PR. The zarr-specs repository could contain supporting documentation for various features that might include rationale, backwards compatibility, code examples, etc. --- the material that would otherwise be in the ZEP itself. That way someone can just read the current zarr specification itself to find out the relevant information, instead of having to hunt around for it in various ZEPs.

With Python it is often the case that the only/canonical documentation for a feature is the PEP itself, which I think is not ideal since that can make it tricky to figure out what the current accepted state of things is, and how a given accepted PEP may have been superseded by a later PEP.

Alternatively, I can see how the additional step of the ZEP may be useful for allowing reviewers to have a single document outlining the changes and their impact rather than looking a diff to the standard. But I think it would still be very valuable to ensure as part of the process for accepted ZEPs that the zarr spec itself is updated with any relevant information, so that the ZEP repository itself is just an archive of past proposals but doesn't need to be referred to to understand the current spec.

For example, instead of saying that the "Process for XYZ" will be called ZEP-27, and future changes to "Process for XYZ" will be made as modifications to ZEP 27, instead ZEP-27 would add a "process-for-xyz.md" document in the zarr specs repository, and future changes would be made as e.g. ZEP-35 which modifies process-for-xyz.md.

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial thoughts after a first reading.

ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
@MSanKeys963
Copy link
Member Author

I think the general content required by the instructions and process sounds great.

Thank you.

If a more lightweight process is desired, one thing to consider would be to make a PR to the zarr-specs repository the ZEP itself. Then discussion could happen on the PR itself, rather than a separate discussion thread, and it would be accepted by merging the PR. The zarr-specs repository could contain supporting documentation for various features that might include rationale, backwards compatibility, code examples, etc. --- the material that would otherwise be in the ZEP itself. That way someone can just read the current zarr specification itself to find out the relevant information, instead of having to hunt around for it in various ZEPs.

I think having something like ZEP in place is to ensure that there are no discussions happening in the Zarr community that are open-ended. I see there are some PRs in the spec repo which still hasn't reached a conclusion and I believe having ZEP in place could help prevent that from happening.

Also, this is what I think would be best for Zarr but I'm happy to see what others think about having a lightweight process or having ZEP in place for spec changes.

But I think it would still be very valuable to ensure as part of the process for accepted ZEPs that the zarr spec itself is updated with any relevant information, so that the ZEP repository itself is just an archive of past proposals but doesn't need to be referred to to understand the current spec.

Yes, I think updating the spec alongside an accepted ZEP would go long way in building a structured guide for new contributors as well. I think I could add a few lines to the instructions files mentioning this.

For example, instead of saying that the "Process for XYZ" will be called ZEP-27, and future changes to "Process for XYZ" will be made as modifications to ZEP 27, instead ZEP-27 would add a "process-for-xyz.md" document in the zarr specs repository, and future changes would be made as e.g. ZEP-35 which modifies process-for-xyz.md.

I think this is how the ZEP would work when making changes to the spec. The ZEP -27/ZEP-35 would have a textual record of what changes the contributor would be making to spec files(process-for-xyz.md).

Also, there's one more thing I'd like to mention here is that if the changes that are being made in ZEP-35 are very small it could be done via updating ZEP-27 instead of creating ZEP-35 as creating a new ZEP involves a good amount of work from the contributor's side.

@jbms
Copy link

jbms commented Mar 15, 2022

As far as making a small change to an existing ZEP --- what I was suggesting is that the ZEP be solely an archive of past proposals and their resolutions, but any authoritative information be contained in the specification itself, organized by logical topic (like "process-for-xyz") rather than by ZEP number (i.e. the order in which a given change was proposed). Under that view, it doesn't really make sense to modify a ZEP after it is finalized, except perhaps for minor editorial corrections, since then it would no longer serve as an archive. I suppose if the intent is to make a minor change to "process-for-xyz" that does not require the full ZEP process, then that could just be done as a pull request that modifies "process-for-xyz".

@MSanKeys963
Copy link
Member Author

what I was suggesting is that the ZEP be solely an archive of past proposals and their resolutions, but any authoritative information be contained in the specification itself

Yes.

organized by logical topic (like "process-for-xyz") rather than by ZEP number (i.e. the order in which a given change was proposed).

I think it makes much more sense to organise by the ZEP's date rather than a logical topic.

Under that view, it doesn't really make sense to modify a ZEP after it is finalized, except perhaps for minor editorial corrections, since then it would no longer serve as an archive. I suppose if the intent is to make a minor change to "process-for-xyz" that does not require the full ZEP process, then that could just be done as a pull request that modifies "process-for-xyz".

Yes, makes sense. But I also think updating the ZEP(could be as simple as adding 1-2 lines) even if it's a minor change would go a long way in having a well-maintained archive.

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am proposing some changes based on what we discussed at yesterday's SC meeting. The gist of my suggestions is the following: Zarr is different from numpy or python in that there exists an explicit specification, separate from the implementation. I am proposing that "standard track ZEP" be replaced with "specification ZEP", and we explicitly exclude implementation changes from the ZEP process. Only things that touch the spec require a ZEP. This will help constrain the scope of the ZEP process.

My other suggestion is that we do not have a separate repo for ZEP and spec. This follows logically from the previous suggestion. If we accept that ZEPs are specifically about the spec, then we can save ourselves some trouble of maintaining a separate repo and just keep everything in one place.

What do people think about this idea.

ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose we modify the acceptance criteria in the following way.

These conditions must be met for the ZEP to be accepted

  • Unanimous approval of the Zarr Steering Committee
  • Majority approval of the Zarr Implementations Committee. Approval indicates that that implementation plans to implement the new spec features. Not all implementations must implement all features, but a majority is required.
  • No vetos from the Zarr Implementations Committee. Each implementation has the right to veto a ZEP if it would cause severe problems for that implementation.

This will require us to create an implementations committee.

@normanrz
Copy link
Member

normanrz commented Apr 8, 2022

I would propose we modify the acceptance criteria in the following way.

These conditions must be met for the ZEP to be accepted

  • Unanimous approval of the Zarr Steering Committee
  • Majority approval of the Zarr Implementations Committee. Approval indicates that that implementation plans to implement the new spec features. Not all implementations must implement all features, but a majority is required.
  • No vetos from the Zarr Implementations Committee. Each implementation has the right to veto a ZEP if it would cause severe problems for that implementation.

This will require us to create an implementations committee.

I think this is fantastic. In Wednesday's community call we discussed to establish 2 kinds of spec ZEPs: core and extensions. For core ZEPs, all implementors need to consent with a commitment to implement. For extension ZEPs, the approval bar would be lower such as "sounds reasonable".

@rabernat
Copy link
Contributor

rabernat commented Apr 8, 2022

Perhaps we could also add that it's the Steering Council's responsibility to assess the level of consensus from the broader community and vote accordingly.

@MSanKeys963 MSanKeys963 changed the title First draft for ZEP(Zarr Enhancement Proposal), a community feedback process for Zarr Second draft for ZEP(Zarr Enhancement Proposal), a community feedback process for Zarr Apr 11, 2022
@MSanKeys963
Copy link
Member Author

After taking all the feedback into the account, I've finally managed to push the second draft for ZEP. The changes made in the second draft are:

  • Expanding the categories of ZEPs to include specification changes
  • Adding a ZEP category that involves changes to the core specification protocol and defining the governance process around those
  • Adding a ZEP category that involves the changes to the extension specification protocol and defining a lenient governance process around those
  • Redefining the governance process to handle the less-interesting ZEP where the community is less involved as to prevent it from being accepted because of the less participation

I'd highly appreciate feedback on this and would very much like to know your thoughts.

Cc: @joshmoore @jakirkham @alimanfoo @rabernat @ryan-williams @martindurant @WardF @grlee77 @jbms @DennisHeimbigner @ivirshup @normanrz

@MSanKeys963
Copy link
Member Author

Thanks a lot, @rabernat for going through the first draft and providing your suggestions. Here are my thoughts:

I am proposing that "standard track ZEP" be replaced with "specification ZEP", and we explicitly exclude implementation changes from the ZEP process. Only things that touch the spec require a ZEP. This will help constrain the scope of the ZEP process.

In my second draft, I've extended the categories of ZEP to include changes in the Zarr storage specification. So, now basically the ZEP covers all the major changes that'll be taking place in the Zarr ecosystem.

As for your suggestion of removal of Standard Track ZEPs, I think I'd like to keep them. I'm thinking of ZEP as a gateway for making major changes to the Zarr. The ZEP could also serve as a means of documenting those changes and also as a well-maintained archive for the same.

Also, I'm very much interested in hearing what others think of this.

My other suggestion is that we do not have a separate repo for ZEP and spec. This follows logically from the previous suggestion. If we accept that ZEPs are specifically about the spec, then we can save ourselves some trouble of maintaining a separate repo and just keep everything in one place.

My suggestion for having a separate repo for ZEP is to create a dev environment to render and display all the incoming ZEP on Zarr's website, something like https://zarr.dev/zep/. The other reason I see for having a separate repo is to maintain a versioned archive of all the changes happening in Zarr.

I would propose we modify the acceptance criteria in the following way.

These conditions must be met for the ZEP to be accepted

  • Unanimous approval of the Zarr Steering Committee
  • Majority approval of the Zarr Implementations Committee. Approval indicates that that implementation plans to implement the new spec features. Not all implementations must implement all features, but a majority is required.
  • No vetos from the Zarr Implementations Committee. Each implementation has the right to veto a ZEP if it would cause severe problems for that implementation.

This will require us to create an implementations committee.

I think this acceptance criterion fits perfect for ZEPs dealing with changes to core specification protocol. However, when thinking about extension ZEPs I think we need to think of more relaxed criteria.

I've defined the criteria for extension ZEPs here, let me know what you think.

@MSanKeys963
Copy link
Member Author

MSanKeys963 commented Apr 11, 2022

I think this is fantastic. In Wednesday's community call we discussed to establish 2 kinds of spec ZEPs: core and extensions. For core ZEPs, all implementors need to consent with a commitment to implement. For extension ZEPs, the approval bar would be lower such as "sounds reasonable".

Yes, @normanrz.
Please check this and this.

@rabernat
Copy link
Contributor

rabernat commented Apr 11, 2022

I'm thinking of ZEP as a gateway for making major changes to the Zarr

Here's what I don't understand about this: what even is "Zarr," besides the spec? Do you mean Zarr-python? If so, I think we should say so explicitly. And could you give an example of a "major change" that would not touch the spec?

My suggestion for having a separate repo for ZEP is to create a dev environment to render and display all the incoming ZEP on Zarr's website, something like https://zarr.dev/zep/. The other reason I see for having a separate repo is to maintain a versioned archive of all the changes happening in Zarr.

But we do already have - https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/ . My thinking was that if the ZEPs are only about the spec, then it would just be simpler to keep them together in one website. (I say this as a maintainer of too many sphinx sites, which do have a continuous maintenance burden.)

Can you clarify what you mean by "changes happening in Zarr", particularly changes that do not touch the spec? Clearly we do not have to have a ZEP for every PR in zarr-python. As noted above, I would love to hear an example of a major change that doesn't involve the spec. I can't think of one--In my head, everything else is by definition an implementation detail, no? But I am probably overlooking something.

I think it's very important to resolve the question of whether the ZEP process is about the zarr-python implementation or not. I know that this is considered the reference implementation. But I believe our project will ultimately benefit if we strive to treat zarr-python as just another implementation.

@martindurant
Copy link
Member

Can you clarify what you mean by "changes happening in Zarr", particularly changes that do not touch the spec?

Perhaps I can give contrasting examples of two features I would like for zarr python (and any other language that wants to), the first of which does not touch the spec and the second of which does. At least, this is how I understand the difference.

  • the storage layer currently fetches (a list of) whole keys for a given user-requested data slice, and then fills the appropriate array. I would like to optionally pass the actual array chunk size and requested part to the storage layer, so that at the edges of the request, we can fetch partial chunks, if the storage backend allows it. This is of interest to uncompressed chunks or block-compression, and something like this is implemented for the specific case of blosc (not documented!).
  • zarr currently requires the chunks to be regular and represented in the metadata by a single integer in each dimension. For the purposes of both kerchunk (where we do not control chunking) and awkward (where we have variable length batches), we require the chunks to be a list in each dimension, like dask.array. I would assume that any implementation of this would still allow the previous integers, so that the spec would need (int|list(int)) for the elements of the chunks field.

@normanrz
Copy link
Member

I agree with @rabernat that ZEPs are most useful for spec changes. Changes to implementations are better discussed in their repos with issues and PRs. I think that includes the interoperability of a Zarr implementation with other libraries in that language ecosystem.

In that spirit, the first example of @martindurant seems to me like an implementation detail of zarr-python (and other potentially languages). Other implementations may not even have the same layers of abstraction.
The second example is definitely a spec change that would need a ZEP.

@rabernat
Copy link
Contributor

I agree with Norman's characterization of the Martin's examples: the first is a zarr-python implementation issue and would (in my mind) not rise to the level of a ZEP. The second is definitely ZEP material.

I'm curious what @zarr-developers/steering-council think about this. I'm not 100% married to the proposal to restrict the ZEP process to the spec, but I do think it would be a useful simplifying principle.

@joshmoore
Copy link
Member

joshmoore commented Apr 12, 2022

#16 (review) Unanimous approval of the Zarr Steering Committee

I could see loosening this to either no veto or only in the case of tie, etc. modulo @rabernat's follow-on:

Perhaps we could also add that it's the Steering Council's responsibility to assess the level of consensus from the broader community and vote accordingly.


#16 (comment) 2 kinds of spec ZEPs: core and extensions.

Thinking of adding extensions as opposed to the changing the extension mechanism, the hope of extensions according to @alimanfoo was to not get in anyone's way who wanted to extend. So one question I have is what extensions ZEP would include? Perhaps submitting extensions for using the common namespace? Or for support across the implementations?


On some of the more "controversial" topics (:wink:):

  • "Standard Track": As long as the naming is such that we can expand the categories in the future without confusion, I'd personally err on the side of having less in the ZEP (fewer types that we try to manage, etc.) and then growing as necessary.
  • "ZEP" repo: a few things

@rabernat
Copy link
Contributor

Let's aim to finalize and merge this today. We can discuss any final points of contention at today's SC meeting.

ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

Small note: Could we wrap the lines here to something like 72 characters or similar? Some of the lines get a little long so suggestions are more complicated to make

@MSanKeys963 MSanKeys963 changed the title Second draft for ZEP(Zarr Enhancement Proposal), a community feedback process for Zarr Third draft for ZEP(Zarr Enhancement Proposal), a community feedback process for Zarr Apr 28, 2022
@MSanKeys963
Copy link
Member Author

Hi again. πŸ‘‹πŸ»

After taking the feedback from the previous steering council and community call, I've pushed the following changes:

  • Removal of Standard Tracks ZEPs
  • Stripping of Protocol Extension ZEPs (Extension ZEPs) as discussed during the last community call
  • Redefining the acceptance process for Specification ZEPs which would include involvement from the newly formed Implementations Council
  • Some minor fixes

Now, the scope of ZEP is mainly focused on changes made in Zarr's core storage specification and tools/environment used in its development, defining new processes, changes to the decision-making process and providing general guidelines/information to the Zarr community.

I'd highly appreciate feedback on this and would very much like to know your thoughts.

Cc: @joshmoore @jakirkham @alimanfoo @rabernat @ryan-williams @martindurant @WardF @grlee77 @jbms @DennisHeimbigner @ivirshup @normanrz

@MSanKeys963
Copy link
Member Author

I've resolved all the open conversations and made a few changes wherever it was required. Please have a final read and let me know if there's anything.

Cc: @joshmoore @jakirkham @alimanfoo @rabernat @ryan-williams @martindurant @WardF @grlee77 @jbms @DennisHeimbigner @ivirshup @normanrz @jstriebel

@joshmoore
Copy link
Member

Pushed a reflowing of the document. A few additional suggestions which we could certainly be handled in future PRs:

  • There are a few duplicated sections, e.g. "gitter OR issue OR discussion". I would move these to their own section and then use an internal reference.
  • If there's an abbreviate for one type of zep (SPEC ZEP), I'd have one for each (CORE, INFO, & PROC ZEPs).

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. I have a few more minor comments.

ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
ZEP/instructions/zep0000.md Outdated Show resolved Hide resolved
joshmoore and others added 5 commits April 29, 2022 17:40
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
@joshmoore joshmoore merged commit 7a4e545 into zarr-developers:master May 4, 2022
@joshmoore
Copy link
Member

Merged together on the community call yesterday evening. πŸŽ‰ Minor & cosmetic modifications will take place directly, but if there are any semantic changes to ZEP0 a subsequent PR will be raised for discussion. Thanks everyone!

@MSanKeys963
Copy link
Member Author

Thanks a lot, everyone for their input. Really appreciate it. ❀️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants