Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROCESS] Proposal for a new process roll out protocol changes via RFCs #2594

Closed
tdas opened this issue Feb 1, 2024 · 5 comments
Closed

Comments

@tdas
Copy link
Contributor

tdas commented Feb 1, 2024

Motivation

Whenever we design and propose a protocol change, (e.g., a new table feature), we go through a preview / experimental phase during which the details of the protocol can undergo changes. Until the protocol change is completely tested and finalized, it cannot be guaranteed that the tables created with the preview feature will be forward compatible with any Delta reader/writer. However, today, this "preview phase" is not properly communicated as we directly merge any new proposal into the final protocol doc without proper annotation about its "preview" nature. This has led to a number of issues in the past. For example: CheckpointV2 was released in Delta 3.0, and immediately after that it was reported that certain details of the implemented protocol are different from the documented protocol. We had to update the protocol doc after the implementation and tables generated by that implementation were released in the wild.

New RFC Process Proposal

To avoid confusions like this, let us have a RFC process that clearly separates features in preview from finalized features with future compatibility guarantees. The proposed process is as follows.

RFC structure

  • Location: All preview protocol changes will be located in a new directory <delta_repo_root>/protocol_rfcs/ . Contents of the directory will be as follows:

  • In the directory, each proposed change will be stored in a markdown file named <table_feature_name>.md (or some other meaningful name that makes it easy to identify).

  • The directory will have index.md that will contain the list of current and past RFCs along with their status - opened on <date>, accepted on <date>, rejected on <date>.

  • Template of an RFC doc: Each doc must be in markdown format. Contents of the markdown: 

  • Title: Name of the protocol change / table feature

  • Link to the Github issue with additional context and discussions

  • New or existing sections of the final protocol doc that need to be modified 

    • For existing sections, the changed content must be made clear (e.g., new content highlighted in bold/italics, removed content u strikethrough).

Steps of rolling out a protocol change

1. Make initial proposal

Create a Github issue of type [Protocol Change Request].

  • This will be a new type of issue template (see the 2 existing ones) which will define these steps in it.
  • The description of the issue may have links to design docs, etc.
  • This issue will serve as the central location for all discussions related to the protocol change.

If the proposal comes with a prototype or other pathfinding, the changes should be in an open PR.

2. Add the RFC doc

After creating the issue and discussing with the community, if a basic consensus is reached that this feature should be implemented, then create a PR to add the protocol RFC before merging code in master.

  • Clone the RFC template above and create a new RFC markdown doc.
  • Cross-link with the issue with "see #xxx". DONT USE "closes #xxx" or "fixes #xxx" or "resolves #xxx" because we don't want the issue to be closed when this RFC PR is merged.

Changes related to the feature should only merge after the RFC is accepted. In addition, for table features, it is strongly recommended that any experimental support for the feature uses a temporary feature name with a suffix like -dev. This will communicate to the users that are about to use experimental feature with no future compatibility guarantee.

3. Finally, accept RFC into the protocol

For a RFC to be accepted, it must satisfy the following criteria:

  • There is a production implementation (for example, in delta-spark) of the feature that has been thoroughly well tested.
  • There is at least some discussion and/or prototype that ensure the feasibility of the feature in Delta Kernel

When the success criteria are met, then the protocol can be finalized by making a PR to make the following changes:

  • Closely validate that the protocol spec changes are actually consistent with the production implementation.
  • Cross-link the PR with the original issue with "closes #xxx" as now we are ready to close the issue. 
  • Update protocol.md 
  • Move the RFC doc to the accepted subdirectory, and update the state in index.md
  • Remove the temporary/preview suffix like -dev in the table feature name from all the code.

3'. Alternative: reject the RFC

If the RFC is to be rejected, then make a PR to do the following changes:

  • Moved the RFC doc to the rejected subdirectory
  • Update the state in index.md
  • Remove any experimental/preview code related to the feature.

Next steps to formalize this process

Please leave your feedback and questions as comments. I will keep updating the proposal based on them. Thanks in advance.

@tdas tdas linked a pull request Feb 3, 2024 that will close this issue
5 tasks
@tdas tdas removed a link to a pull request Feb 3, 2024
5 tasks
@ryan-johnson-databricks
Copy link
Collaborator

I will add create the RFC directory and add the RFC template

Should we link to that PR here, e.g.

#2601 will create the RFC directory and add the RFC template

@roeap
Copy link

roeap commented Feb 5, 2024

@tdas @ryan-johnson-databricks - great initiative coming up with a more formal process how features make it into the protocol as well as the implementations - this will strengthen the community quite a bit I believe.

Given that the Kernel implementations are still in a raw state, I think it is good to require "just" some implementation and a feasibility discussion on a kernel implementation. However going forward since the kernels may be considered reference implementations of the protocol, would it maybe make sense that there is an implementation in at least one kernel or that there is at least a committment form someone to add it such that the community (of connectors) can immediately start to benefit once a feature has landed?

@ryan-johnson-databricks
Copy link
Collaborator

ryan-johnson-databricks commented Feb 5, 2024

That's a good point. And yes, once the kernels have stabilized (so that adding a new feature doesn't require adding missing pieces of kernel implementation first), it does seem like RFC acceptance should be contingent on a working kernel implementation of the feature.

@tdas
Copy link
Contributor Author

tdas commented Feb 6, 2024 via email

tdas added a commit that referenced this issue Feb 7, 2024
@tdas
Copy link
Contributor Author

tdas commented Mar 26, 2024

This process is now enforced. Closing this issue.

@tdas tdas closed this as completed Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants