Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Installer/Upgrader use directory of sequenced sql files #302

Closed
jeffnm opened this issue Sep 27, 2017 · 24 comments
Closed

Make Installer/Upgrader use directory of sequenced sql files #302

jeffnm opened this issue Sep 27, 2017 · 24 comments

Comments

@jeffnm
Copy link
Contributor

jeffnm commented Sep 27, 2017

A quick way to address most of the needs expressed in #285 in the near future will be to make the installer/upgrader look for a directory of sql files and run them in sequential order.

Developer workflow would be to create a new sql file in the upcoming version directory, with a sequential naming scheme for each database change.

@veggiematts
Copy link
Contributor

veggiematts commented Sep 28, 2017

How do one determine the naming scheme for its own development ?

Look at the most recent PR and name the SQL file as "last PR number + 1" ? (does not guarantee that this last PR will be merged first)

Let a release manager rename the SQL files when preparing the release ? (assuming there is a release manager willing to do this job)

Not use a particular sequential naming scheme ? (assuming no development is based on top of another for the same release)

Other ideas ?

@jeffnm
Copy link
Contributor Author

jeffnm commented Sep 28, 2017

I think we either use time stamps in the name or we expect the merger to handle any conflicts that arise between opening the pull request and merging.

I think the least conflicted naming scheme would be 2.1.0-Timestamp.sql
I think the easiest for us to implement would be to use 2.1.0-LastMergedSQL+1.sql since that should be easy for the developer to determine by looking at their up-to-date repository and resolving that merge conflict will be easy for either the merger or the developer to do if a conflicted sequential update has been merged in first.

I think we should avoid having the release manager compile SQL files, except as conflict management edge cases. It's much better for developers if they can do it themselves because then they will be able to work with the most recent databases from other developers as well.

@jcuenod
Copy link
Contributor

jcuenod commented Sep 29, 2017

The problem with timestamps would be that the order two sql files are created is not necessarily the same as the order in which they should be run. I agree that LastMerged + 1 makes sense (I guess my caveat is that I'm not familiar with how many sql files are typically combined/constructed per release).

Would it make sense to make it LastMerged + 10 so that if another sql file is built that should run earlier is added late it can be lastMerged - 5?

@jcuenod
Copy link
Contributor

jcuenod commented Oct 2, 2017

I'm going to get to work on the assumption that sequenced sql files is agreed upon even if the naming scheme is not yet nailed down.

@fondrenlibrary
Copy link
Contributor

fondrenlibrary commented Oct 2, 2017

LastMerged (higher priority) followed by coder's local current timestamp (lower priority) could be an option.

@fondrenlibrary
Copy link
Contributor

And a little helper utility script could created to help coder or reviewer to generate and/or normalize file names .

@t4k
Copy link
Contributor

t4k commented Oct 2, 2017

I really don’t think LastMerged + 1 will be an issue. CORAL is not a big project. We generally know who is working on what. Also, the contributor that is merging pull requests can make sure that they check on any SQL file naming when they approve the PR. If something is off because a PR got behind some other one and has the same name, we just make sure that when we create our pull requests we use this option:

screenshot-github com-2017-10-02-16-18-53-617

This way maintainers of this project can make a quick change to the PR and commit it.

These conflicts should be fixed when merging the PR, not when making the release. By the time the release is ready, nothing should have to be adjusted by release managers. The development branch, while it is “development” should always be working code because we have managed what gets committed.

@veggiematts
Copy link
Contributor

I didn't think of the "Allow edits from maintainers" option in Github. I'm totally ok with adjusting the filenames at merge time if necessary.

@fondrenlibrary
Copy link
Contributor

if LastMerged + 1 is simply a math sum, given maintainer (Merger) can adjust file name right before merge, I think filename followed by current timestamp is good enough save the step to find LastMerged.

@jeffnm
Copy link
Contributor Author

jeffnm commented Oct 3, 2017

I think I'd prefer the developer to try to identify LastMerged themselves, because as @jcuenod pointed out, the timestamps don't necessarily line up to the order in which they should be applied to the database.

@fondrenlibrary
Copy link
Contributor

Last minute timestamp before merge should work fine. Ok with LastMerged also.

@fondrenlibrary
Copy link
Contributor

By the way, I think, timestamps should reflect execution order of SQL files otherwise confusion will be seen and make things over complicated.

@t4k
Copy link
Contributor

t4k commented Oct 6, 2017

I’m a little late to these last comments, but I would want to take the burden off of Reviewers as much as possible in the workflow, so that would mean not building in that adjustments to files are required upon merge.

An ideal workflow to me would be this:

  1. Developer needs to make a schema change.
  2. Developer looks at latest development code and determines the last schema number.
  3. Developer increments the schema number for their change and creates a pull request.
  4. Reviewer reviews the pull request.
  5. Reviewer looks at the schema number in the pull request and identifies that it directly follows the the schema number in the latest merged code on the development branch.
  6. Reviewer merges the PR without having to make any changes.

The only time Reviewers should need to intervene and make changes would be if a PR with a schema change is going to be merged in an order other than that of development.

@fondrenlibrary
Copy link
Contributor

@t4k 's step 5 is already a "burden" to reviewer and actually I think hands-on involvement of reviewer is inevitable but not always necessary. My updated thought is

  1. Since contributors might come from unknown (though our contributors are known well) whereas reviewers are defined, it's easier to ask reviewers to follow some steps to keep SQL files in sequence.

  2. Tell PR contributors no to touch existing SQL scripts but just create their owns and better off get time stamp in file names.

  3. Reviewers always asks PR contributors whether there is an accompanying SQL script though reviewer can find it by himself. If filename does not follow the format and reviewer asks the contributor to make the filename compliant.

  4. Or just similar to @t4k 's step 6, check the existing SQL script names, correspondingly change the filename by inserting the latest time stamp and merge.

@t4k
Copy link
Contributor

t4k commented Oct 6, 2017

I am not a fan of using timestamps at all. They can be cumbersome to create, especially for inexperienced contributors that we don’t want to discourage, and they add no real value.

Until we get automated tests working automatically, I hope no one is merging pull requests without hands-on involvement. Every pull request should be manually applied, tested, and read at this point to make sure there are no problems.

I don’t see how it is a burden for a reviewer to note the SQL file name while doing this review. Modifying the code and pushing something new up to be merged seems like much more work during a review and isn’t really in the spirit of a review, because modifications happen after reviewing. Would one have to make sure the installer/upgrader still works after the file name change?

@fondrenlibrary
Copy link
Contributor

At step 5, if disorder is found in the sequence number in a PR to be committed, who will fix it? Actually, I second the comment you made four days ago. Word "burden" is used here to hint reviewers' involvement may be inevitable in keeping sql files in order, according to my sense. Is LastMerged also a kind of timestamp?

@t4k
Copy link
Contributor

t4k commented Oct 6, 2017

Ah, yes, I see where I was confusing. Sorry. In my ideal workflow, I did not represent conflicts, because that is not ideal.

When there is a conflict, during step 5 the Reviewer would have to fix it by changing the incremental number on the filename.

I believe that changing an incremental number is much easier than generating a timestamp and then adding it to a filename. (That incremental number is what LastMerged represents, simply the last sequential number in the filename of the SQL files that represent schema updates, where LastMerged + 1 is what the developer will do when creating and naming a new file.)

In particularly complex scenarios, such as when there are multiple related PRs that have dependent schema changes, sequencing the filenames by hand is also easier than generating multiple timestamps at merge time.

And again, I don’t see this happening very often, because we don’t have that many schema changes. Also, because it doesn’t happen very often, the process should be as simple as possible because reviewers might have to refresh themselves on the procedures.

@fondrenlibrary
Copy link
Contributor

Bingo. Thanks @t4k.

@t4k
Copy link
Contributor

t4k commented Oct 27, 2017

I’ve created a PR with an SQL update (#319). The naming scheme I chose is like this:

…/install/protected/{semver.release.number}/{three_digit_sequential_number}-{github_issue_number}.sql

The file in this case is:
…/resources/install/protected/2.1.0/001-318.sql

Comments, improvements, and other suggestions welcome…

@jcuenod
Copy link
Contributor

jcuenod commented Oct 28, 2017

What is the reason for the protected folder? That's not a critique but not all the modules have used it and I'm just wondering why it's there.

@t4k
Copy link
Contributor

t4k commented Oct 30, 2017

The file structure is well before my time on this project, but my guess would be that they needed a folder with an .htaccess file in it to prevent anyone from accessing the SQL files.

Thinking about it, though, I’m not sure that there is anything problematic about anyone accessing the SQL files because as far as I know they can’t be run remotely by just accessing them…

Good question!

@fondrenlibrary
Copy link
Contributor

fondrenlibrary commented Nov 15, 2017

Don't want to re-spark discussions. Just FYI, Rails 5 still use simple timestamp to sequence migration scripts.
image

@t4k
Copy link
Contributor

t4k commented Jan 31, 2018

Note: #244 was merged has an SQL file that must be applied if using the development branch. There is no automatic detection of new SQL files to trigger a database update for that issue.

@jeffnm
Copy link
Contributor Author

jeffnm commented Mar 22, 2018

I think this can be closed as the sequenced files were all merged. We'll track fixing the version numbers that changed since then in #394.

@jeffnm jeffnm closed this as completed Mar 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants