Skip to content

Conversation

@pingtimeout
Copy link
Contributor

@pingtimeout pingtimeout commented Jul 22, 2025

Description

This PR automates most of the commands listed in the release guide. It fixes #2017. See the README.md files for information on how the code is structured and how it can be used. The code includes:

  • Machine configuration for releasing
  • RC branch and tag creation
  • Polaris build
  • Integration tests (only the non-cloud IT)
  • Packages upload to dist.apache.org and Nexus

Parts not automated:

  • Run the cloud-specific integration tests
  • Close the staging repository
  • Anything starting at "Start the vote thread", which is about sending e-mails

This is a first implementation that aims at being a 1:1 mapping with the current release guide.

Note that it includes the instructions from #2039 that has not been merged yet, at the time of opening this PR, given that the high likelihood of that PR being merged before this one.

Hints for code review

Recommendation for reviewers: start with the libs/ folder to get an idea of all utility functions, then look at the test/ folder to see what each script is expected to do, then review the scripts themselves.

How to test

By default, the scripts run in dry-run mode. This is intended in order to avoid accidentally performing releases during testing.

All the other parameters used by the scripts can be overridden using environment variables. For example,

  • if you want to create release branch and tags on your own Git fork, say pingtimeout, run export APACHE_REMOTE_NAME=pingtimeout APACHE_REMOTE_URL=git@github.com:pingtimeout/polaris.git
  • if your GPG key is not in https://downloads.apache.org/incubator/polaris/KEYS, create a KEYS file locally with your key, run a small http server (e.g. python -m http.server) from your laptop to serve it, and set export KEYS_URL=http://localhost:8000/KEYS
  • Etc...

Then, imply run each script in order. The output will contain messages that describe the commands that would be executed, if dry-run mode was disabled. See example below:

Capture d’écran 2025-08-01 à 10 26 37

How to view all commands that would be executed

Each script in the test/ directory verifies the commands that the associated release script would execute, if it was not in dry-run mode. See for example test-02-create-release-branch.sh#L87-L103 for the part that creates the release branch and updates the version files.

@pingtimeout pingtimeout marked this pull request as ready for review July 22, 2025 17:18
Print usage information.

Examples:
$(basename "$0") --version 1.0.0-incubating-rc1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to split version and rc numbers and make those separate parameters.
-incubating must be present, so it can be a constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered pulling that part from your PR, indeed. But decided to not reuse it so that the code is overall simpler. The current approach is less smart, as it pushes the responsibility of adding -incubating to the release manager and enforces it via a regex validation.

The alternative would be:

  • Have a --version x.y.z parameter that accepts x.y.z as well as x.y.z-incubating
  • Have a --rc N parameter
  • Have a logic that detects whether -incubating is already present and adds it, if missing

Overall, I would rather have a single --version x.y.z-incubating-rcN parameter validated by a regex. Simpler code.

The only downside, really, is that when we graduate, we will have to update multiple shell scripts to remove the -incubating part. I feel that it is acceptable.

@pingtimeout pingtimeout changed the title [WIP] Automate the release guide Automate the release guide Jul 23, 2025
@flyrain
Copy link
Contributor

flyrain commented Aug 4, 2025

Thanks @pingtimeout , will take a look today!

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pingtimeout for doing this. LGTM overall with a few comments.


print_info "Updating CHANGELOG.md"
exec_process cd "${releasey_dir}/.."
exec_process ./gradlew patchChangelog
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script exists at line 139 in case of RC2+. Should we re-run the changelog for any RC2+?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. I am not 100% sure about the answer, to be honest. The release guide says that the changelog should be manually updated:

Manually add an -rcN suffix to the previously generated versioned CHANGELOG section.
Rerun the patchChangelog command
Manually remove RC sections from the CHANGELOG
Submit a PR to propagate CHANGELOG updates from the release branch to main.

And IIRC there are still some discussions on changelog updates, whether those updates should be merged to main, etc...

This is the reason why the script exits for RC>1.

My goal with this PR is to have a 1:1 automation of what the release guide says, so that we can start having some semi-automated (albeit imperfect) release process for the August release. But I suspect we will iterate on this as we learn more about our practices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to improve it later. cc @dimas-b and @snazy, who has more experience with changelog

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR CHANGELOG.md has to be (re)generated for every version/RC/branch independently.

Content of CHANGELOG.md is maintained manually w/ the PRs being merged, it's the contributor's/reviewer's who update CHANGELOG.md. So changes to that file already exist and nothing has to be "merged back", and in fact cannot as the contents on "all the branches" can be different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, if nothing has to be "merged back", why do we need this command?

exec_process ./gradlew patchChangelog

Copy link
Contributor

@dimas-b dimas-b Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are CHANGELOG.md changes on main while the release is in progress, then automatically handling the release sub-sections in that file will likely lead to merge conflicts.

If we want to completely automate this process, I'd suggest to identify CHANGELOG entries with PR numbers and make custom tooling for generating release sub-sections. The tool can use the PR number IDs for reconciling main and release branches... not sure if there's appetite for this 😅 My original idea was to leverage the existing gradle plugin for that, but it apparently does not fit the multi-branch use case.

Comment on lines 105 to 116
# If we're creating RC2 or later, verify previous RC tag exists
if [[ ${rc_number} -gt 1 ]]; then
previous_rc=$((rc_number - 1))
previous_tag="apache-polaris-${version%-rc*}-rc${previous_rc}"

print_info "Checking for previous RC tag: ${previous_tag}"
if ! git tag -l "${previous_tag}" | grep -q "^${previous_tag}$"; then
print_error "Previous RC tag ${previous_tag} does not exist. Cannot create RC${rc_number} without RC${previous_rc}."
exit 1
fi
print_info "Previous RC tag ${previous_tag} found"
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Do we have to care if the previous tag exists or not? what if the previous RC tag has been deleted manually already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be a concern. My rationale is that no RC tag should ever be deleted. I don't think it is allowed per ASF principles, as it means a release candidate, for which a vote was called, could disappear.

Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t see anything in the ASF principles that explicitly calls that out—let me know if you have a link. If it’s not covered in ASF policy, the main reason to keep RC tags would just be for historical reference. That said, folks might still delete them to keep the tag list clean.

In any case, this isn’t a blocker. My thinking is that if the behavior isn’t defined, we don’t need previous tag checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting tags is not good practice and should really never happen, it breaks things.
Tags are referred from released artifacts/poms, that alone actually forbids deleting those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack that this is not a blocker 👍

Let's rediscuss this when JB is back from holidays. I think that's a fair question.

print_info "Updating Helm index..."
exec_process cd "${dist_dev_dir}/helm-chart"
exec_process helm repo index .
exec_process cd "${releasey_dir}/.."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to commit change after helm indexing, otherwise, the index file is not updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let me try a few things locally and update the script.

print_info "- Maven artifacts: Published to Apache staging repository"
echo
print_info "Next steps:"
print_info "1. Close the staging repository in Nexus"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to automate this part? Otherwise, we may call it out here, that release manager has to sign in the Nexus website with their ASF credential to close it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a way to automate that, indeed. The grand vision is to have a fully automated release process that runs in Github Workflows. However there are some things that needs discussed before this can happen. Things like :

  • Should we trigger when a Git tag is created or when an empty Github Release with a specific tag pattern is created, ...
  • As you pointed out in another comment, should we (can we?) automatically update the Changelog?
  • Should we implement fixes on branches of the supported release(s) first and then merge those branches up to main (in which case the changelog updates are easy)? Or do we want to cherry-pick commits to add certain fixes to release branches (in which case it requires manual updates)?
  • Does the ASF supports automatically sending announcement/vote/cancel e-mails? In which case this could be embedded as part of the release automation
  • Etc...

As you can see, we are not quite there yet. I suggest we start with the current scripts, which provide a semi-automated model. And in parallel, we can start discussion to achieve the big goal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed to automate it later. Can we provide a bit more detailed message how to close it? Just in case people get lost.

Suggested change
print_info "1. Close the staging repository in Nexus"
print_info "1. close the staging repository in Nexus manually per instructions in https://polaris.apache.org/community/release-guide/ "

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to already start with semi-automatic releases. The required GH workflows require the scripts to be different, leading to two different implementations that have to be both verified and kept in sync, which is quite a big effort.

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the amount of comments.

The TL;DR is that it's likely much easier overall to skip the "manual locally run scripts" and go straight to implement semi-automatic releases leveraging GitHub workflows.

There's a lot of stuff in these scripts that deals with verification, validation and dealing with local environment specific stuff. That's all not needed for GH workflows.

The amount of differences between Mac and Linux and the different tool versions plus the differences on everybody's opinionated local setup is quite big. Any change to these scripts has to be verified and validated on all these environments, not even thinking of a minor issue in the scripts that breaks a release.

Doing the "local scripts" first, then implement workflows, test both appropriately and keep both in sync all the time to eventually nuke the local scripts sounds like a huge amount of extra work, including the nitpickyness.

return 1
fi

major="${BASH_REMATCH[1]}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder whether the global variables should be explicitly declared in a "lib", commenting where/how those are populated. WDYT?

fi

# Validate version format: x.y.z-incubating-rcN
# TODO: Remove incubating when we are a TLP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess the TODO can go away, or have it everywhere (which isn't great either).
Maybe have a constant VERSION_INCUBATING="-incubating", so there's only one place to change in that case.


print_info "Updating CHANGELOG.md"
exec_process cd "${releasey_dir}/.."
exec_process ./gradlew patchChangelog
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR CHANGELOG.md has to be (re)generated for every version/RC/branch independently.

Content of CHANGELOG.md is maintained manually w/ the PRs being merged, it's the contributor's/reviewer's who update CHANGELOG.md. So changes to that file already exist and nothing has to be "merged back", and in fact cannot as the contents on "all the branches" can be different.

function check_docker_hub_login_status() {
# Check if user is logged in to Docker Hub (optional check)
print_info "Checking Docker Hub login status..."
if docker info 2>/dev/null | grep -q "Username:"; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea - let's drop it.
How a login happens and particularly whether and how that appears in docker info is not standardized at all. It can actually be a very different login (different repo), so the check isn't that helpful.

print_info "Polaris version: ${polaris_version}"
print_info "From commit: ${commit}"
echo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a git fetch [remote] doesn't hurt here, just to be safe.

fi
}

function add_apache_remote() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of this.
Having multiple remotes, especially those that point to the exact same repo, break developer experience in IDEs. Things appear twice for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can figure out the "right" remote name via git remote -v and looking for https://github.com/apache/polaris.git.

Copy link
Member

@RussellSpitzer RussellSpitzer Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there shouldn't be an "add remote" function. But i'd also note that the git -v output can also be a little tricky since if you don't use https
It would look like

apache	git@github.com:apache/polaris.git (fetch)
apache	git@github.com:apache/polaris.git (push)

I think you are safe if you just look for apache/polaris.git

return 0
else
print_error "GITHUB_TOKEN environment variable is not set"
print_error "A GitHub token is required to check CI status for releases"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically not required. The "anonymous" rate limit should be sufficient for a few runs of the script (Commit status checks). Setting up a GITHUB_TOKEN may lead to unnecessarily creating PATs or the like.
I suggest to change this to a warning.

Comment on lines +47 to +50
function get_gradle_signing_key_id() {
# Get the signing key ID from gradle.properties
grep '^signing.gnupg.keyName' "${GRADLE_PROPERTIES_FILE}" 2>/dev/null | cut -d'=' -f2
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this setting isn't technically required. It's rather the release manager's responsibility to ensure that the key is properly setup. Nothing prevents release managers to configure only one GPG key at all and/or set default-key in ~/.gnupg/gpg.conf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also forces a global setting, which can easily interfere with other stuff that people have to do with Gradle signing. So this actually adds burden to people who sign artifacts with Gradle for multiple projects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For (locally executed) scripts, you can list the available secrets and explicitly ask the release manager to explicitly confirm that their settings are correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be an issue in GH workflows, where the workflows do the right thing anyways.

@pingtimeout
Copy link
Contributor Author

@flyrain @snazy I think there is some misunderstanding. I believe that only PMC members have write access to SVN distribution directory as well as permissions to create Nexus repositories. This makes testing and implementation tricky for me, for steps ≥05. Typically I can see this in the email template that is generated:

Convenience binary artifacts are staged on Nexus. The Maven repository URL is:
* NO STAGING REPOSITORY (java.lang.IllegalStateException: No staging repository with name apache created)

I can only rely on the information that is in the release guide, or on the hints I can find in the build file (like assuming that the ./gradlew initializeApacheStagingRepository command does the right thing for PMC members).

@snazy
Copy link
Member

snazy commented Aug 7, 2025

@flyrain @snazy I think there is some misunderstanding. I believe that only PMC members have write access to SVN distribution directory as well as permissions to create Nexus repositories. This makes testing and implementation tricky for me, for steps ≥05. Typically I can see this in the email template that is generated:

Convenience binary artifacts are staged on Nexus. The Maven repository URL is:
* NO STAGING REPOSITORY (java.lang.IllegalStateException: No staging repository with name apache created)

I can only rely on the information that is in the release guide, or on the hints I can find in the build file (like assuming that the ./gradlew initializeApacheStagingRepository command does the right thing for PMC members).

The message likely refers to the Nexus repo, not SVN?

@pingtimeout
Copy link
Contributor Author

@snazy This message says that I cannot test as many things as PMC members, and that I can only rely on the information in the current release guide. So if a PMC member requests changes that deviate from the release guide, I have to take the request at face value.

@pingtimeout
Copy link
Contributor Author

Just a heads up, I am giving a try at a Github-workflow based release automation. Hopefully this should reduce the amount of code, given that it will only be runnable by Github. If it fails, we can always come back to these scripts.

source "$LIBS_DIR/_log.sh"

function exec_process {
# See print_command for information on why fd 3 is used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably just copy the note from print_ command to here since it's not exactly easy to jump between the two files

# Git Remote Setup Script
#
# This script ensures that the Git repository has the proper remote configuration for Apache Polaris releases:
# 1. Checks whether there is a Git remote named 'apache' pointing to https://github.com/apache/polaris.git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably better to just work backwards from the assumption "we need the correct target" and not require the other conditions.

  1. Checks remotes
  2. If one doesn't exist with the correct URI for apache/polaris, error out
  3. If it does exist return the name of the remote which will be used


# Git/SVN repository constants
APACHE_REMOTE_NAME=${APACHE_REMOTE_NAME:-"apache"}
APACHE_REMOTE_URL=${APACHE_REMOTE_URL:-"https://github.com/apache/polaris.git"}
Copy link
Member

@RussellSpitzer RussellSpitzer Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only valid for HTTPS clones, a ssh clone would look like

git@github.com:apache/polaris.git

So it may be good to keep this just to "apache/polaris.git" since I believe we are only using this for validation?

#

#
# Git Remote Setup Script
Copy link
Member

@RussellSpitzer RussellSpitzer Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick note on this doc, this script also has a bunch of other utility methods for checking other project specific statuses. I have no problem with the organization but we should probably change the doc here to note that this also contains other utility functions for working with git. I also wouldn't be opposed if you wanted to separate out the self executing bits from the utility bits but honestly I have no strong feelings on that.

curl -s -H \"Authorization: Bearer dummy_token\" -H \"Accept: application/vnd.github+json\" -H \"X-GitHub-Api-Version: 2022-11-28\" https://api.github.com/repos/apache/polaris/commits/$(git rev-parse HEAD)/check-runs | jq -r '[.check_runs[].conclusion | select(. != \"success\" and . != \"skipped\")] | length'"

# Compare content
if [[ "$actual_content" == "$expected_content" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could probably extract this out into a function since it's used in all the regressions if you like

@RussellSpitzer
Copy link
Member

Just a heads up, I am giving a try at a Github-workflow based release automation. Hopefully this should reduce the amount of code, given that it will only be runnable by Github. If it fails, we can always come back to these scripts.

That's pretty awesome too although I like the idea of having a set of scripts that could potentially work locally as well but I'd hate to put more work on you, so if it's simpler to just do the Github Workflow approach that seems very reasonable.

@pingtimeout
Copy link
Contributor Author

@RussellSpitzer thanks for the review !

I like the idea of having a set of scripts that could potentially work locally as well

My initial goal was that these scripts would be short-lived and replaced as soon as we had a robust release guide that paved the way for a Github workflow automation.

As Robert hinted, it is already quite complex to deal with Linux/MacOS differences (hello GNU sed vs BSD sed, etc...) as well as custom setups. I think that as soon as we have a set of Github Workflows for releases, we should delete the scripts and keep the codebase as small as possible.

I am not saying that nobody should ever be able to perform a release locally. What I am saying is that it will be rare enough that we should not be optimizing for it.

@flyrain
Copy link
Contributor

flyrain commented Aug 8, 2025

@flyrain @snazy I think there is some misunderstanding. I believe that only PMC members have write access to SVN distribution directory as well as permissions to create Nexus repositories. This makes testing and implementation tricky for me, for steps ≥05. Typically I can see this in the email template that is generated:

Convenience binary artifacts are staged on Nexus. The Maven repository URL is:
* NO STAGING REPOSITORY (java.lang.IllegalStateException: No staging repository with name apache created)

I can only rely on the information that is in the release guide, or on the hints I can find in the build file (like assuming that the ./gradlew initializeApacheStagingRepository command does the right thing for PMC members).

Committers should have the permission for both svn and Nexus usually. Could you check with ASF infra for this?

@github-actions
Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Sep 22, 2025
@pingtimeout
Copy link
Contributor Author

Closing this PR as it is replaced by #2383

@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Sep 22, 2025
@pingtimeout pingtimeout deleted the releasesv2-main branch September 22, 2025 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automate drafting, verifying, and publishing the release

7 participants