Conversation

@cloud-fan (Contributor) commented Nov 2, 2018

Our documentation for doing a release is outdated. This PR updates the release doc to match the process used for the Spark 2.4.0 release.

  1. Cutting an RC is now fully automated. People should always use the docker script (sketched below) instead of setting up the environment manually; then we don't need to update the document every time the environment changes.
  2. When finalizing the release, we suggest retaining the generated docs of the latest RC and copying them to spark-website, instead of regenerating them.
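
For context, a minimal sketch of the docker-based flow. The script path matches the Spark repo around the 2.4.x line; the -d (work directory) and -n (dry run) flags are assumptions to verify against the script's usage text:

# from a clone of apache/spark
$ cd spark
# dry-run first to sanity-check the environment (flag assumed)
$ dev/create-release/do-release-docker.sh -d /path/to/release-workdir -n
# then the real run, which tags, builds, and uploads the RC
$ dev/create-release/do-release-docker.sh -d /path/to/release-workdir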

Since it requires further manual steps, please also contact the <a href="mailto:private@spark.apache.org">PMC</a>.


<h4> Remove RC artifacts from repositories</h4>
@cloud-fan (Contributor, Author):

I moved the cleanup process to the end, since the docs are still needed when updating spark-website.

@cloud-fan (Contributor, Author):

# Copy the new documentation to Apache
# copy the docs of the voted RC to spark-website
$ svn co "https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs" spark-docs
@vanzin:

Since you may have this info locally, how many files are in the current docs build?

Last time I tried to check out a huge directory from svn, the ASF infra guys blocked our network from talking to the ASF servers. So I'd avoid this if that directory is kinda large.

I don't remember if I actually did that or just thought about it, but one idea was to leave the docs build locally after the RC is generated, instead of cleaning it up when the docker script is done. That would let you just upload your local copy to spark-website.

@cloud-fan (Contributor, Author):

For 2.4.0 it's 3572 files, 145 MB.

During a release, we upload these doc files many times (for each RC) and download them once. Will this be a problem?

@cloud-fan (Contributor, Author):

BTW, during RC voting, if someone wants to validate the docs, they have to check them out locally and start a server to look at the web pages. If ASF infra blocks that, it will be hard for people to evaluate an RC.
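
(For illustration, a minimal way to inspect a docs RC locally, assuming Python 3 is available; the _site subdirectory is an assumption about how the docs are laid out in svn:)

$ svn co https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs spark-docs
$ cd spark-docs/_site && python3 -m http.server 8000
# then browse http://localhost:8000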

@vanzin:

That's a lot of files. I think the thing the ASF doesn't like is a burst of requests, and this might trigger it. Uploading is not a problem, since it's authenticated. Downloading is not, so it could be used for a DoS, which is why they do the blocking.

> if someone wants to validate the docs, they have to check them out locally

Actually, they can see it directly on the ASF server without downloading anything locally. Using a browser and manually clicking through the pages naturally "throttles" the number of requests, so it shouldn't trigger any warnings from the infra guys.

I think it's safer to recommend keeping the docs for the last RC locally in case it passes, and only checking out from svn if necessary.
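
(A sketch of that flow, assuming the docs build was left at docs/_site in the Spark checkout and that spark-website keeps per-version docs under site/docs/<version>:)

# copy the locally retained docs build of the passing RC into spark-website
$ cp -r spark/docs/_site spark-website/site/docs/2.4.0
$ cd spark-website
$ git add site/docs/2.4.0
$ git commit -m "Add docs for Apache Spark 2.4.0"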

<p>After the vote passes and you moved the approved RC to the release repository, you should delete
the RC directories from the staging repository. For example:</p>

<pre><code>svn rm https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-bin/ \
Member:

The version string should be consistent throughout, I think? Shouldn't this be v2.4.0-rc5?
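
(For reference, a version-consistent form of that command might look like the following; svn rm against URLs commits directly, hence the -m message:)

$ svn rm -m "Remove RC artifacts for Spark 2.4.0" \
    https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/ \
    https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/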

svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Update KEYS"

<h3>Installing docker</h3>
@jerryshao (Contributor), Nov 5, 2018:

Maybe we should add another paragraph about doing the release without docker. It is pretty similar to the docker flow; just running the script do-release.sh is enough. The only difference is that the user has to manually install all the dependencies listed in the docker file.

@cloud-fan (Contributor, Author):

The thing I'm trying to avoid is maintaining a document about how to set up the environment/dependencies. We should control that fully through the scripts.

I think we should not recommend doing releases without docker. If someone does want to run do-release.sh directly, they should read the docker file and the other related scripts.
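
(For example, one could mine the image definition for the dependency list; the Dockerfile path here is an assumption based on the Spark repo layout around 2.4.x:)

# locate the release image definition (path assumed; verify in your checkout)
$ find dev/create-release -name 'Dockerfile*'
# skim the packages it installs
$ grep -E 'apt-get install|pip install|gem install' dev/create-release/spark-rm/Dockerfile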

@jerryshao (Contributor):

OK, I see.


<h3>Preparing gpg key</h3>

You can skip this section if you have been a release manager before.
Contributor:

"if you have been a release manager before" => "if you've already uploaded your key". No need to be a release manager before :).

# set JIRA_USERNAME, JIRA_PASSWORD, and GITHUB_API_TOKEN
$ export JIRA_USERNAME=blabla
$ export JIRA_PASSWORD=blabla
$ export GITHUB_API_TOKEN=blabla
Contributor:

Do we require this?

@cloud-fan (Contributor, Author):

I think so. It needs to access GitHub data to get people's full names.
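
(For illustration, the kind of lookup involved; this uses the public GitHub users endpoint and is not necessarily the exact call the release scripts make:)

$ curl -s -H "Authorization: token $GITHUB_API_TOKEN" \
    https://api.github.com/users/cloud-fan | grep '"name"'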

Contributor:

I see. Got it.

1. Create a git tag for the release candidate.
1. Package the release binaries & sources, and upload them to a staging SVN repo.
1. Create the release docs, and upload them to a staging SVN repo.
1. Publish a snapshot to the Apache release repo.
Contributor:

Sorry, what's the meaning of this step? I don't remember having such a step.

Member:

> Publish a snapshot to the Apache release repo

That step is `dev/create-release/release-build.sh publish-release`.

Member:

I do think this is a bit confusing for a new RM. Perhaps "Apache release repo" should be "Apache staging Maven repo".

Contributor:

Ohh, you mean pushing artifacts to the Apache staging Maven repo? This wording seems confusing.

@cloud-fan (Contributor, Author):

This is how we described it before: https://github.com/apache/spark-website/pull/157/files#diff-12257e3523e2ce22fccd80687597e7cdL94

+1, "Apache staging Maven repo" is cleaner; I'll update soon.
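
(For reference, the command discussed above; it stages the Maven artifacts to repository.apache.org, the ASF Nexus staging repository. The required credentials/env vars vary by script version, so check release-build.sh before running:)

$ dev/create-release/release-build.sh publish-release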

@vanzin commented Nov 6, 2018

LGTM

@holdenk (Contributor) commented Nov 7, 2018

Thanks for updating the release process docs!

@cloud-fan (Contributor, Author):

I'm having a problem merging this PR. The merge script needs to know the original Apache git repo, but do we have one for spark-website?

@srowen (Member) commented Nov 7, 2018

Setup should be the same as for the spark repo, @cloud-fan; here's mine, for example:

$ git remote -v
apache	https://git-wip-us.apache.org/repos/asf/spark-website.git (fetch)
apache	https://git-wip-us.apache.org/repos/asf/spark-website.git (push)
apache-github	git://github.com/apache/spark-website (fetch)
apache-github	git://github.com/apache/spark-website (push)
origin	https://github.com/srowen/spark-website.git (fetch)
origin	https://github.com/srowen/spark-website.git (push)
upstream	https://github.com/apache/spark-website.git (fetch)
upstream	https://github.com/apache/spark-website.git (push)
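
(A sketch of adding the missing remotes to match the layout above, if you only have origin:)

$ cd spark-website
$ git remote add apache https://git-wip-us.apache.org/repos/asf/spark-website.git
$ git remote add apache-github https://github.com/apache/spark-website.git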

@asfgit closed this in 44d2552 on Nov 7, 2018
@cloud-fan (Contributor, Author):

Thanks @srowen, it works!

@felixcheung (Member) commented Nov 7, 2018

FYI, there is a merge script: https://github.com/apache/spark-website/blob/asf-site/merge_pr.py
EDIT: sorry, I might have misunderstood. Is it that the remote repo needs to be added?

@cloud-fan (Contributor, Author):

Yes, we need to add the remote repo, which I didn't know before.

@cloud-fan deleted the do-release branch