Release a new version of DataFusion to crates.io #771
@andygrove @jorgecarleitao I couldn't find the exact commit for the 4.0.0 version we have on crates.io. Do you know where we can find it? It would be good to retroactively create the 4.0.0 tag for changelog automation. I am guessing the commit is around ddaea81? @jorgecarleitao do we want to bundle the Python release with the main DataFusion release as well? Or do we want to release the Python binding on its own cadence? I think if the Python binding wants a more frequent release cadence than the core, then it's better to have a separate release process for it. Otherwise it would be less work to bundle it with the core DataFusion release. |
I think 4.0.0 was released when datafusion was in the https://github.com/apache/arrow repo, specifically https://github.com/apache/arrow/tree/f959141ece4d660bce5f7fa545befc0116a7db79 I dug that version out of the email thread for the 4.0.0 RC3 release: |
We also need to consider what to do about Ballista release cadence and version numbers. My preference would be to release DataFusion and Ballista together. Ballista version is currently 0.5.0-SNAPSHOT and I think that releasing as 0.5.0 would be appropriate. |
That would be awesome, I would +1 on aligning to 0.5.0, also for Python Datafusion (i.e. bump from 0.2.1 to 0.5.0); I think we do have a problem with DataFusion, that is already 5.0.0 on crates.io? |
FWIW, the most recent release to crates.io for datafusion is 4.0.0: https://crates.io/crates/datafusion Given that DataFusion doesn't have the associated arrow ecosystem I think using an entirely different versioning scheme is fine for DataFusion and Ballista - I don't really have a strong opinion on versions for DataFusion (e.g I would be happy to have it be 0.5.0 or something, though I don't know how we would handle the existing versions) |
I also prefer we go with our own versioning scheme to better adhere to semantic versioning, at the very least for Ballista and the DataFusion Python binding. For DataFusion the ship has already sailed, unfortunately. If we are to release DataFusion and Ballista together, how do we manage the version differences? Do we just publish the same tarball under different names? Or do we actually produce two different release tarballs with project-specific sources? Regardless of what versioning scheme we use, the DataFusion Python documentation would still be needed. @jimexist perhaps you could help start building the automation for that? |
Here is what I think could work for us: release all project sources in a single signed repo source tarball, which gets uploaded to Apache SVN. Then, from within that same source release, we publish the datafusion crate, the ballista crate and the datafusion Python wheels with different versions. For example, in the upcoming release, we could have datafusion-5.0.0, ballista-0.5.0 and datafusion python 0.3.0 (or 0.5.0 if we want to align it with ballista) published from the same source tarball. The consequence of this is that every time we need to release a new version of either datafusion, the Python binding or ballista, we would need to vote on and release a new version of the datafusion repo as a whole. The repo source release should be pretty lightweight given we don't need to do maintenance releases in the current state of the project. All we have to do is to run Changelog for each project will need to be generated separately before we propose a source release in Apache SVN. WDYT? |
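As a rough illustration of the proposal above, the sketch below models one repo-level source release that fans out into separately versioned artifacts. The version numbers are the ones used as an example in the comment; the `SourceRelease` class, the tarball naming pattern and the artifact labels are assumptions made for illustration, not the project's actual release tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceRelease:
    """One voted source tarball that several artifacts are published from."""

    repo_version: str  # version of the repo-level source release (assumed naming)
    artifacts: Dict[str, str] = field(default_factory=dict)

    def tarball_name(self) -> str:
        # Hypothetical naming pattern for the signed source tarball.
        return f"apache-arrow-datafusion-{self.repo_version}.tar.gz"

    def publish_plan(self) -> List[str]:
        # Every artifact is built from the same tarball but keeps its own version.
        return [
            f"{name} {version} <- {self.tarball_name()}"
            for name, version in sorted(self.artifacts.items())
        ]


release = SourceRelease(
    repo_version="5.0.0",
    artifacts={
        "datafusion (crate)": "5.0.0",
        "ballista (crate)": "0.5.0",
        "datafusion (python wheel)": "0.3.0",
    },
)

for line in release.publish_plan():
    print(line)
```

The point of the shape is that a version bump in any one artifact still goes through a single vote on `repo_version`, which is the trade-off the comment describes.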
On a side note, it turns out the tree we used for the 4.0.0 release (https://github.com/apache/arrow/commits/f959141ece4d660bce5f7fa545befc0116a7db79) is not in our repo. Should I push this tree to our repo and tag it with 4.0.0? Alternatively, the state of the Rust code in that release maps to 31dd3cd in our master branch, so we could also just tag that commit as 4.0.0. But I think this might not align with Apache's release policy. |
@jorgecarleitao has any previous work already been done in this area? There's also the question of how and where docs are supposed to be hosted, e.g.
|
I think that would work nicely and I think it makes a lot of sense.
I would recommend not doing this (because it will effectively bring along the (large) history of the arrow repo with it, effectively making the arrow-datafusion repo several times larger).
I think tagging 31dd3cd as 4.0.0 is fine (and I did something similar in arrow-rs) -- the official Apache release policy, at least as I understand it and as discussed on the arrow-dev mailing lists, was centered around the tarballs as the artifacts. Since the 4.0.0 release / announcement was built from https://github.com/apache/arrow/commits/f959141ece4d660bce5f7fa545befc0116a7db79, anyone who wants to know exactly what was in the release can use that reference. Tagging 31dd3cd in this repo to compute the changelog seems like it would be fine. |
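For reference, retroactively tagging an existing commit could look roughly like the sketch below. It only builds the `git tag` command by default (`dry_run=True`); the helper names and the tag message are hypothetical, and whether an annotated or a lightweight tag was actually used is not stated in this thread.

```python
import subprocess
from typing import List


def tag_command(tag: str, commit: str, message: str) -> List[str]:
    """Build the git command for an annotated tag pointing at an existing commit."""
    return ["git", "tag", "-a", "-m", message, tag, commit]


def create_tag(tag: str, commit: str, message: str, dry_run: bool = True) -> List[str]:
    cmd = tag_command(tag, commit, message)
    if not dry_run:
        # Runs in the current working directory, which must be a git checkout.
        subprocess.run(cmd, check=True)
    return cmd


# The message text is an assumption; the tag and commit come from the discussion.
cmd = create_tag("4.0.0", "31dd3cd", "Retroactive tag for changelog generation")
print(" ".join(cmd))
```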
@jimexist there was some previous discussion on how to host a website for DataFusion on the dev list: https://lists.apache.org/thread.html/r0ed76cc60cdf651e8cf5c82a21cc64114c1f6d174dc5487434bd32ef%40%3Cdev.arrow.apache.org%3E. Read the Docs is certainly the route with the least amount of work, but I am not 100% sure it's allowed for Apache projects. The dev mailing list recommended hosting our docs as a sub path under the main Arrow website. This incurs more work since it is not a turnkey solution, but it has the added benefit of letting us leverage the Arrow brand for marketing DataFusion and potentially improving SEO. We might be able to reuse the same automation that's used to generate https://arrow.apache.org/docs/python/api.html for DataFusion too. |
Quick update on my end: I have pushed the 4.0.0 tag with a reference to 31dd3cd to help with changelog generation. Next up, I will look into subproject changelog automation using PR labels while waiting for more feedback on the release process proposed in #771 (comment). |
Ballista is now at a point where it supports the TPC-H queries with
reasonable performance and I am now working on improving the developer
documentation and the user guide. I am keen on getting an official release
soon so that we can reserve the crate names. Let me know how I can help
with the release work. I generally only have time at weekends though.
|
I am in a similar boat at the moment. I have gone back and tagged most of the issues with proper labels for changelog automation. I should be able to get the automation completed tomorrow and send a PR for you all to review both the code and the changelog. @andygrove I think where you will be able to help the most would be writing the announcement blog post for the release ;) But I don't think this should be a blocker for the crates.io release. In fact, if we are concerned about reserving the crate names, we could grab the names now with a placeholder release similar to https://crates.io/crates/roapi/0.1.0. |
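To make the label-based approach concrete, here is a minimal sketch of grouping labelled issues or PRs into one changelog section per subproject. The label names, the `LABEL_TO_PROJECT` mapping and the sample records are all made up for illustration; the real labels and tooling are not given in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical label -> subproject mapping (an assumption, not the repo's labels).
LABEL_TO_PROJECT = {
    "datafusion": "DataFusion",
    "ballista": "Ballista",
    "python": "Python binding",
}


def group_by_project(prs: Iterable[Tuple[int, str, List[str]]]) -> Dict[str, List[str]]:
    """Group (number, title, labels) records into per-subproject changelog entries."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for number, title, labels in prs:
        for label in labels:
            project = LABEL_TO_PROJECT.get(label)
            if project is not None:
                sections[project].append(f"- {title} (#{number})")
    return dict(sections)


# Made-up records for illustration only.
sample = [
    (1, "Add example feature", ["datafusion"]),
    (2, "Fix example scheduler bug", ["ballista"]),
    (3, "Shared example change", ["datafusion", "python"]),
]

changelog = group_by_project(sample)
for project, entries in changelog.items():
    print(f"## {project}")
    print("\n".join(entries))
```

An item carrying several labels shows up in every matching section, which is why consistent labelling of past issues matters before the changelog is generated.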
I agree that writing the blog post concurrently / after the release is totally fine. I am also happy to help draft the release blog |
I went ahead and reserved the names "ballista-core", "ballista-scheduler",
"ballista-executor" on crates.io. I already had "ballista" from prior to
the donation.
I'll continue working on documentation for Ballista and we can re-use
and/or link to some of that in the blog post.
|
I have prepared a preview in #801. After the PR gets merged, the next step is to create a signed tarball for voting. Here is the generated changelog for each subproject:
My proposed release process requires adding two new sets of tags: I am going to send the proposed release setup to the dev list to gather more feedback today. |
I think I have all the Apache release related automation completed for #801. @jimexist let me know if you need any help with the Python documentation automation. I am aiming to push a final update to #801 this weekend and mark it as ready for review/merge if everything goes well this week. We can also release the Python docs as part of a fast follow-up release after the 5.0.0 release if we don't want to rush it. My current goal is to make the DataFusion release as lightweight as we can so we can release more often. |
👍 -- @houqp let me know if there is anything I can do to help |
I will update #801 and mark it as ready for review tomorrow. @andygrove I am assuming we want to wait for #831 for the release? |
Yes, I will have the documentation PR ready for review in the next few
hours. It would be good to get this in before we release.
|
@alamb @andygrove @Dandandan @jorgecarleitao @nevi-me @jimexist #801 is now ready for review. Once it's merged, we will be able to push tags, release tarball and send a rc voting email to the dev list. |
Filed #837 to track the python doc release as a quick follow up so we don't have to block the current release. |
OK, the release has been approved on the dev list. I have pushed the final release tags on GitHub. However, I don't have write access to Arrow's release directory in SVN, crates.io or PyPI, so I will need someone who has access to these resources to help me finish up the release. The steps are documented at https://github.com/houqp/arrow-datafusion/blob/qp_release_doc/dev/release/README.md#finalize-the-release. The remaining steps are:
|
I will run the script to upload to SVN... |
(@houqp the credentials I use for SVN are my Apache username/password). I am not sure what, if any, additional permissions are needed to upload to the release SVN directory |
The release files can be found here: https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-5.0.0/ I do not have permissions on the datafusion or ballista projects on crates.io. I believe @andygrove will have to grant one or both of us those permissions Thanks again for all this work @houqp 🚀 |
I have invited @alamb to become an owner on all of the crates (I think this is currently a PMC role) and I have published the crates. @houqp Could you call the vote on the mailing list? https://crates.io/crates/datafusion/ This whole process went very smoothly. Thank you @houqp and @alamb. |
@alamb I am double checking with ASF infra on the permission, but it looks like only PMC members have write access to the release folder. One thing I noticed is not right: the release tarball is signed with my signing key, which I have added to the end of https://dist.apache.org/repos/dist/dev/arrow/KEYS, but I don't have permission to add it to release/arrow/KEYS. I believe my key should be added to the release KEYS file too so people can use it to verify the signature.
@andygrove do we still need to call a vote for the crates.io release? I thought the vote on the source tarball should have already covered the crates.io source release? |
No need to vote on the crates publishing. The vote was just for the source
release.
|
OK, I have called the vote on the dev list, will add this step to the release document as well. |
Filed #887 to track Python PyPI release as a follow up. I believe the only remaining item is to copy my code signing key from |
@houqp I have added your key here: https://dist.apache.org/repos/dist/release/arrow/KEYS |
Thank you @alamb we are all good to close this issue then :) |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The current version of datafusion on crates.io is 4.0.0 (which among other things doesn't work with the newly released arrow 5.0.0)
Describe the solution you'd like
It would be great if we released a DataFusion 5.0.0 (or some other number if we want to diverge from arrow) -- doing so would likely involve porting the arrow-rs release scripts, from https://github.com/apache/arrow-rs/tree/master/dev/release and then sending it to the dev mailing list for a formal vote
Starting this ticket to gather some feedback