HADOOP-19636. [JDK17] Remove CentOS 7 Support and Clean Up Dockerfile. #7822
Conversation
| (!) A patch to the testing environment has been detected. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| @ayushtkn @GauthamBanasandra @Hexiaoqiao Could you please help review this PR? Thank you very much! cc: @pan3793 | 
| // This stage serves as a means of cross platform validation, which is | ||
| // really needed to ensure that any C++ related/platform change doesn't | ||
| // break the Hadoop build on Centos 7. | ||
| stage ('precommit-run Centos 7') { | 
there's leftover centos 7 stuff at line 86
+1
| # Dockerfile for installing the necessary dependencies for building Hadoop. | ||
| # See BUILDING.txt. | ||
|  | ||
| FROM centos:8 | 
I don't think we should remove CentOS 8. Instead, we should migrate it to Rocky Linux 8 (or another RHEL-like OS) in place, then to 9 or 10.
From a personal perspective, I don't agree with your suggestion. I believe we should completely remove operating systems that have reached their End of Life (EOL). If we need to support CentOS 9 or Debian 12 in the future, it should be done by submitting a new PR for a thorough evaluation. Rather than maintaining multiple Dockerfiles, I prefer a more lightweight approach, such as providing support through documentation. As the number of supported operating systems increases, if we have to maintain Dockerfiles for each one, we could end up managing dozens, which is neither cost-effective nor sustainable.
> I believe we should completely remove operating systems that have reached their End of Life (EOL). If we need to support CentOS 9 or Debian 12 in the future, it should be done by submitting a new PR for a thorough evaluation.

I don't see much benefit in your proposal. Upgrading in place is straightforward, and it leaves a clear diff in the commit history that shows users what they need to change when planning an OS upgrade for their Hadoop cluster.

> Rather than maintaining multiple Dockerfiles, I prefer a more lightweight approach, such as providing support through documentation.

Documentation can easily become outdated (try following "Building on macOS (without Docker)" in BUILDING.txt). As I replied here, I think the Dockerfile itself is the best documentation for setting up the build environment.
https://lists.apache.org/thread/2ypqcrnsth3jk21rpjvjv53tntz21ht8
The choice of operating system should be made by the user, and therefore, the resolution of compilation issues should also be handled by the user.
Take CentOS 7 as an example, which has multiple versions (such as 7.2, 7.3, 7.9, etc.). Different versions may have configuration or dependency differences (e.g., glibc, gcc versions), which can lead to compilation issues, such as with protobuf or native package compilation. For these issues, we should not add extra workarounds, as that would make the project redundant.
If we were to upgrade to CentOS 9, we would change the Dockerfile name from Dockerfile_centos_8 to Dockerfile_centos_9. Users comparing the diff would see that Dockerfile_centos_8 has been deleted and replaced with Dockerfile_centos_9, which contains entirely new content.
I understand that we cannot enumerate all Linux distributions and versions. I believe most enterprises use the Debian/RHEL families of Linux distributions to run Hadoop. Given the limited developer resources in the Hadoop community, how about keeping only two OS Dockerfiles and CI pipelines: the latest (or second-latest) version of Ubuntu (the default environment for building, testing, and releasing) and Rocky Linux (only to verify compilation)? They would serve as a reference for users who want to set up a build environment based on their preferred Linux distribution.
My point is that we should remove dependencies that are EOL, as with some other modules. Back to this case: CentOS 8 has reached its EOL, and its packages are no longer available on the mirror.centos.org site (https://www.centos.org/centos-linux-eol/). So +1 to Shilun's comments from my side. cc @pan3793 What do you think? Thanks.
@Hexiaoqiao If you agree to retain at least one RHEL-family OS Dockerfile for building Hadoop, I suggest keeping CentOS 8, because it still works for the Hadoop project build as of today (the mirror.centos.org site was replaced by vault.centos.org; see dev-support/docker/pkg-resolver/set-vault-as-baseurl-centos.sh). I plan to migrate it to Rocky Linux 8 soon.
https://endoflife.date/rocky-linux
I'm supportive of a RHEL variant. There's also the option of an Amazon Linux container image, which uses yum.
| # Dockerfile for installing the necessary dependencies for building Hadoop. | ||
| # See BUILDING.txt. | ||
|  | ||
| FROM debian:10 | 
Same here: we should upgrade it to Debian 12 or 13 in place.
Thanks @slfan1989 for your work. +1 from my side.
| // This stage serves as a means of cross platform validation, which is | ||
| // really needed to ensure that any C++ related/platform change doesn't | ||
| // break the Hadoop build on Centos 8. | ||
| stage ('precommit-run Centos 8') { | 
Instead of removing CentOS 8, I would suggest replacing it with another supported RHEL8 clone, like Rocky Linux.
(This was already suggested earlier...)
Thank you for your suggestion! We will continue to retain CentOS 8 and plan to upgrade to Rocky Linux 8 in the future.
I think we should replace CentOS 8 instead of dropping it outright.
| 
 Thank you for your feedback. Feel free to continue sharing your thoughts in this email thread. So far, I’ve received comments from @ayushtkn, @Hexiaoqiao, @cnauroth, and @pan3793. We are still in the discussion phase, and a final decision will be made based on the collective input. https://lists.apache.org/thread/2ypqcrnsth3jk21rpjvjv53tntz21ht8 | 
| @GauthamBanasandra Thank you, and I look forward to hearing your thoughts on this issue. | 
| Can you please forward the last email to stoty@apache.org @slfan1989 so that I can reply? | 
| 
 I’ve cc’d you on the email—please have a look when it’s convenient for you. | 
| @Hexiaoqiao @steveloughran @stoty @pan3793 Thank you all for your participation and valuable feedback! After careful consideration, I have decided to adopt your suggestions in the upcoming JDK 17 upgrade. Regarding the removal of support for EOL operating systems, I plan to discontinue the Docker and Jenkins build commands on CentOS 7, while continuing to support image builds on CentOS 8 and Debian 10. Moving forward, we will also focus on upgrading to CentOS 9 and higher versions of Debian. | 
| (!) A patch to the testing environment has been detected. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
Thanks! Would you mind tuning the PR title to make it more precise?
| (!) A patch to the testing environment has been detected. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| (!) A patch to the testing environment has been detected. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| (!) A patch to the testing environment has been detected. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| This looks good to me, but it no longer matches the JIRA/commit description: only CentOS 7 is removed now, while Debian 10 and CentOS 8 are kept. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| (!) A patch to the testing environment has been detected. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| (!) A patch to the testing environment has been detected. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
Fix Json Error. Co-authored-by: Cheng Pan <pan3793@gmail.com>
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| (!) A patch to the testing environment has been detected. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| @steveloughran @Hexiaoqiao Could you please take another look at this PR? I've updated the description. If everything looks good, I’ll go ahead and merge it. The co-authors are Stoty and Pan. | 
| LGTM. I can submit PRs to 1) upgrade Debian from 10 to 11, and 2) migrate CentOS 8 to Rocky Linux 8 after this gets in. | 
+1 LGTM
+1 from me. Thank you @slfan1989 and all reviewers.
Description of PR
JIRA: HADOOP-19636. [JDK17] Remove CentOS 7 Support and Clean Up Dockerfile.
In the Apache Hadoop project, we have historically supported multiple Linux distributions, including CentOS 7, CentOS 8, and Debian 10. However, CentOS 7 has reached its End-of-Life (EOL) status, and it no longer receives official support or security updates. To streamline our Continuous Integration (CI) infrastructure and reduce the maintenance burden, this PR focuses on the following changes:

- Remove support for CentOS 7, including the associated Dockerfiles and Jenkins pipeline configurations, as the platform is no longer supported.
- Further changes regarding CentOS 8, which will be migrated to Rocky Linux 8, and the upgrade of Debian 10 to a higher version will be addressed in subsequent PRs.

This PR is aimed at removing CentOS 7 support to improve efficiency and minimize unnecessary maintenance overhead.

How was this patch tested?
JUnit test.