Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manylinux2010 #182

Closed
wants to merge 20 commits into from
Closed

Manylinux2010 #182

wants to merge 20 commits into from

Conversation

dolang
Copy link

@dolang dolang commented Apr 16, 2018

Now that there's some activity again here, I figure it's easier to expose the changes I made as a pull request so you can have a look at the diffs.

This is from my manylinux2010 branch which is based on markrwilliams:manylinx2 merged into pypa:master.

I tried to keep this branch close to upstream and just merged the most recent commit to master (4a20e4b) with minor adjustments.

I don't expect this to be merged by itself, but hope it saves you guys some time. Notable:

  • Removed gpg from PYTHON_COMPILE_DEPS because it's already present on CentOS 6 and its erasure later in the script breaks the build.
  • Removed ncurses-devel as required by the PEP.
  • Added perl-devel, seems to be required for the build now.
  • Added a workaround for libcurl.so incompatibility (see commit notes & comments).
  • Removed the get-pip.py workaround. That has been fixed (see here) and my build script works again.
  • Moved libffi-devel from PYTHON_COMPILE_DEPS to the "Development tools and libraries" section.
  • Changed the x86_64 Docker tag from :6.9 to :6. This mirrors a manylinux1 change intended to reduce the overall image size (see Reduce image size #206 and also :6 Docker branch / :6.9 Docker branch).
  • Moved from CentOS 6.9 to 6.10
  • Careful: I also changed occurrences of manylinux1 to either manylinux or manylinux2010. This might be undesired.

Side note: not a fan of the /dev/null out/error stream redirections in the scripts. They just make it harder to debug when something fails. And there definitely are moving parts, as the get-pip.py problem has shown.

…ster

- The dockerfiles build successfully.
- Compared to current master, there's a new intermediate image for
  glibc.  As the x86_64 build depends on it, this must have a predefined
  name.  In the PR the "naming prefix" is 'markrwilliams/manylinux2',
  therefore it's assumed for all images.
- There's a problem with libcurl which has been (temporarily) resolved
  with script which redirects yum.  See docker/build_scripts/build.sh for
  details.

The following script is proven to work:

  pushd docker/glibc/
  docker build -t markrwilliams/manylinux2:centos-6.9-no-vsyscall
  popd
  docker/build_scripts/prefetch.sh curl openssl
  docker build -t markrwilliams/manylinux2:x86_64 \
    -f docker/Dockerfile-x86_64 docker/
  docker build -t markrwilliams/manylinux2:i686 \
    -f docker/Dockerfile-i686 docker/
- docker/build_scripts/build.sh:
  - removed libncurses.devel; not part of PEP 571
  - removed gpg as dependency, because yum already requires it
  - removed some /dev/null redirections, because they hinder debugging

- docker/build_scripts/manylinux1-check.py
  - renamed to manylinux-check.py
  - changed names/comments to new version
  - the check now targets glibc 2.12 instead of 2.5

- .travis.yml
  - only changed the image name
  - TODO: x86_64 still needs a script for the no-vsyscall image dependency

- docker/deploy.sh
  - changed image name

- note: .rst files have not been touched
- The freshly released version of pip breaks build_utils.sh which tries
  to get the latest and greatest.  Added a condition to instead use the
  latest version below v10.
- Incorporated the lastest changes from the official repository.
  - That cleanup removed two packages from 'yum erase ...' which looks
    like an oversight. Readded.
- Removed the hack which fixed the broken build when the new pip was
  released.  This has been fixed upstream, see here:
  pypa/get-pip#19
@njsmith
Copy link
Member

njsmith commented Apr 16, 2018

Unfortunately, some amount of > /dev/null redirections are mandatory because Travis unconditionally kills any job that prints too much output.

@dolang
Copy link
Author

dolang commented Apr 16, 2018

Good thing you mentioned that (I didn't actually know). While I've removed those from my master, I actually tried to not mess with this manylinx2010 branch.

I've just noticed that I've removed two of those in this branch and you're making a very good point for keeping them in. So I'll add them back.

An alternative would be to redirect to e.g. >> /var/log/build.log and remove that file again during the cleanup step. In case the build breaks (locally), one could at least docker commit to inspect the log instead of rerunning it manually. But it's a minor issue anyway.

Just noticed as well that the diffs are not that great because I merged the latest commits into this preexisting branch. You'd have to rebase/merge locally to get a better result, I guess. Sorry about that.

@dolang
Copy link
Author

dolang commented May 2, 2018

Note: The travis CI build does actually run now for this pull request, but despite what it says, it didn't complete successfully.

The newly added build.sh needs a set -e -x or something like that.

I run a local build before each merge commit, however. So it should be fine.

@trishankatdatadog
Copy link
Member

Sorry about the new build.sh, @dolang, that was me. Want me to add those shell options?

@dolang
Copy link
Author

dolang commented May 2, 2018

Myself, I don't need it because I don't (yet) use travis for anything. I have a small build script in my master branch. But I guess it would be in your interest to add those options (if travis allows them)? In any case, I can't verify that the code for travis is correct.

I'm not really in the loop here, I'm just keeping this pull request up-to-date as a reference/help for you guys once it's time to make the switch to manylinux2010.

@trishankatdatadog
Copy link
Member

I think I see the problem now. Hopefully fixed in #190...

@dolang
Copy link
Author

dolang commented May 3, 2018

x86_64 still fails, but that's not a surprise because it depends on that third Docker image which is not required for manylinux1. I might try to fix that later.

However, i686 seems to have completed a full build, so #190 definitely helped.

@trishankatdatadog
Copy link
Member

Glad to hear, let me know if I can help with anything else!

- Add platform-dependent clause
- Assign the same prefix to the the no-vsyscall image (quay.io/pypa/...)
- The dependent docker/Dockerfile-x86_64 needs this prefix in FROM ...
@dolang
Copy link
Author

dolang commented May 3, 2018

Took a stab at it, but I'm not sure if the no-vsyscall Docker image should eventually get the $TRAVIS_COMMIT tag, too. So I left it out because that's easier.

Edit: Might actually be running fine, but Travis can't deal with the glibc build. Needs less output and more travis_wait time.

dolang added 2 commits May 4, 2018 01:04
- Increase in `travis_wait` time
- Shunting build output off into a log file which can be inspected
  on failure.
@lelit lelit mentioned this pull request May 19, 2018
is merge is necessary,
- Similar to gpg the libffi-devel package can't be removed after the
  build, but it looks like this is because it was improperly packaged.
- Moved libffi to the "Development tools and libraries" section.
@dolang
Copy link
Author

dolang commented Jun 16, 2018

I had to move libffi-devel from the PYTHON_COMPILE_DEPS because it can't be removed after the build, breaking the script:

install-info: No such file or directory for /usr/share/info/libffi.info.gz
error: %preun(libffi-devel-3.0.5-3.2.el6.x86_64) scriptlet failed, exit status 1
Error in PREUN scriptlet in rpm package libffi-devel
libffi-devel-3.0.5-3.2.el6.x86_64 was supposed to be removed but is not!
  Verifying  : libffi-devel-3.0.5-3.2.el6.x86_64

Failed:
  libffi-devel.x86_64 0:3.0.5-3.2.el6

It's only 21 Kb in size, so keeping it in the "Development tools and libraries" section is not a big problem anyway.

Like with CentOS 5 there's a CentOS 6 Docker image which is more up to
date (tag :6) than the latest release (tag :6.9).
See: pypa#206
@dolang
Copy link
Author

dolang commented Jul 4, 2018

Well, that's odd. It did build on my machine before committing, though. From the Travis log:

$ PLATFORM=$PLATFORM TRAVIS_COMMIT=$TRAVIS_COMMIT travis_wait 60 ./build.sh


Still running (49 of 60): ./build.sh

The job exceeded the maximum time limit for jobs, and has been terminated.

Ran for 49 min 29 sec

@dolang
Copy link
Author

dolang commented Jul 4, 2018

On another note: Looks like CentOS 6.10 is available now, and I should look through the code again.

I'll have a look at that later.

@mayeut mayeut mentioned this pull request Jul 7, 2018
@dolang
Copy link
Author

dolang commented Jul 11, 2018

Somehow the Travis build stops at 50 minutes although travis_wait is set to 60. Locally, the images build fine.

I've had a look around, maybe I should try the workaround linked here from this commit?

Previously, vault.centos.org was missing some repository information,
which prevented moving to 6.10.  This is now available on CentOS 6.10
and it includes a glibc v1.212 (up from 1.209).  All changes made are
only to bump this version number.
@dolang
Copy link
Author

dolang commented Jul 28, 2018

I've been able to move to CentOS 6.10 now, previously some repository information was missing (namely the repodata subfolder in http://vault.centos.org/6.10/os/Source/).

There's a new build of glibc in there, so I had to update the version from 1.209 to 1.212. Now here's a question regarding the custom glibc version:

I think I've run into a problem at some point because the new version number that is assigned in the no-vsyscall simply appends a .1 (dot one), i.e. now it's 1.212.1, but I don't quite remember because it's already been a few months. To someone more familiar with RPMs: Wouldn't it be better to append a -1 (dash one) denoting a new build instead of a new version?

Edit: Actually, I'm not quite sure if that's the right way to denote the same version but a different release. I've had a look at this Serverfault question and the linked Fedora Guidelines. The latter seems to indicate one would have to append a .1 at the end. Or maybe I'm not understanding this right.

@njsmith
Copy link
Member

njsmith commented Jul 29, 2018

I can explain the timeout stuff anyway... Travis puts a hard limit of 50 minutes on all (open source) build jobs: https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts

Travis also kills jobs that spend 10 minutes without producing output; the travis_wait script is a hack to avoid hitting this 10 minute timeout, by producing a continuous stream of meaningless output. But it doesn't help with the 50 minute limit.

@jcfr
Copy link
Contributor

jcfr commented Jul 29, 2018

@njsmith Would you consider switching to CircleCI ? I would be happy to help.

@njsmith
Copy link
Member

njsmith commented Jul 29, 2018

@jcfr The time I have to spend on manylinux-related things has gone down a bunch in the last few months, and I was kind of the only one doing things. So, honestly as far as I'm concerned, anyone who wants to step up and do the work is pretty much welcome to do whatever works for them.

@dralley
Copy link

dralley commented Sep 5, 2018

Bump: Is this work going to proceed any time soon?

@trishankatdatadog
Copy link
Member

Sorry, I haven't had a time to look at this either. What do @dolang and others think?

@mayeut
Copy link
Member

mayeut commented Sep 6, 2018

If I'm not mistaken, auditwheel still needs to get updated. c.f. #179

@trishankatdatadog
Copy link
Member

@mayeut Does that require another PR here?

@mayeut
Copy link
Member

mayeut commented Sep 6, 2018

Once audit wheel gets released - I have no idea when, I commented on a PR there some time ago -, the pyup-bot will open a PR that needs to get merged in master (if CI is ok of course). Then I guess @dolang will have to merge upstream changes once more in this PR.
After that, I think it should be ok to merge if reviewers have nothing more to say. I guess manylinux1 (current master) will still have to be maintained for some time in a maintenance branch.

@trishankatdatadog
Copy link
Member

Ok, sounds good. I'll try to read up on the PEP and this PR in the meantime.

cp $MY_DIR/epel-release-5-4.noarch.rpm .
check_sha256sum epel-release-5-4.noarch.rpm $EPEL_RPM_HASH
# https://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
cp $MY_DIR/epel-release-6-8.noarch.rpm .
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be concerned about this, but note that this doesnot correspond to centos 6.10. I'm not sure what the canonical location for the extended package for enterpriselinux (EPEL) rpm is, myself.

*Summary*: Because of
https://mail.python.org/pipermail/wheel-builders/2016-December/000239.html,
this a CentOS 6.10 Docker image that rebuilds ``glibc`` without
*vsyscall* is necessary to reliably run ``manylinux2`` on 64-bit
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manylinux2 -> manylinux2010

@ehashman
Copy link
Member

ehashman commented Nov 17, 2018

There is a bit of a bootstrapping problem in which auditwheel is currently testing against a non-official manylinux2010 image because this one hasn't been released 😄 I'll have an auditwheel release candidate up by the end of the weekend.

Edit: Please test this PR by bumping the auditwheel version to 2.0rc1. I don't want to unleash 2.0 on the world unless we're sure it's working okay.

@takluyver
Copy link
Member

Is someone clear what the next steps are for this PR and/or #252?

auditwheel 2.0 is out now, and as @mayeut said, there is a PR (#262) to update to it in this repository. Does that patch need to be applied to this PR?

What are the relative merits of this and #252? @henriquegemignani mentions on that PR that it uses 'devtoolset 7', and 'another approach to the curl issue', but it is (or was?) less ready for merge than this one.

I'm keen to see manylinux2010 in use, and I think the images are the last crucial part of the toolchain. But I lack the low-level knowledge to evaluate the changes in either PR.

@mayeut
Copy link
Member

mayeut commented Feb 16, 2019

Well, I'm not sure anyone's clear one the next steps. I'm also not sure this PR is the place to discuss it either so I'll answer in #179

@mayeut
Copy link
Member

mayeut commented Apr 12, 2019

Superseded by #279 that's been merged-in.

@mayeut mayeut closed this Apr 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants