Rebuilding for longer path length (conda-build 2.0.0) #171
cc @conda-forge/core @stuarteberg ( just in case 😉 ) |
For a given conda installation, this is the command I'm using to decide whether a package will need to be rebuilt after the conda-build 2.0 release:
fgrep -l ' binary ' $(conda info --root)/pkgs/*/info/has_prefix
I'm not sure how to do that for all packages in conda-forge generally. |
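As a rough Python equivalent of that fgrep check (a sketch only; it assumes conda is on the PATH and that conda info --root points at the installation being checked):

import glob
import os
import subprocess

# Locate the conda install root, mirroring $(conda info --root) in the shell command above.
root = subprocess.check_output(['conda', 'info', '--root']).decode().strip()
for path in glob.glob(os.path.join(root, 'pkgs', '*', 'info', 'has_prefix')):
    with open(path) as f:
        if ' binary ' in f.read():
            # print the package directory name, e.g. fftw-3.3.4-0
            print(os.path.basename(os.path.dirname(os.path.dirname(path))))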
Also asked the Anaconda team for some way to check this easily. See issue ( https://github.com/Anaconda-Platform/support/issues/44 ). |
tl;dr - do we need to rebuild everything that has a binary prefix entry in info/has_prefix? |
Yep - or rather yep, for anything that is a library that something else links to. |
Is it worth extending conda so that it can handle both lengths of prefix placeholder? |
so, try to use long first, but if that fails, fall back to short? That's probably a good idea. It might make the transition less jarring. |
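A minimal sketch of what that fallback could look like at install time; the placeholder value and helper below are illustrative assumptions, not conda's actual implementation:

# Hypothetical legacy placeholder value, for illustration only.
LEGACY_SHORT_PLACEHOLDER = b'/opt/anaconda1anaconda2anaconda3'

def rewrite_prefix(data, recorded_placeholder, install_prefix):
    """Try the placeholder recorded in info/has_prefix first, then fall back to the short one."""
    for placeholder in (recorded_placeholder, LEGACY_SHORT_PLACEHOLDER):
        if placeholder in data:
            # NOTE: for binary files the real logic must also pad with NULs so the embedded
            # string keeps its original length; that detail is omitted in this sketch.
            return data.replace(placeholder, install_prefix.encode())
    return data  # no placeholder found, nothing to rewrite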
As more info for this planning, @bollwyvl made an addition that has since been released, so newer uploads carry this info. However, we still have the challenging issue of all the old packages that were not uploaded with this info. This is described in issue ( https://github.com/Anaconda-Platform/support/issues/57 ). It would be good if we could either get that info added to those packages or, if that is not possible, get a list of the packages with that info. |
So we still don't know all the things we need to rebuild because of the path length increase. I really don't understand how we can proceed to conda-build 2.0 until we do. |
After reading these comments, this is how I understand the issue; please correct me if I am wrong. We need to determine which packages in conda-forge embed a binary prefix and so will need rebuilding. For packages which were uploaded with the newer tooling, that info is already recorded with the package. For packages uploaded with older versions of the tooling, the info is missing and has to be determined some other way. |
Yep, that's all correct.
This is what I'm unsure about. There was a fair bit of discussion about this being done server-side (so presumably by Continuum). Though it is unclear to me whether that is still the case or not as there has been little response of late. |
Relevant info from Gitter on a script by @jjhelmus to download and check packages, from start to end, with some results and comments. The script: https://gist.github.com/jjhelmus/869d6827ac8e0275437e7643989974e4 |
Ran the aforementioned script by @jjhelmus with modifications to download all files no matter what and then search them for binary prefixes. Have included the log, the list of files with a binary prefix, the list of packages downloaded with md5 checksums, the modified script, and the list of undetermined files in this gist ( https://gist.github.com/jakirkham/621cd3a03098205f5eba83533df932fe ). Found 30 packages that are known issues and 26 that are undetermined. At the time of this writing there are 1149 packages on conda-forge according to anaconda.org. We downloaded 1085 packages. Given the 26 packages we were unable to download at all due to a missing entry, this leaves us with an additional 38 that are unaccounted for. |
What if we use similar logic to conda-build in order to search the package binaries for the shorter prefix string? |
Maybe, but that might run us into the same issues we have seen before.
Edit: Added xrefs for upstream issues about these 2 problems. |
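A minimal sketch of the binary-scan idea above, assuming the legacy short placeholder text; conda-build's real detection is more involved than a raw byte search:

import tarfile

SHORT_PLACEHOLDER = b'/opt/anaconda1anaconda2anaconda3'  # assumed legacy placeholder

def files_embedding_short_prefix(pkg_path):
    """Return members of a .tar.bz2 package whose contents embed the short placeholder."""
    hits = []
    with tarfile.open(pkg_path) as tf:
        for member in tf.getmembers():
            if not member.isfile() or member.name.startswith('info/'):
                continue
            data = tf.extractfile(member).read()
            if SHORT_PLACEHOLDER in data:
                hits.append(member.name)
    return hits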
No, we should definitely do what you are doing - just for the files that are missing. |
As you are the second person asking me this question, I must have done a terrible job explaining this. Sorry about that. 😞 AFAIK all packages have the |
This is caused by a bug in the script used to find the latest versions of the packages, which filters out all non-latest versions. If a package uses a non-standard version scheme, post releases, or other oddities, all instances of the package can be filtered out. If you change find_latest_versions to fall back to string sorting when that happens, as below, these cases are handled:

def find_latest_versions(index, package_name):
    """ Return the latest version and packages from a conda channel index. """
    valid = [v for v in index.values() if v['name'] == package_name]
    versions = [parse_version(v['version']) for v in valid]
    latest_ver = str(max(versions))
    entries = [v for v in valid if v['version'] == latest_ver]
    if len(entries) == 0:
        # fall back to sorting versions by string if all entries were removed
        versions = [v['version'] for v in valid]
        latest_ver = sorted(versions)[-1]
        entries = [v for v in valid if v['version'] == latest_ver]
    return latest_ver, entries

I'm still only finding 1149 packages, whereas Anaconda.org shows 1173 (24 missing), which I am investigating. |
This is also a bug in the script. The list of Anaconda.org packages is only pulled for the platform on which the script is running (linux-64, osx-64, etc.), and it therefore fails to find some of the files. I will post a modified version of the script shortly which checks all packages on Anaconda.org. |
Ah, ok, so these are presumably non-osx packages in my case. Thanks for clarifying. Will close that bug report. Could you also please incorporate the changes from my script when you update? |
The following script can be used to find packages with a binary prefix in the conda-forge channel. It is not perfect; it only searches what it thinks is the latest version of each package, and then only a random (based on the package's MD5) build/platform of those packages. Nonetheless it finds 29 packages which have a binary prefix and will need to be rebuilt with conda-build 2.0. It misses the curl package because the Windows package it downloads does not need a binary prefix:
#!/usr/bin/env python3
""" Find conda packages which use a binary prefix. """
import argparse
import bz2
import json
import os
import tarfile
import urllib.request
try:
from packaging.version import parse as parse_version
except ImportError:
from pip._vendor.packaging.version import parse as parse_version
def get_channel_index(channel):
""" Return the channel index for all platforms. """
# find all packages in the channel one platform at a time
index = {}
url_template = 'https://conda.anaconda.org/%s/%s/repodata.json.bz2'
for platform in ['linux-64', 'osx-64', 'win-32', 'win-64', 'linux-32']:
url = url_template % (channel, platform)
response = urllib.request.urlopen(url)
decomp = bz2.decompress(response.read())
json_response = json.loads(decomp.decode('utf-8'))
index.update(json_response['packages'])
# add a download url to all packages in the index
channel_url = 'https://conda.anaconda.org/%s' % channel
for fn, info in index.items():
subdir = info['subdir']
info['url'] = channel_url + '/' + subdir + '/' + fn
return index
def find_latest_versions(index, package_name):
""" Return the latest version and packages from a conda channel index. """
valid = [v for v in index.values() if v['name'] == package_name]
versions = [parse_version(v['version']) for v in valid]
latest_ver = str(max(versions))
entries = [v for v in valid if v['version'] == latest_ver]
if len(entries) == 0:
# fall back to sorting versions by string if all entries were removed
versions = [v['version'] for v in valid]
latest_ver = sorted(versions)[-1]
entries = [v for v in valid if v['version'] == latest_ver]
return latest_ver, entries
def parse_arguments():
""" Parse command line arguments. """
parser = argparse.ArgumentParser(
description="Find conda packages which use a prefix")
parser.add_argument(
'packages', nargs='*',
help=('Name of packages to check, leave blank to check all packages '
'on the channel'))
parser.add_argument(
'--skip', '-s', action='store', help=(
'file containing list of packages to skip when checking for '
'prefixes'))
parser.add_argument(
'--verb', '-v', action='store_true', help='verbose output')
parser.add_argument(
'--channel', '-c', action='store', default='conda-forge',
help='Conda channel to check. Default is conda-forge')
parser.add_argument(
'--json', action='store', help='Save outdated packages to json file.')
parser.add_argument(
'--directory', '-d', action='store',
default=os.path.join(os.getcwd(), 'pkg_cache'),
help='where to store packages')
return parser.parse_args()
def find_prefix_packages(index, package_names, verbose, cache_dir):
""" Return a list of packages which use a prefix. """
pkgs_with_bin_prefix = []
pkgs_with_no_bin_prefix = []
for package_name in sorted(package_names):
_, entries = find_latest_versions(index, package_name)
if not entries:
print(package_name + " : Missing any entries. Skipping...")
continue
        # sort entries by md5 so we try the same package each time
entries = sorted(entries, key=lambda k: k['md5'])
url = entries[0]['url']
filename = os.path.join(cache_dir, url.split('/')[-1])
# Download if not in cache
if not os.path.exists(filename):
print("Downloading:", filename)
response = urllib.request.urlopen(url)
with open(filename, 'wb') as f:
f.write(response.read())
# determine if package uses a binary prefix
tf = tarfile.open(filename)
try:
uses_prefix = b' binary ' in tf.extractfile(
tf.getmember('info/has_prefix')).read()
except KeyError:
uses_prefix = False
if uses_prefix:
print(package_name, "uses a binary prefix")
pkgs_with_bin_prefix.append(package_name)
elif uses_prefix is False:
pkgs_with_no_bin_prefix.append(package_name)
if verbose:
print(package_name, "does NOT use a binary prefix")
print("Uses a binary prefix:", len(pkgs_with_bin_prefix))
print("Does NOT use a binary prefix:", len(pkgs_with_no_bin_prefix))
print("Total:", len(pkgs_with_bin_prefix) + len(pkgs_with_no_bin_prefix))
return pkgs_with_bin_prefix
def main():
""" main function """
args = parse_arguments()
# create somewhere to store downloaded packages.
if not os.path.exists(args.directory):
os.makedirs(args.directory)
# determine package names to check
index = get_channel_index(args.channel)
package_names = set(args.packages)
if len(package_names) == 0: # no package names given on command line
package_names = {v['name'] for k, v in index.items()}
# remove skipped packages
if args.skip is not None:
with open(args.skip) as f:
pkgs_to_skip = [line.strip() for line in f]
package_names = [p for p in package_names if p not in pkgs_to_skip]
# find packages which use a binary prefix
pkgs_with_bin_prefix = find_prefix_packages(
index, package_names, args.verb, args.directory)
    # save pkgs_with_bin_prefix to a json formatted file if specified
if args.json is not None:
with open(args.json, 'w') as f:
json.dump(pkgs_with_bin_prefix, f)
if __name__ == "__main__":
main() |
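For anyone wanting to try it: assuming the script above is saved as find_binary_prefix.py (a hypothetical filename), an invocation along the lines of python find_binary_prefix.py --channel conda-forge --json prefix_pkgs.json -v should check every package on the channel, cache the downloads under ./pkg_cache, and write the flagged package names to a JSON file.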
Results of running the script:
I think that list along with curl is all that we should focus on rebuilding right away when we move to conda-build 2.0. If other packages are found to have a short prefix we can rebuild those as needed. |
Ok, that's reassuring that we are now seeing the right number of packages. I know that the prefix length was increased on Mac/Linux. However, I'm less clear on what (if any) change was done for Windows. @msarahan, could you please provide us some guidance on Windows?
Should we be doing this randomly or should we try to get one from each platform (if possible)? Assuming of course we need to be rebuilding on Windows too and that Windows is any different from Mac/Linux on this front. |
Updated my original gist so that the script now downloads and checks a package from each platform (osx, linux, and win). This might still miss a few corner cases but I think it is good enough. The list of packages which use a binary prefix that this script finds are:
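A minimal sketch of how that per-platform selection might look, reusing the index entries returned by find_latest_versions above (the helper name and the lowest-md5 tie-break are assumptions, not necessarily what the updated gist does):

def entries_per_platform(entries):
    """Pick one index entry per platform (subdir) from the entries for one version."""
    by_subdir = {}
    for entry in sorted(entries, key=lambda e: e['md5']):
        # keep the first (lowest-md5) entry seen for each subdir
        by_subdir.setdefault(entry['subdir'], entry)
    return list(by_subdir.values())

Each returned entry could then be downloaded and checked exactly as in find_prefix_packages.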
|
Thanks @jjhelmus. Completely agree that this is good enough. 👍 |
At this point, we are waiting for a conda-build 2.0 release. |
Thanks for working on this! Really appreciated! |
So here is a case ( conda-forge/cmake-feedstock#14 ) where we are encountering a build failure in recipes using cmake. Now I'm not entirely sure whether rebuilding with conda-build 2 is what caused this. |
I have been trying to find some evidence of this (thank you for all of the CircleCI logs), but still am not able to convince myself that the cmake problem was a conda-build 2 issue at all. I have been attempting to reproduce the problem locally, and no matter which conda-build I use, cmake is functional ( |
OK, there is definitely something in the conda-build 2 theory - able to reproduce. Will track any updates in conda-forge/cmake-feedstock#21 to avoid making further noise in this issue. |
This is essentially a solved issue with the exception of one or two stragglers that appear unmaintained at this point. Closing this out. |
The build prefix is going to get longer in conda-build 2.0.0 ( conda/conda-build#877 ). There has been some discussion about what this will affect and what needs to get rebuilt. I have moved this from a different thread so the discussion can see the light of day. 😄 An excerpt of it is below. Gist is we need to rebuild a few things, some of which we know and some we may not have identified yet. These include curl, fftw, pkg-config, and tk. If swig ever gets added, we have to watch out for that too. I suspect libtool and git will also be affected.
@msarahan commented on Sun Jun 05 2016
@stuarteberg - 2.0.0beta tagged: https://github.com/conda/conda-build/releases/tag/2.0.0beta
@stuarteberg commented on Sun Jun 05 2016
OHHH yeahhhh....
.... wait, there's no emoji for the kool-aid man? W. T. F.
I'll try to test this out this week. Thanks for the heads-up.
@stuarteberg commented on Mon Jun 06 2016
Whoa now. I clearly wasn't paying enough attention to conda/conda-build#877. :-)
Sounds like the right decision was made. But it will take time to test -- I need to rebuild my entire stack and it will be a few days before I can do that. When do you plan on turning the 'beta' release into a real release?
@msarahan commented on Mon Jun 06 2016
Sometime before mid-June. It can stew for a week or two.
@msarahan commented on Mon Jun 06 2016
Please keep me posted on your findings. I'm especially interested in how compatible new packages are with old ones. I think they should be interchangeable, but only the new ones will work on systems with long prefixes.
@stuarteberg commented on Mon Jun 06 2016
My very first attempt at building a package failed immediately (on OS X). The _build... prefix was really long, and apparently I have a dependency that contains binary files which include the prefix.
It looks like many of my recipes include detect_binary_files_with_prefix: true, which causes many (all?) of the dylibs to be listed in the package's info/has_prefix metadata. Was it a mistake to use that detect_binary_files_with_prefix setting in the first place?
@msarahan commented on Mon Jun 06 2016
I don't think it was a mistake to use that, but it does imply the long prefix. What were the error messages? Are you able to build something with no dependencies?
@stuarteberg commented on Mon Jun 06 2016
I'm building a package that depends on fftw. My fftw package is from my own channel (ilastik - for reference, recipe here: https://github.com/ilastik/ilastik-build-conda/blob/master/fftw/meta.yaml). So while it was creating the _build environment, the linking step failed when it tried to "link" fftw.
Yes. For example, this recipe builds: https://github.com/ilastik/ilastik-build-conda/blob/master/lz4/meta.yaml
I'll do some more digging (and thinking) about this later this week. If I have to rebuild my whole stack, that's no big deal. But it would be nice if we can come up with clear guidance for people who run into the same issue I'm seeing.
@msarahan commented on Mon Jun 06 2016
Agreed. Thanks for being my guinea pig.
@jakirkham commented on Mon Jun 06 2016
Well, the placeholder size changed in 2.0.0beta. Not sure if you caught that or not, @stuarteberg, but that could be causing you some pain.
@msarahan commented on Tue Jun 07 2016
@stuarteberg before you get too far, I'm going to tag a 1.21.0 release that has everything but the prefix length change. We are working on a new Anaconda release, and this change is simply too disruptive so close to a release.
Is it true to say that new builds can not use old builds, but old builds can use new builds?
@stuarteberg commented on Tue Jun 07 2016
Sounds like a good idea.
I think that's right, if I understand the problem correctly.
@stuarteberg commented on Tue Jun 07 2016
Continuing to investigate the example from above (my problematic fftw package), here's what I see. (Reminder: this package was built with detect_binary_files_with_prefix: true.)
OK, so for some reason all of the .dylib files have the prefix embedded in them. But, wait, they use relative RPATHs and whatnot. Why do they contain the prefix? Here's an ugly command that identifies the files containing _build and prints out the guilty strings:
OK, so apparently the gcc command is stored within the binaries for some reason? BTW, I checked on Linux, and it's the same. Not sure why this is the case. Do you know?
@msarahan commented on Tue Jun 07 2016
I don't know, but I guess I get to learn.
@msarahan commented on Tue Jun 07 2016
1.21.0beta tagged: https://github.com/conda/conda-build/releases
@stuarteberg commented on Tue Jun 07 2016
It might not be worth learning: Looking through my packages, there are several dylib files that DO include the _build_placeholder... prefix, but for different reasons (i.e. not the gcc commands as shown above). I don't know if those uses of it are important (I suspect most aren't), but it's probably not worth investigating each one.
Even so, I'm attempting to get some explanation for the embedded build commands, just for curiosity's sake: http://stackoverflow.com/questions/37684320/what-causes-a-compiled-library-to-store-its-build-command-internally
@stuarteberg commented on Tue Jun 07 2016
OK, I did some more digging (with the help of a stackoverflow user), and it turns out that fftw is the only package on my machine that includes its own build command in the binary itself. (For the record, it's in a variable named FFTW_CC, which makes its way into the binary by way of api/version.c.)
Anyway, whatever, it doesn't matter. I think there's no way around it: people whose recipes set detect_binary_files_with_prefix: true may have to rebuild some of their packages.
The good news is that -- as far as I can tell -- this applies to very few of the packages in the default anaconda distribution. But there are other packages from the defaults channel that will need updating, such as: ... and probably more.
@msarahan commented on Tue Jun 07 2016
Good to know. Thanks! Should we be trying to clear that information? Any idea why people put it there? Posterity's sake?
@stuarteberg commented on Tue Jun 07 2016
I don't see the harm in leaving it there. Does conda even provide fftw? I don't think it does. This was in one of my own packages. I guess now that you guys ship mklfft, there's no need for me to use fftw anyway...
@jakirkham commented on Tue Jun 07 2016
conda-forge does.
@stuarteberg commented on Tue Jun 07 2016
Then make sure you rebuild it when conda-build 2.0 comes out! ;-)
@jakirkham commented on Tue Jun 07 2016
Duly noted.
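The "ugly command" mentioned a few comments up was not preserved in this excerpt. As a rough stand-in (an assumption, not the original command), something like the following walks a build prefix and prints the strings that embed it:

import os

def grep_prefix(prefix_root, needle=b'_build'):
    """Print files under prefix_root whose bytes contain `needle`, plus the matching strings."""
    for dirpath, _, filenames in os.walk(prefix_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'rb') as f:
                    data = f.read()
            except OSError:
                continue
            if needle in data:
                # crude strings(1)-style extraction: split on NULs, keep chunks with the needle
                for chunk in data.split(b'\x00'):
                    if needle in chunk:
                        print(path, chunk[:200])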
So, curl hardcodes the path to where the certificates live. I'm guessing old versions of curl used these from the system, so they were not affected by the prefix length during build time. Newer versions of curl (from defaults) use certificates that are provided in the openssl package, so they are affected by the prefix during build time. At conda-forge we have a separate certificates package that ends up living in the same location as where the openssl one from defaults places them, for compatibility reasons. So, it will probably be affected. Though we already knew about this as we had to fix it before. 😄
Not sure where this hardcodes things. Did you see this anywhere in it? Probably points to $PREFIX/lib/tkX.Y/ and $PREFIX/lib/tclX.Y/.
Not surprised. Any chance we could get a rebuild of gcc before that conda-build release, @msarahan? I think only a few packages are affected by this (ones using Fortran or OpenMP), but we should probably get the compiler fixed for them.
Have not inspected this, but I'm guessing it hardcodes the path to $PREFIX/lib/pkgconfig somewhere. Hence it is unsurprising that it needs a rebuild.
Surprised you didn't mention libtool, which adds .la files to $PREFIX/lib/. Would figure it has the same problem.
I don't really use SWIG. So, I trust your judgement. Probably hardcoding some path to some provided .i files.
@jakirkham commented on Tue Jun 07 2016
Did you try git? I think that might be affected too. At least, I suspect ours will be. Guessing defaults is similar.
@jakirkham commented on Tue Jun 07 2016
Alright, this conversation needs to see the light of day (not a closed unrelated PR), I'm going to try using ZenHub's move issue feature, but this could go horribly wrong. So, please fasten your seatbelts. 😁