Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove git submodules #10743

Merged
merged 3 commits into from
Apr 14, 2015
Merged

remove git submodules #10743

merged 3 commits into from
Apr 14, 2015

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Apr 5, 2015

I know ya'll probably thought I would be the last person to be submitting a PR to kill of the usage of git submodules. but with this little bit of sleight of hand, I'm trying to have my cake and eat it too.

If you run make NO_GIT=1, the build doesn't need git at all, doesn't need it installed, nothing. So if you download a zip or tarball from GitHub, it'll even set NO_GIT automatically and keep on chugging.

But if you run make like usual, this does use git to manage certain dependencies. Additionally, it uses shallow checkouts to keep checkout time and space down, but those are easy enough to convert to full checkouts if required. This lets us have the benefits of git submodules, without the downsides of git submodules. I call this new feature "git-externals".

Both usages (NO_GIT=1 and NO_GIT=0) will even live happily side-by-side in the same build tree (so that make source-dist is simpler). But wait, there's more: git-externals modules are also fully compatible (forwards and backwards) with their pre-existing git-submodules (although the forward move will force a rebuild).

ref #10730

@tkelman
Copy link
Contributor

tkelman commented Apr 5, 2015

Could the .version files be removed and just have that content as part of deps/Versions.make ?

@@ -871,6 +871,8 @@ ifeq ($(VERBOSE), 0)

QUIET_MAKE = -s

WARNCOLOR="\033[33;1m"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be defined when VERBOSE = 1 too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. thanks

@tkelman
Copy link
Contributor

tkelman commented Apr 5, 2015

Aside from the comments above, I think I like it. Would have to try it out for a while to decide for sure, but looks like a clever solution.

@tkelman tkelman added the building Build system, or building Julia or its dependencies label Apr 5, 2015
@vtjnash
Copy link
Member Author

vtjnash commented Apr 5, 2015

Could the .version files be removed and just have that content as part of deps/Versions.make ?

no. i specifically want (need) these files to be versioned separately. that allows the rules to force a rebuild when the version spec changes.

@tkelman
Copy link
Contributor

tkelman commented Apr 5, 2015

Technically they could depend on Versions.make, would that force an entire recompilation when any other dependency version changed? Or just make it re-run configure? We don't touch Versions.make all that often, might be nice to save a little bit of clutter but separate files isn't that bad.

@tkelman
Copy link
Contributor

tkelman commented Apr 5, 2015

I think this'll fail to build with NO_GIT = 1 if you don't have pre-existing copies of libuv (or openlibm on Windows) sitting around, because of some hard-coded include paths. The libuv one should be fixable by referring to the installed header, adding something like -I$(build_includedir) if necessary.

@ViralBShah
Copy link
Member

I was going to propose it, but thought it would be too complex. Thanks @vtjnash for just going ahead and doing it.

@ivarne
Copy link
Member

ivarne commented Apr 5, 2015

So essentially this is git submodules the way they should work, implemented in a make script?

That's really great!!

@tkelman
Copy link
Contributor

tkelman commented Apr 6, 2015

I would be wary of relying on too much cleverness from gmake here, except that CMake has its own ExternalProject mechanism which should work robustly for this if we ever get further in that direction.

At the moment the fact that this is using https://github.com/*/*/tarball/* for the TAR_URLs rather than /archive/ as we use elsewhere is actually a good thing. The caching server that we use in jldownload has "github.com/[^/]+/[^/]+/archive" in its whitelist. We may need to re-examine that, add a blacklist, or try to be consistent in not using cached downloads for branch url's where the content will change over time. Is there a quick way of determining whether a requested /archive/ url is a tag vs a branch? Or should we maybe do some periodic server-side checksumming to invalidate stale cached files? cc @staticfloat

@staticfloat
Copy link
Member

I think we should always be using tag names or direct SHA references. That
way there's no caching ambiguity, and it really does work just like the old
got submodules.

On Mon, Apr 6, 2015, 09:50 Tony Kelman notifications@github.com wrote:

I would be wary of relying on too much cleverness from gmake here, except
that CMake has its own ExternalProject mechanism which should work robustly
for this if we ever get further in that direction.

At the moment the fact that this is using https://github.com/*/*/tarball/*
for the TAR_URLs rather than /archive/ as we use elsewhere is actually a
good thing. The caching server
https://github.com/staticfloat/cache.julialang.org that we use in
jldownload has "github.com/[ http://github.com/%5B^/]+/[^/]+/archive"
in its whitelist. We may need to re-examine that, add a blacklist, or try
to be consistent in not using cached downloads for branch url's where the
content will change over time. Is there a quick way of determining whether
a requested /archive/ url is a tag vs a branch? Or should we maybe do
some periodic server-side checksumming to invalidate stale cached files? cc
@staticfloat https://github.com/staticfloat


Reply to this email directly or view it on GitHub
#10743 (comment).

@vtjnash
Copy link
Member Author

vtjnash commented Apr 6, 2015

I would be wary of relying on too much cleverness from gmake here, except that CMake has its own ExternalProject mechanism which should work robustly for this if we ever get further in that direction.

gmake provides the framework, but very little cleverness of it's own. it's all actually just using pretty simple rules, encoded in the basic tools provided by gmake. I'm personally no longer interested in switching to CMake, once we merged libgit2 and I discovered how inadequate it's support of cross-compiling actually is. That just seems like such a shockingly poor design decision.

I think we should always be using tag names or direct SHA1 references...just like git submodules

tags can move. this moves away from depending on SHA1 hashes quite as directly, specifically because it seems to cause users such pain. It looks like git returns an ETag that is actually just the SHA1 hash of that commit, so you can validate the cache just by checking the header of the reply (after all of the redirects):

$ openssl s_client -connect codeload.github.com:443
HEAD /JuliaLang/julia/tar.gz/v0.3.7 HTTP/1.1
Host: codeload.github.com
If-Modified-Since: Mon, 06 Apr 2015 02:14:11 GMT
If-None-Match: "cb9bcae93a32b42cec02585c387396ff11836aed"

HTTP/1.1 200 OK
Content-Length: 2355052
Access-Control-Allow-Origin: https://render.githubusercontent.com
Content-Security-Policy: default-src 'none'
X-XSS-Protection: 1; mode=block
X-Frame-Options: deny
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
ETag: "cb9bcae93a32b42cec02585c387396ff11836aed"
Content-Type: application/x-gzip
Content-Disposition: attachment; filename=julia-0.3.7.tar.gz
Date: Mon, 06 Apr 2015 02:32:18 GMT

@tkelman
Copy link
Contributor

tkelman commented Apr 6, 2015

I'm personally no longer interested in switching to CMake, once we merged libgit2 and I discovered how inadequate it's support of cross-compiling actually is. That just seems like such a shockingly poor design decision.

Not really the topic of this issue, but do you ever want first-class support for MSVC? Maintaining 2 independent build systems is painful and not a good idea long-term. CMake is perfectly capable of cross-compiling - with a toolchain file. We just haven't gone and written the toolchain files.

edit: also note that LLVM is moving very strongly towards removing support for autotools altogether, so CMake will be absolutely unavoidable before all that much longer.

@vtjnash
Copy link
Member Author

vtjnash commented Apr 6, 2015

edit: also note that LLVM is moving very strongly towards removing support for autotools altogether, so CMake will be absolutely unavoidable before all that much longer.

that would in some ways be a huge pain for us. even the llvm page on "how to cross compile llvm" notes that cmake isn't able to correctly detect the dependencies of a cross-compile and needs to be manually tricked: http://llvm.org/docs/HowToCrossCompileLLVM.html

can you imagine how long it will take to create each cross-build configuration if you have to write all of the rules manually? it took me barely an afternoon to turn on cross-compiling to any arbitrary target for 8 autotools dependencies. it's unclear to me how to replicate that result for just one cmake target (libgit2). what's the point of having a supposed automated configuration system if I have to write all of the rules manually?

@tkelman
Copy link
Contributor

tkelman commented Apr 6, 2015

They're missing CMAKE_FIND_ROOT_PATH_MODE_* flags, which you put in the toolchain files. Aside from the prefix the toolchain files can mostly be copies of one another. The dependencies issue is partly because debian's not as good at helping you easily install cross-arch dependencies for cross-compiling purposes as fedora or opensuse.

I believe this will still be broken on Windows with NO_GIT = 1 due to the openlibm hard-coding.

@tkelman
Copy link
Contributor

tkelman commented Apr 6, 2015

LIBUV_INC and UTF8PROC_INC in Make.inc should probably be replaced by $(build_includedir) for the non-USE_SYSTEM_ case. I may as well fix the assembly thing now by adding ENTRY.h and END.h files in src/support. We can move them over to openlibm in the future if we ever want to translate the assembly there to intel syntax and try building it with MSVC.

@tkelman
Copy link
Contributor

tkelman commented Apr 8, 2015

Now that staticfloat/cache.julialang.org#3 is resolved this should probably switch to using /archive/ rather than /tarball/ for github url's, for consistency's sake.

@vtjnash
Copy link
Member Author

vtjnash commented Apr 8, 2015

I was attempting to follow the github API doc: https://developer.github.com/v3/repos/contents/

@tkelman
Copy link
Contributor

tkelman commented Apr 8, 2015

Ah, well if that's current, strange that the download links through the web ui aren't consistent with their api. We could alternately add /tarball/ to the caching whitelist.

@vtjnash
Copy link
Member Author

vtjnash commented Apr 13, 2015

Ah, well if that's current, strange that the download links through the web ui aren't consistent with their api. We could alternately add /tarball/ to the caching whitelist.

turns on that they aren't quite the same. internally, the API link encodes the SHA1 hash and owner into the filepath, while the archive link instead encodes the branch name:

drwxrwxr-x root/root         0 2015-02-12 01:48 JuliaLang-Rmath-julia-04a4298/
-rw-rw-r-- root/root       139 2015-02-12 01:48 JuliaLang-Rmath-julia-04a4298/.gitignore
-rw-rw-r-- root/root     18002 2015-02-12 01:48 JuliaLang-Rmath-julia-04a4298/COPYING

vs.

drwxrwxr-x root/root         0 2015-02-12 01:48 Rmath-julia-0.1/
-rw-rw-r-- root/root       139 2015-02-12 01:48 Rmath-julia-0.1/.gitignore
-rw-rw-r-- root/root     18002 2015-02-12 01:48 Rmath-julia-0.1/COPYING
-rw-rw-r-- root/root       631 2015-02-12 01:48 Rmath-julia-0.1/Make.inc

any thoughts or preference here?

vtjnash added 3 commits April 13, 2015 17:54
git submodules cause developers a lot of heartache,
but are very useful when used correctly.
this combines the best features of git submodules,
with the best features of static tarball files
(and even automatically switches between the two),
 in a new feature that I'm dubbing "git external".

the main idea here is to act a bit more like svn external
(hence the name), in that this tracks a branch (or tag)
rather than a SHA1 hash.
In a nod to git submodules, however, this explicitly also remembers the
SHA1 hash of the original reference and gives a warning if they diverge.
This helps make two problem situations less likely:
  1) Pointing to a SHA1 hash that is not on an upstream branch,
     and thus later gets deleted by the git gc
  2) Forgetting to push in the correct order
     (submodule and then toplevel),
     and instantly breaking the build for everyone else

because the branch information is now stored in a file,
instead of some hidden git metadata,
this also allows building from a tarball without any extra effort.
in fact, when building from a tarball (with no .git folder),
this automatically downloads the dependency tarball as well,
so that having git installed is not necessary.

all this seamless magic is implemented in the git-external deps/Makefile rule.
see the description above that rule, and other usages, for more details.
@tkelman
Copy link
Contributor

tkelman commented Apr 14, 2015

#10743 (comment)

edit: in case this isn't clear, this is broken with NO_GIT = 1

@@ -0,0 +1 @@
Subproject commit 86447ad060d6f4edf01f2a64b9598dfeeb6e6f7d
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git submodules coming back for revenge?

@tkelman
Copy link
Contributor

tkelman commented Apr 14, 2015

buildbots are apparently broken by this, cc @staticfloat

@vtjnash
Copy link
Member Author

vtjnash commented Apr 14, 2015

ci seems OK with this PR. the i686 build failed after merging because the download of virtualenv failed

@tkelman
Copy link
Contributor

tkelman commented Apr 14, 2015

Not CI. Binaries. http://buildbot.e.ip.saba.us:8010/builders/

@tkelman
Copy link
Contributor

tkelman commented Apr 14, 2015

As far as /tarball/ vs /archive/, I don't think it matters too much which we use (though as you saw the checksums are different, hopefully only due to the different top-level folder name in the tarball?), but I would strongly prefer we be consistent and pick the same thing everywhere. We still have quite a few [edit: only 3, apparently] dependencies that are downloading tarballs from github which haven't been changed to "git-externals," though I suppose they could be.

I'm a bit concerned having the checksums only verified when building with NO_GIT will mean they're likely to go stale if people forget to update them.

@staticfloat
Copy link
Member

staticfloat commented Apr 14, 2015 via email

@tkelman
Copy link
Contributor

tkelman commented Apr 14, 2015

Looks mostly back to normal, apparently the mis-rebased submodule remnant was confusing git on the buildbots.

@tkelman
Copy link
Contributor

tkelman commented May 30, 2015

Looking at this again, tracking a sha directly is better than tracking a branch, since the latter leads to non-reproducible builds that depend on when you try to run them, and can cause indirect breakage here if a broken commit gets pushed to a dependency. The sha mismatch should be an error, not a warning. Or it should do a checkout, which would error when dirty.

@vtjnash
Copy link
Member Author

vtjnash commented May 30, 2015

What you are describing is a git submodule (minus tracking version information in a separate file, plus actual integration with git).

But to your other point, whenever possible, these should probably typically be tracking tags not branches for exactly the reason you mention.

@tkelman
Copy link
Contributor

tkelman commented Jun 1, 2015

Note that of the many complaints I've made about git submodules, none of them were that they use a sha instead of a branch. With recent versions of git, submodules can track branches. But they shouldn't due to the non-reproducibility here.

In my opinion we can't tag a release with this feature in its current form. Otherwise 6 months later if someone tries to build a release tag, they most likely won't be able to if any dependencies have changed.

Also a note to self that we should add comments to these .version files reminding anyone who updates a dependency version to also do a build with NO_GIT=1 and update the deps/checksums files.

@vtjnash
Copy link
Member Author

vtjnash commented Jun 1, 2015

I was going to going to suggest the reverse and say that we should not be adding the checksum files. The SHA1 hash is already a checksum for git. Otherwise, "make source-dist" will create them.

@tkelman
Copy link
Contributor

tkelman commented Jun 1, 2015

The only reason I liked this version more than submodules is because it allows proper operation from tarballs with NO_GIT - we still want to checksum those tarballs. One of submodules' flaws was not being able to operate without git, making the jobs of distribution packagers harder since you can't arbitrarily clone git repositories from the internet on those buildbots, but you can easily place a tarball in the correct location.

edit:

The SHA1 hash is already a checksum for git.

Currently advisory-only. That's bad. Warnings get ignored. I won't, but to prove a point I (or anyone else) could go push a broken commit to openspecfun master right now and it would break the Julia build for everyone doing a fresh clone.

@StefanKarpinski
Copy link
Member

Also SHA1 is no longer considered secure. Tarball checking uses SHA256 which is secure (also MD5, which is totally insecure at this point, but just as a fallback).

@tkelman tkelman mentioned this pull request Jun 2, 2015
13 tasks
@tkelman
Copy link
Contributor

tkelman commented Jun 2, 2015

I'm less concerned about cryptographic security than I am about actually getting the code we say we want to get. Asking for a branch is not the right choice here. A tag is better, but not perfect, and we'd need to be rigorous about enforcing tags rather than branches. I think the sha1 needs to be mandatory here. The minimal way of doing that might be with just a checkout on the git side, but the tarball side needs to adjust the url it's downloading from otherwise it'll fail the checksum verification if the referenced branch changes.

@vtjnash
Copy link
Member Author

vtjnash commented Jun 2, 2015

i forgot about the checksum verification case of the moving head case. it's an easy change to just put the SHA1 hash twice (in the _BRANCH field), and have the tarball download use the _SHA1 hash value rather than the _BRANCH variable.

@tkelman
Copy link
Contributor

tkelman commented Jun 2, 2015

to just put the SHA1 hash twice

Can you explain what you mean by this, in more words or a diff?

@vtjnash
Copy link
Member Author

vtjnash commented Jun 2, 2015

like so:

diff --git a/deps/openblas.version b/deps/openblas.version
index 5f67252..e893343 100644
--- a/deps/openblas.version
+++ b/deps/openblas.version
@@ -1,2 +1,2 @@
-OPENBLAS_BRANCH=v0.2.14
+OPENBLAS_BRANCH=d0c51c4de91b2356b770f92862c6b0e1d54953cd
 OPENBLAS_SHA1=d0c51c4de91b2356b770f92862c6b0e1d54953cd
diff --git a/deps/Makefile b/deps/Makefile
index 5a8862b..aef765c 100644
--- a/deps/Makefile
+++ b/deps/Makefile
@@ -261,9 +261,9 @@ ifeq (exists, $$(shell [ -d $$(JULIAHOME)/.git/modules/deps/$1 ] && echo exists
 $1/$4: $$(JULIAHOME)/.git/modules/deps/$1/HEAD
 endif
 else # NO_GIT
-$2_SRC_DIR = $1-$$($2_BRANCH)
+$2_SRC_DIR = $1-$$($2_SHA1)
 $$($2_SRC_DIR).tar.gz:
-   $$(JLDOWNLOAD) $$@ $$(call $2_TAR_URL,$$($2_BRANCH))
+   $$(JLDOWNLOAD) $$@ $$(call $2_TAR_URL,$$($2_SHA1))
 $$($2_SRC_DIR)/$3: $$($2_SRC_DIR).tar.gz
    $$(JLCHECKSUM) $$($2_SRC_DIR).tar.gz
    mkdir -p $$($2_SRC_DIR) && \

@tkelman
Copy link
Contributor

tkelman commented Jun 2, 2015

I like the latter change, but if we're going to do the former, why have a separate *_BRANCH variable at all?

@vtjnash
Copy link
Member Author

vtjnash commented Jun 2, 2015

just avoiding deleting code (minimizing the size of the diff).

tkelman added a commit that referenced this pull request Jun 3, 2015
This makes it less likely that we would clone a branch that has diverged from
the commit sha1 specified in the _SHA1 variables in the deps/*.version files

ref #10743 (comment)
tkelman added a commit that referenced this pull request Jun 3, 2015
This makes it less likely that we would clone a branch that has diverged from
the commit sha1 specified in the _SHA1 variables in the deps/*.version files

ref #10743 (comment)
tkelman added a commit that referenced this pull request Jun 4, 2015
This makes it less likely that we would clone a branch that has diverged from
the commit sha1 specified in the _SHA1 variables in the deps/*.version files

ref #10743 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
building Build system, or building Julia or its dependencies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants