Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bad haddocks for lvish package #119

Closed
cartazio opened this issue Oct 2, 2013 · 36 comments
Closed

bad haddocks for lvish package #119

cartazio opened this issue Oct 2, 2013 · 36 comments

Comments

@cartazio
Copy link
Contributor

cartazio commented Oct 2, 2013

http://hackage.haskell.org/package/lvish
doesn't have valid links to the listed modules

also

[15:48:59] <carter>  dcoutts: btw the lvish package on hackage has bad links http://hackage.haskell.org/package/lvish
[15:50:19] <dcoutts>     carter: interesting, it contains the vector-0.10.9.1 docs
[15:50:33] <carter>  how so?
[15:50:41] <dcoutts>     http://hackage.haskell.org/package/lvish-1.0/docs/doc-index-33.html
[15:50:46] <carter>  just noticed a message bout it on cafe
[15:50:52] <carter>  figured i'd bring it to your attention
[15:50:57] <carter>  was that a doc builder
[15:51:02] <carter>  or was it uploaded?
[15:51:02] <dcoutts>     please file a ticket
[15:51:05] <dcoutts>     I'm not sure
@dcoutts
Copy link
Contributor

dcoutts commented Oct 2, 2013

http://hackage.haskell.org/package/lvish-1.0/docs.tar contains all kinds of nonsense. It looks like a merge of several packages' tar files. Indeed it looks like the merge of all the docs of the direct dependencies...

It looks like it was the build bot that uploaded it:

127.0.0.1 - [02/Oct/2013:04:52:36 +0000] "PUT /package/lvish-1.0/docs 1.1" 204 -1 "" "hackage-mirror/0.3"

Note the build client is using the same http utils as the hackage-mirror, so it claims to be the same user agent, which is not ideal.

@edsko I think we should look into this, see what's going on.

@lkuper
Copy link

lkuper commented Oct 3, 2013

@cartazio Thanks for filing this. We don't know what's wrong -- our locally built version seems fine.

@neilccbrown
Copy link

I believe I am seeing the same problem with the package I just uploaded: http://hackage.haskell.org/package/mysql-simple-quasi-1.0.0.1/docs/ It seems to be a set of related packages.

I believe the problem is probably that my package and lvish both have a test-suite block as the last item in the cabal file. It looks like it might be using the dependencies from the test-suite to try and build the docs, then somehow getting muddled and just spitting out a bunch of docs from other packages. I might release 1.0.0.2 without the test-suite to see if that fixes the problem.

Edit: I tried commenting it out and releasing 1.0.0.2, but I see the same problem. Either it's reading the commented portion, or the problem comes from a different issue.

@cartazio
Copy link
Contributor Author

cartazio commented Oct 8, 2013

if you can isolate the problem, please! (if you can isolate i'm happy to help figure out a patch, though i'm wildly unfamiliar with the hackage-server codebase). This could be a growing problem if not addressed soon.

@komadori
Copy link
Contributor

komadori commented Oct 8, 2013

I've also experienced this problem with the hsqml-0.2.0.0 package, which also has a test suite. I was able to work around the problem by creating the documentation locally using standalone-hackage and uploading it using curl. For example:-

$ tar --format=ustar -c hsqml-0.2.0.0-docs >docs.tar
$ curl -u username:password -H "content-type: application/x-tar" -T docs.tar http://hackage.haskell.org/package/hsqml-0.2.0.0/docs.tar

@cartazio
Copy link
Contributor Author

cartazio commented Oct 8, 2013

@komadori yes, a number of people are doing that, but that doesn't solve the root problem!
If you or someone else could isolate what is triggering the problem, we can get to solving it! please :)

@neilccbrown
Copy link

Is there a way to test it other than uploading a new version to Hackage? I don't really want to release versions 1.0.0.3 through 1.0.0.20 just for testing this.

@cartazio
Copy link
Contributor Author

cartazio commented Oct 9, 2013

you coould setup your own local hackage mirror?

@cartazio
Copy link
Contributor Author

cartazio commented Oct 9, 2013

though maybe @dcoutts should chime in about what can be done

@cartazio
Copy link
Contributor Author

cartazio commented Oct 9, 2013

i spoke with @dcoutts

[01:32:01] <dcoutts_>  a good place to start would be to try running the doc builder themselves
[01:32:12] <dcoutts_>    the doc builder is an exe in the hackage-server package
[01:32:17] <carter>  ok
[01:32:28] <carter>  so building the hackage package will build the doc builder executable
[01:32:31] <dcoutts_>    that should help them reproduce some issues
[01:32:40] <carter>  ok cool
[01:32:47] <carter>  and that solves the "trial uploading pain"
[01:32:56] <dcoutts_>    obviously it will not help with the cases where the docs fail to build due to missing C libs
[01:33:05] <carter>  yeah
[01:33:09] <carter>  thats not the issue right now though
[01:33:24] <carter>  thats a well understood problem "you lack deps"
[01:33:34] <carter>  isn't there an api for uplaoding self built docs too?
[01:33:45] <dcoutts_>    oh and there's a ticket about providing proper instructions to authors for building docs themselves
[01:33:50] <dcoutts_>    that'd be a good one to fix

so if you wish to help debug the doc builders, you can install hackage-server locally, and experiment with the doc builder binary it comes with

the other ticket Duncan was alluding to is #56 (which walks through how to upload a locally built haddocks). NB: I don't think that addresses the building the haddocks so they link to other packages properly... but I could be wrong

@neilccbrown
Copy link

Installing hackage-build was a bit of a mission -- e.g. the 0.4 release on Hackage is missing a source file, so I built from the latest git version. Here's what is happening on my system.

hackage-build gets asked to install mysql-simple-quasi-1.0.0.2 (for example). It then invokes cabal install for that package, with the argument --docdir=/some/dir/build-cache/inst/share/doc/mysql-simple-quasi-1.0.0.2. That means any and all dependencies that it needs to install also put their documentation into that directory. On my system, that meant 20M of documentation got put into that directory.

Once the build bot is finished, it then attempts to upload all the documentation to hackage, at which point it received a HTTP 413 code: request entity too large. It seems that this means that about half of the HTML files end up getting uploaded, but not the one file that is actually related to the package in question, hence the docs appear missing.

You can see that all this documentation for dependencies is being uploaded for every package. E.g. check out the docs directory for this package that did successfully get docs generated: http://hackage.haskell.org/package/groundhog-th-0.4.0.2/docs/ It's got all its dependencies' docs too, presumably it's just that they are few enough that the upload succeeds before the 413, whereas my package (and other broken ones) have more dependencies and hit the limit.

An obvious fix is to somehow increase the size limit that the server will accept. A more thorough (but possibly more correct?) solution might be to stop cabal install putting all the dependencies' docs into that same directory. Or is there another problem here, that ideally the build bot should already have all the dependencies installed (i.e. it should hold all of hackage installed already) and thus should only be generating the docs for the latest package?

@cartazio
Copy link
Contributor Author

huh! this is great work!

I think you're absolutely right about what the "quick fix" and "long term" fixes should respectively be. (or at least those seem like reasonable proposals)

@lkuper
Copy link

lkuper commented Oct 10, 2013

@twistedsquare Thanks for the awesome detective work!

@komadori
Copy link
Contributor

Regarding the long-term fix,

I've noticed that the docs builder seems to only work starting from a clean environment each time because it deletes the files afterwards but doesn't remove the packages it just built from the GHC database. Actually, I'm not sure that the builder should keep all the dependencies installed because I don't think you can trust anything the builder has access to once it's built untrusted code. That's aside from the issue of what would happen if you tried to install every single package on Hackage into one GHC database. It seems like this would require a system where the builder exported a build.tar for each package and then used these to reconstruct an environment with the right dependencies when building each package, but that's very complicated.

It seems a little difficult to control what Haddock is doing as it stands because the builder runs it indirectly via "cabal install --enable-documentation". I have two potential solutions:-

i) The builder could build the documentation again outside of the "cabal install" command by running "cabal haddock" afterwards. That would just rebuild the documentation for the current package and it could write the output to a different location so that it wouldn't be mixed in with the dependencies.
ii) I noticed that the doc-index.html in the docs.tar only contains information for the last package that was built. The links to symbols in other packages correctly point to their docs directories on Hackage rather than the surplus files in the tar balls, so if you transitively follow the links from doc-index.html, restricting to files within the tar ball, then you will find the desired set of files. You can test this by, for example, running " wget -np -r -l inf http://hackage.haskell.org/package/mwc-random-0.13.1.0/docs/doc-index.html" and comparing this to the contents of docs.tar.

Thoughts? I will have a go at writing a patch. The first approach seems best, assuming it works as I imagine.

@neilccbrown
Copy link

I'm no expert on hackage, but solution 1 from @komadori seems pretty sensible. Do we know if there is a reason this wasn't done in the first place, or if it was just an oversight?

@dcoutts
Copy link
Contributor

dcoutts commented Oct 11, 2013

Thanks folks. The doc builder should only be generating docs for the target package, not all of its deps.

@dcoutts
Copy link
Contributor

dcoutts commented Oct 11, 2013

@komadori Also, the thing with the HTTP 413, is it then really uploading the prefix of the tarball file, rather than rejecting it? That's very bad, if it's too large it should reject outright. Did you reproduce it doing that?

@neilccbrown
Copy link

The 413 might have been a bit of a leap. Separately, I observed:

  1. My package had some of its dependencies' docs uploaded but not all the dependencies, and not the original package. This was after the original build bot had run but before I did anything extra.
  2. When I ran the build bot locally, it correctly generated all the docs. I fed it the Hackage URL and my username and password. It seemed to make a valid attempt at uploading the docs and received a 413 (this is spit out by running the build bot with -vvv). On Hackage, there was no subsequent difference to the docs from step 1.

So I don't know that the server had half the docs because it uploaded half the tar, that was just my best guess based on 1 and 2. If you know how I could verify whether the 413 resulting in uploading half the docs, I still have everything installed and set up.

@komadori
Copy link
Contributor

@dcoutts I created a large 100MB docs tar-ball and was able to successfully upload it to my local Hackage instance. However, on closer inspection, the 413 error reported from the public Hackage server is actually generated by nginx rather than the Happstack application.

I thought at first that nginx might be cutting off the connection midway in response to its own limits without the Hackage server noticing. The uploadDocumentation function operates on a lazy ByteString so it would appear to lack explicit checking for errors from the network layer. However, it does use the Codec.Archive.Tar package to validate the tar-ball's format and, after some experimentation, I think the truncation check in the library would always catch this. Even if you cut a tar file at a file record boundary, the missing two zero records at the end of the file will cause Hackage to reject it.

Not only that, but the bad docs.tar files associated with each of the packages reported on this bug are quite different sizes. They're clearly not being cut off at the same point, if that's what were happening.

I can't clearly account for @twistedsquare's report since I haven't tried hackage-build against the public server. Just a wild guess, but if the official hackage-build connected directly to hackage-server rather than go through nginx then it could successfully upload its bad docs in the first instance whereas @twistedsquare still couldn't upload his merely large (but maybe not bad) docs when using hackage-build connecting to nginx. That doesn't explain where the bad docs come from though. Can you say what the request size limit is on the server?

@komadori
Copy link
Contributor

Actually, I've just realised there's an even simpler way to see that file truncation isn't the cause of this. Recalling that I remarked in my "long-term fix" comment that doc-index.html contains only the symbols for the last package that was built, I just checked and that's true for packages with working documentation but not for the ones that are broken. In the broken docs.tar files I looked at, the doc-index.html files contains one of the dependencies' symbols. That suggests that the real issue is that cabal-install is failing before it gets to the target package without hackage-build noticing. Are the package build logs retained anywhere?

@komadori
Copy link
Contributor

The fact that the documentation tar-ball was bad distracted me from the fact I should have been surprised to get documentation built automatically in the first place. My affected package, hsqml, depends on the Qt libraries and so it's completely understandable for the build machine not to have them and hence for the docs build to fail (although I was previously lucky they seemed to be there on the old Hackage build machine). Likewise, @twistedsquare's mysql-simple-quasi package depends indirectly on the MySQL client libraries and that probably caused its docs build to fail. It's less clear what went wrong with the lvish package, but it also has several dependencies which include C code and they are potential causes of a build failure.

The code in BuildClient.hs doesn't properly check that the target package built successfully. It simply checks that the doc-index.html file exists, hence as long as one the dependencies builds successfully and creates this file then the docs builder will upload the docs.tar with however far it got. The comments give a rationale for ignoring the return code of cabal-install which may be legitimate, but perhaps this could be used in addition to the other checks? Are the cases where cabal-install fails, but the documentation is good?

Ignoring that for the moment, I've created a patch which extends the existing checks to check for the package-name.haddock file in the output. This fixes the problem in my tests with some dummy packages insofar as the docs builder will stop uploading bad docs.

Please wait while I figure out how to create a pull request :-).

dcoutts added a commit that referenced this issue Oct 22, 2013
Check that haddock docs build completed before uploading (fixes #119).
@lkuper
Copy link

lkuper commented Oct 23, 2013

Sorry to be the bearer of bad news, but we just uploaded a new version of the package (http://hackage.haskell.org/package/lvish-1.0.0.6) and we're still seeing the same problem. (cc: @rrnewton @aturon)

@cartazio
Copy link
Contributor Author

@lkuper i strongly suspect they've not deployed the updated hackage tooling yet

@cartazio
Copy link
Contributor Author

so pester them to tell you when the update has happened and try again

@dcoutts
Copy link
Contributor

dcoutts commented Nov 27, 2013

New doc builder is currently churning through a backlog. When it's done, one can fix these packages by deleting their docs and letting them be rebuilt.

@dcoutts dcoutts reopened this Nov 27, 2013
@dcoutts
Copy link
Contributor

dcoutts commented Nov 28, 2013

Note that this will take some time, the doc builder has 7k packages left to look at, and at the current rate that'll take it about 3.5 days.

@rrnewton
Copy link
Member

rrnewton commented Dec 2, 2013

Sorry, for someone who hasn't done this before what is the recipe for deleting these docs? Is it by using this server api?
http://hackage.haskell.org/api#documentation-core

Update, for anyone else who needs it, this seemed to work:

curl -u username -XDELETE http://hackage.haskell.org/package/lvish-1.0.0.6/docs/

Now to wait and see if the doc builder rebuilds it properly!

@dcoutts
Copy link
Contributor

dcoutts commented Dec 3, 2013

@rrnewton so it failed, see http://hackage.haskell.org/package/lvish-1.0.0.6/reports/1/log

Configuring bits-atomic-0.1.3...
Building bits-atomic-0.1.3...
Preprocessing library bits-atomic-0.1.3...
[1 of 1] Compiling Data.Bits.Atomic ( Data/Bits/Atomic.hs, dist/build/Data/Bits/Atomic.o )

cbits/atomic-bitops-gcc.c:1:0:
     error: CPU you selected does not support x86-64 instruction set
Failed to install bits-atomic-0.1.3

Note that the builder is running on Linux x86-64. You can see this from the build report os = Linux, arch = X86_64

A little further investigation indicates that the bits-atomic package is just borked. For some reason the -march=native option causes it to fail with the error above. When that's fixed there are further ominous warnings:

cbits/atomic-bitops-gcc.c: In function ‘fetch_and_nand_8’:

cbits/atomic-bitops-gcc.c:22:2:
     note: ‘__sync_fetch_and_nand’ changed semantics in GCC 4.4
cbits/atomic-bitops-gcc.c: In function ‘nand_and_fetch_8’:

cbits/atomic-bitops-gcc.c:40:2:
     note: ‘__sync_nand_and_fetch’ changed semantics in GCC 4.4

This is using gcc-4.7.2.

The -march=native thing is a bit peculuiar, but there we go, it's a standard debian 7.2 install.

@cartazio
Copy link
Contributor Author

cartazio commented Dec 3, 2013

relatedly: why isn't the build log linked from the page?

@rrnewton
Copy link
Member

rrnewton commented Dec 4, 2013

Interesting... @gwicke, do you have any ideas?

I have not seen this problem before. It does seem like gcc is being weird. Assuming -march=native is getting to it, it seems like it should support the atomic intrinsics!

Another funny thing is that the bits-atomic library itself seems to be fine on hackage...

@cartazio
Copy link
Contributor Author

cartazio commented Dec 4, 2013

@rrnewton @dcoutts is the docbuilder running in a VM? I believe that sometimes certain VMs don't play nice with -march=native, at least wrt SIMD code gen and associated support. It could be an issue with the vm, OR it would be that ryan's code hasn't been test built on certain VM configs before?

@rrnewton
Copy link
Member

rrnewton commented Dec 5, 2013

@cartazio Note that there is a separate GHC bug that causes atomic-primops to not be loadable by GHCI on linux. (Fixed by Simon Marlow in 7.8.) But this is not that bug, it's a problem with a further-upstream dependency in this package:

http://hackage.haskell.org/package/bits-atomic

I should probably just remove this dependency from atomic-primops; it's only needed for this module:

http://hackage.haskell.org/package/atomic-primops-0.4/docs/Data-Atomics-Counter-Foreign.html

Still, it seems like bits-atomic should not be causing this problem! We can try getting rid of the -march-native in its cabal file I suppose:

http://hackage.haskell.org/package/bits-atomic-0.1.3/bits-atomic.cabal

@cartazio
Copy link
Contributor Author

cartazio commented Dec 5, 2013

I think before we do that fix, we should understand why its broken. else The next time a similar bug happens, we know whats going on.

@rrnewton
Copy link
Member

rrnewton commented Dec 6, 2013

As a test I nuked the documentationfor "atomic-primops" and confirmed that it doesn't rebuild. There were correct docs there before, but they were pre-hackage 2 I guess?

Did the Hackage-1 builder run on a different kind of machine (not a VM?)? Is that why we didn't see the -march-native before? What else changed?

Is there a way to get ssh access to whatever is doing the doc building to poke around some?

@dcoutts
Copy link
Contributor

dcoutts commented Dec 8, 2013

@cartazio @rrnewton I've no idea what machine the old doc builder used, but it's a reasonable guess that it was not a VM, while the new doc builder is indeed running in a VM.

@rrnewton Have a look at http://lpaste.net/96763
The odd thing is that is the line:

  -march=                           pentium-m

There's nothing particular special about this VM, it's just kvm for the VM with debian 7.2

$ uname -a
Linux hackage 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux

So if it's a problem here, it's likely to be a problem elsewhere too.

@dcoutts
Copy link
Contributor

dcoutts commented Dec 8, 2013

BTW, I think this ticket is now drifting off into other issues. I think the original problem is solved, and we can open tickets for any related subsequent problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants