Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build profiling libraries in 'haskellPackages' by default? #22340

Closed
peti opened this issue Feb 1, 2017 · 22 comments
Closed

Build profiling libraries in 'haskellPackages' by default? #22340

peti opened this issue Feb 1, 2017 · 22 comments
Labels
0.kind: question Requests for a specific question to be answered 6.topic: haskell 6.topic: policy discussion

Comments

@peti
Copy link
Member

peti commented Feb 1, 2017

Serious Haskell hackers need to profile their code, and that requires special variants of the libraries involved in the build. By default, we don't build those libraries. Should we? If we do, then I see the following options:

  1. Provide separate package sets (e.g. haskell.packages.ghcXYZ and haskell.packages.profiling.ghcXYZ) and selectively enable both variants on hydra.nixos.org for appropriate compiler versions, i.e. for the default compiler that's used to build haskellPackages, too. This means that users get the non-profiled package set by default, but they can switch to the profiled package set if they wish to do so.

    Pros:

    • Users who wish to use profiling can do so.
    • Users who don't profile see no difference, i.e. they don't have to download larger packages or deal with increased closure sizes, etc.

    Cons:

    • We let hydra.nixos.org compile every library 5 times (static, dynamic in non-profiling package set, static, dynamic without profiling plus static with profiling in the profiling package set). This is going to slow down our builds, no doubt, and it's also doing to slow down other, non-Haskell builds.
    • The amount of data our binary cache has to store increases by a factor of 2.5, approximately, because we upload the profiled packages in addition to the non-profiled ones. Hard-linking the identical files in both variants won't work because ghc doesn't produce deterministic binaries across re-builds of the same code.
  2. We can enable profiling by default in our main package set, haskellPackages.

    Pros:

    • Users who want to profile their code can do so without any manual effort at all -- everything just works.

    Cons:

    • Hydra has to compile an additional variant of every library, which will increase the build time by a factor of 1.5, approximately.
    • The disk space and bandwidth requirements on cache.nixos.org increase by a factor of ~1.5, too.
    • We increase the build-time and runtime closure size of every dynamically linked Haskell derivation (that is almost all of them) by a factor of ~1.5 for all users, not just for those who care about profiling. That issue could be mitigated by placing profiling libraries into their own derivation output, but it's not clear whether this can be accomplished easily and the corresponding ticket, Multiple outputs for haskell ecosystem #4504, has made no progress in the last 2.25 years, so it's not obvious why a working implementation of that feature should pop up out of the blue right now.

Opinions?

@peti peti added 0.kind: question Requests for a specific question to be answered 6.topic: haskell 6.topic: policy discussion labels Feb 1, 2017
@peti peti added this to the 17.03 milestone Feb 1, 2017
@peti peti self-assigned this Feb 1, 2017
@fubarbaz
Copy link

fubarbaz commented Feb 1, 2017

This isn't really a community discussion, is it? Whoever is paying for Hydra can decide this unilaterally.

Technically, it would probably be possible to modify GHC such that it would output both profiling and non-profiling in once pass, if anyone wanted to do any engineering on it, that is.

@basvandijk
Copy link
Member

I'm in two minds about this:

At LumiGuide we use option 1. We care about small closure sizes since some of our machines are connected to the internet through mobile links with limited bandwidth.

Although option 2 would be simpler and thus more user friendly which IMHO weighs more heavily than our specific requirements.

So for me it's a +0.1 for option 2 and I hope #4504 will make progress.

@peti
Copy link
Member Author

peti commented Feb 1, 2017

This isn't really a community discussion, is it?

I believe that this is more a question of what the user wants, or rather what we want the user experience to be like. Whether Hydra will cope with the load or not is a subject of speculation. Nobody really knows and we're probably not going to get a definite answer to that question. IMHO, we should chose whatever the best solution is and then try it.

@k0001
Copy link
Contributor

k0001 commented Feb 3, 2017

At Takt we use a mixture of options 1 and 2. For all non-Takt code we use option 2, but then for our own code we use option 1 (in fact we go even further: We have a package set enabling optimizations for our packages, and another package set disabling them). This has worked quite well for us, because we only have to compile our dependencies once and they will satisfy the needs of any option 1 alternative we might want to use for our own code.

With this in mind, I lean towards option 2, as I think it leads to a more rewarding stock experience for users. Those needing smaller closure sizes can tune these things a bit.

By the way, besides setting enableLibraryProfiling and enableExecutableProfiling to true, we are adding this to our configureFlags:

  "--ghc-options=-fprof-auto-calls"                                        
  "--ghc-options=-fprof-auto-exported"                                     
  "--ghc-options=-fprof-auto-top"                                          
  "--ghc-options=-fprof-cafs"        

@shlevy
Copy link
Member

shlevy commented Feb 6, 2017

If we go with 2 plus multiple outputs, we can still get small closures

@ocharles
Copy link
Contributor

ocharles commented Feb 6, 2017

By the way, besides setting enableLibraryProfiling and enableExecutableProfiling to true, we are adding this to our configureFlags:

Your own packages configureFlags, or over the entire package set? If it's the latter, I really wouldn't want that in nixpkgs! Almost everything on Hackage should have no profiling information added unless a flag is turned on. The fact that some packages unilaterally export ghc-prof-options is a bug of those packages - it just poisons profiling information. But if it's just your own stuff, then of course that makes sense (though I'd still suggest using ghc-prof-options in your cabal files, and protecting it with a flag).

@k0001
Copy link
Contributor

k0001 commented Feb 6, 2017

@ocharles I agree with your thoughts. This works well for our use case (although fortunately we don't need to do profiling very often!), but for nixpkgs we probably don't want these as default.

@k0001
Copy link
Contributor

k0001 commented Feb 6, 2017

-fprof-auto-calls seems to be useful for stack-traces, though, which maybe is something worth having by default: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#ghc-flag--fprof-auto-calls

@ocharles
Copy link
Contributor

ocharles commented Feb 6, 2017

Still a "please no" to that. That's a ton of information in +RTS -p output, most of which will be outside my control anyway.

@k0001
Copy link
Contributor

k0001 commented Feb 6, 2017

Hmm... maybe this highlights a problem, then: Seeing as "enabling profile information" can mean different things depending on the flags passed to the compiler, does it make sense to provide a profiled Haskell package set at all if it won't satisfy the profiling needs of most people?

@ocharles
Copy link
Contributor

ocharles commented Feb 6, 2017 via email

@k0001
Copy link
Contributor

k0001 commented Feb 9, 2017

@ocharles I can agree with that reasoning 👍

@peti
Copy link
Member Author

peti commented Feb 28, 2017

It's hard to decide which one solution is best. After some thought, I lean towards enabling profiling library builds by default in the upcoming 17.03 release branch (but not in master). The rationale for that choice is that I assume "end-user types", who benefit the most from this change, to be most likely using the release branch rather than following git master. People who follow git master, on the other hand, I assume to be able to accomplish whatever they need by means of overriding the default configuration.

Having profiling enabled by default gives more convenience to normal users, but it's a disadvantage for power users who want to create their own, minimal installations for production. Again, here I assume that those people know how to override the configuration in order to do exactly what they need.

I'm sure this is not the best solution, but I can't think of any better one.

@shlevy
Copy link
Member

shlevy commented Feb 28, 2017

Eh, I'm 👎 on release branches behaving differently that way. The "normal user" "git master user" distinction is not one we want to make bigger!

@globin globin modified the milestones: 17.09, 17.03 Mar 14, 2017
@peti
Copy link
Member Author

peti commented Mar 27, 2017

OK, I suppose I won't enable profiling builds on hydra.nixos.org at all since there is no obvious solution how to do it in a way that's sufficiently beneficial for our users to warrant the costs such a change would incur. I'll leave the ticket open for now, nut I'll remove the 17.03 label.

@peti peti removed this from the 17.09 milestone Mar 27, 2017
@peti peti removed their assignment Mar 27, 2017
@ocharles
Copy link
Contributor

ocharles commented Apr 5, 2017

I think option 2 is really the way to go for now. The main drawback is really going to be the increased closure size. @basvandijk highlights a concern here due to the link to deployment machines being a bottleneck, but I wonder - do you compile into a static binary? At CircuitHub we have found the smallest closure is to build a statically linked executable and then strip it. In this case, I'm not sure the inflation of profiling libraries would make a difference, and they would become a development-only cost.

@basvandijk
Copy link
Member

basvandijk commented Apr 5, 2017

@ocharles at LumiGuide I recently wrapped all our Haskell executables in justStaticExecutables and I was very happy with the results. For our image analysis server:

  • closure size went from 2490.53 MiB to 1668.72 MiB.
  • output size went from 26.70 MiB to 22.87 MiB.

So I don't have a problem anymore with option 2.

@fosskers
Copy link

If the profiling versions of libs can't be supplied by nixpkgs, I think I've reached a reasonable approach for achieving it manually.

@shlevy
Copy link
Member

shlevy commented Jan 22, 2018

@peti does the recent multiple outputs stuff change things here?

@peti peti closed this as completed in 7f5fba7 Mar 17, 2018
@ocharles
Copy link
Contributor

ocharles commented Mar 17, 2018

This will make a huge difference. Thanks for sorting this peti!

@basvandijk
Copy link
Member

It's very good that we now have profiled libraries available from the cache. No need to build our own profiled libraries anymore. Thanks peti!

@fosskers
Copy link

fosskers commented Mar 17, 2018

Oh interesting. So profiling will "just work" if -prof if specified now? I don't use Nix anymore, so can someone confirm this?
I will need to update this section of my Nix blog post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: question Requests for a specific question to be answered 6.topic: haskell 6.topic: policy discussion
Projects
None yet
Development

No branches or pull requests

8 participants