build multiple packages in parallel #440

Closed
bos opened this issue May 24, 2012 · 14 comments

@bos
Contributor

bos commented May 24, 2012

(Imported from Trac #447, reported by @dcoutts on 2009-01-10)

The latest version of the Gentoo Portage tool is rather slick. It can do parallel builds and it displays a nice summary on the command line, e.g.:

# emerge -uD system -j --load-average=4.5
Calculating dependencies... done!
>>> Verifying ebuild manifests
>>> Starting parallel fetch
>>> Emerging (1 of 14) dev-libs/expat-2.0.1-r1
>>> Emerging (2 of 14) sys-devel/autoconf-wrapper-6
>>> Emerging (3 of 14) sys-kernel/linux-headers-2.6.27-r2
>>> Installing sys-devel/autoconf-wrapper-6
>>> Jobs: 0 of 14 complete, 1 running  Load avg: 2.99, 1.59, 0.67
Note how they solve the problem of how to display what is going on when there are multiple builds happening. The answer is not to display it at all! This would have to go hand-in-hand with logging all builds so that we can still diagnose failures.

Note the final line, that gets updated to display the current number of jobs running, the number completed etc. It also shows the load average. The job scheduler has two parameters, one is a maximum number of jobs (or unlimited) and the other is a load average. It will only launch new jobs if the load average is less than the given maximum. That allows it to interact reasonably well with builds that use make -j internally. In the example above I set the load average to be just slightly more than the number of CPUs I've got.
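To make the two-parameter rule concrete, here is a minimal Haskell sketch of the launch decision. All names (JobLimit, SchedulerConfig, canLaunch) are hypothetical and chosen for illustration; this is neither Portage's nor cabal-install's code, and how the load average is sampled is left out entirely.

```haskell
-- Hypothetical names throughout; a sketch of the rule, not real scheduler code.

-- | Upper bound on the number of concurrently running jobs.
data JobLimit = Unlimited | AtMost Int

-- | The two scheduler parameters described above.
data SchedulerConfig = SchedulerConfig
  { maxJobs    :: JobLimit
  , maxLoadAvg :: Maybe Double  -- ^ Nothing means "ignore the load average"
  }

-- | Decide whether one more job may be launched right now, given the
-- number of running jobs and the current load average.
canLaunch :: SchedulerConfig -> Int -> Double -> Bool
canLaunch cfg running loadAvg = underJobLimit && underLoadLimit
  where
    underJobLimit = case maxJobs cfg of
      Unlimited -> True
      AtMost n  -> running < n
    -- Launch only while the load average is below the given maximum.
    underLoadLimit = maybe True (loadAvg <) (maxLoadAvg cfg)
```

With a configuration mirroring the command above, SchedulerConfig Unlimited (Just 4.5), canLaunch allows a new job at a load average of 2.99 and refuses one at 5.0.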

It looks to me like it serialises some bits, like installing, since saturating the disk with multiple parallel installs is generally of no benefit, indeed it can be slower. Also downloads seem to be serialised, again because there is probably little benefit to making multiple connections to the same server.

Anyway, the point is, cabal-install ought to be able to do all this. Some bits we can do now. We already have a graph representation of the install plan and we recalculate when a package fails to install.
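For readers unfamiliar with the idea, the following toy sketch shows what "a graph of the install plan that is recalculated on failure" can mean. The types and functions (Plan, ready, dropFailed) are hypothetical and much simpler than cabal-install's actual install plan type; package names are plain strings and versions are ignored.

```haskell
import Data.List (nub)

type Pkg = String

-- | A toy install plan: each package paired with its direct dependencies.
type Plan = [(Pkg, [Pkg])]

-- | Packages that could be started now: not yet done, and with every
-- dependency already done. These are the candidates for parallel builds.
ready :: Plan -> [Pkg] -> [Pkg]
ready plan done =
  [ p | (p, deps) <- plan, p `notElem` done, all (`elem` done) deps ]

-- | Recalculate the plan after a failure: drop the failed package and
-- everything that depends on it, directly or transitively.
dropFailed :: Pkg -> Plan -> Plan
dropFailed failed plan = [ entry | entry@(p, _) <- plan, p `notElem` doomed ]
  where
    doomed = go [failed]
    -- Grow the set of unbuildable packages until it stops changing.
    go bad =
      let bad' = nub (bad ++ [ p | (p, deps) <- plan, any (`elem` bad) deps ])
      in if length bad' == length bad then bad else go bad'
```

For a plan where d depends on b and c, and both of those depend on a, a failure of b removes b and d but leaves a and c in the plan.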

We will need an improved download API, probably involving sending requests off to a dedicated download thread (which would serialise them).
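One possible shape for such an API, sketched with only Control.Concurrent from base: build threads put requests on a channel and a single worker thread handles them one at a time. The names (Request, startDownloader, download) are hypothetical, the actual fetching is passed in as a function, and error handling is omitted (an exception in the fetch action would kill the worker).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever)

-- | A request: what to fetch, plus a slot the worker fills with the result.
data Request = Request String (MVar FilePath)

-- | Start the single download worker. The fetch action is supplied by the
-- caller; because only this thread runs it, downloads are serialised.
startDownloader :: (String -> IO FilePath) -> IO (Chan Request)
startDownloader fetch = do
  queue <- newChan
  _ <- forkIO . forever $ do
    Request url slot <- readChan queue
    path <- fetch url
    putMVar slot path
  return queue

-- | Called from any build thread; blocks until the worker has handled
-- this particular request.
download :: Chan Request -> String -> IO FilePath
download queue url = do
  slot <- newEmptyMVar
  writeChan queue (Request url slot)
  takeMVar slot
```

Any number of build threads can call download concurrently; the channel orders the requests and the worker processes them strictly one after another.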

@bos
Contributor Author

bos commented May 24, 2012

(Imported comment by SamAnklesaria on 2009-01-10)

Partial, hypothetical implementation lacking suppressed output and command-line flags

@bos
Contributor Author

bos commented May 24, 2012

(Imported comment by refold on 2011-03-29)

Relevant mailing list thread: http://thread.gmane.org/gmane.comp.lang.haskell.cabal.devel/7473

@bos
Contributor Author

bos commented May 24, 2012

(Imported comment by refold on 2011-06-10)

Current status (for those interested): Building multiple packages in parallel has been implemented, but the patches are not yet merged into the mainline; I'm now working on parallelising 'cabal build'.

@bos
Contributor Author

bos commented May 24, 2012

(Imported comment by refold on 2011-10-16)

Implementation

@bos
Contributor Author

bos commented May 24, 2012

(Imported comment by refold on 2011-11-05)

Attached are my patches that parallelise cabal-install's 'install' command.

Sorry for sending them as a single large bundle - ideally I would like
to split the patch series, but darcs send makes it hard by ignoring
depended-upon patches. Additionally, it's hard to destructively edit
history in Darcs, so instead of obliterating two unnecessary patches
(changes to README and cabal-install.cabal) I undid those changes with
a "merge" patch.

The patch series logically consists of three parts (in chronological order):

  1. From the first patch up to the "Parallelise the install command" patch

Implements the basic parallel framework as described here. Changes
are a bit more pervasive than expected because of Cabal's internal
assumption that the current working directory is the same as the directory of the
package currently being built.

  2. From the end of the previous part up to the "Implement output
    serialisation (client bits)." patch

Implements output serialisation - since we don't want the console
output to be garbled, all printing should be done from a single
thread. This is done by changing all code called from
D.C.I.executeInstallPlan to use callbacks instead of standard output
functions (debug/info/...).

  3. Bugfixes and polishing (remaining patches)

During this stage I concentrated on testing and fixing bugs and
didn't add any new functionality.
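
As a rough illustration of the output serialisation idea from the second part (this is not the code from the patches), the sketch below starts one printer thread and hands build code a logging callback instead of letting it print directly. The names Logger, Msg and startPrinter are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Callback handed to build code in place of direct printing.
type Logger = String -> IO ()

-- | Messages for the printer thread; Stop asks it to finish.
data Msg = Line String | Stop

-- | Start the one thread that is allowed to write to the console.
-- Returns a logging callback and an action that waits for all queued
-- messages to be emitted and then stops the printer.
startPrinter :: (String -> IO ()) -> IO (Logger, IO ())
startPrinter emit = do
  chan <- newChan
  done <- newEmptyMVar
  let loop = do
        msg <- readChan chan
        case msg of
          Line s -> emit s >> loop
          Stop   -> putMVar done ()
  _ <- forkIO loop
  return (writeChan chan . Line, writeChan chan Stop >> takeMVar done)
```

A caller would obtain the pair with something like (logMsg, stop) <- startPrinter putStrLn, pass logMsg to every worker thread, and run stop once all builds have finished. Since only the printer thread calls the emit function, lines from different builds cannot interleave mid-line.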

My patches are also available in a separate Darcs repository.

@bos
Contributor Author

bos commented May 24, 2012

(Imported comment by refold on 2011-11-05)

I've updated my parallel patches (see attachment). Patches apply cleanly to the current mainline. The parallel code path now always uses the external setup method (via Setup.hs), so the required changes to the Cabal lib are minimised. There are still some traces of output serialisation, though.

Some numbers:

$ time cabal install -j 1 alex happy
real    1m19.236s
user    1m1.330s
sys     0m10.510s
$ time cabal install -j 4 alex happy
real    0m52.106s
user    1m10.680s
sys     0m15.030s
$ time cabal install -j 1 yesod
real    19m14.913s
user    15m59.420s
sys     1m25.650s
$ time cabal install -j 4 yesod
real    14m8.599s
user    21m36.530s
sys     4m5.650s
I also tested the Nov 2011 version of the code (it tries to use the internal setup method and requires pervasive changes to the Cabal lib):
$ time cabal install -j 4 alex happy
real    0m45.503s
user    1m4.040s
sys     0m10.100s
$ time cabal install -j 4 yesod
real    10m41.840s
user    17m6.560s
sys     1m33.040s
Compiling and linking all these Setup.hs files does add some noticeable overhead.
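
To put the wall-clock ("real") figures above into ratios, here is a small worked calculation in Haskell. Only the timings come from this comment; the helper names (secs, ratio, Run, report) are made up for the example.

```haskell
-- | Convert a time(1)-style "minutes, seconds" reading to seconds.
secs :: Double -> Double -> Double
secs minutes seconds = 60 * minutes + seconds

-- | How many times longer the first duration is than the second.
ratio :: Double -> Double -> Double
ratio slower faster = slower / faster

-- | Wall-clock times of one configuration for the two test installs.
data Run = Run { alexHappy :: Double, yesod :: Double }

externalJ1, externalJ4, internalJ4 :: Run
externalJ1 = Run (secs 1 19.236) (secs 19 14.913)  -- external setup, -j 1
externalJ4 = Run (secs 0 52.106) (secs 14 8.599)   -- external setup, -j 4
internalJ4 = Run (secs 0 45.503) (secs 10 41.840)  -- internal setup, -j 4

-- | Speedup of -j 4 over -j 1, and slowdown of external versus internal setup.
report :: [(String, Double)]
report =
  [ ("alex+happy: -j 4 vs -j 1 (external)",     ratio (alexHappy externalJ1) (alexHappy externalJ4))
  , ("yesod: -j 4 vs -j 1 (external)",          ratio (yesod externalJ1) (yesod externalJ4))
  , ("alex+happy: external vs internal, -j 4",  ratio (alexHappy externalJ4) (alexHappy internalJ4))
  , ("yesod: external vs internal, -j 4",       ratio (yesod externalJ4) (yesod internalJ4))
  ]
```

Evaluating report gives a -j 4 speedup of roughly 1.52x for alex and happy and roughly 1.36x for yesod with the external setup method, and shows the external method taking roughly 1.15x and 1.32x as long as the internal one at -j 4.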

If these patches get accepted, I'll start working on improving the UI.

@bos
Contributor Author

bos commented May 24, 2012

(Imported comment by refold on 2012-04-02)

Parallel patches were moved to GitHub:

git clone git://github.com/23Skidoo/cabal.git cabal-parallel-install
cd cabal-parallel-install
git checkout parallel-install

@thielema

Is it also planned to build profiling, shared and static libs in parallel?

@23Skidoo
Member

@thielema These are currently not built in parallel; I'll look at it after the patches are merged.

@tibbe
Member

tibbe commented Jun 26, 2012

This is now mostly done. Great work @23Skidoo! What remains is to reduce the output to a much more condensed form (as shown in the ticket description) and to write each package's build log to a file that can be output on build failure.

@23Skidoo
Member

Now that the patches implementing build logging and better output are in, I think we should close this issue. Improvements to the parallel code (dynamic status indicator, parallel building of shared/profiling/... versions, module-level parallelism) should be dealt with as separate tickets.

@tibbe
Member

tibbe commented Jul 12, 2012

@23Skidoo Fine by me. Could you please open a new ticket for the final UI improvements?

@23Skidoo
Member

@tibbe Done (#975, #976).

@23Skidoo
Member

@tibbe Can you close this ticket?
