Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dirtiness checking triggers unnecessary reinstalls WAS: stack build unnecessarily reinstalls packages from github #838

Closed
ljli opened this issue Aug 23, 2015 · 13 comments
Assignees
Milestone

Comments

@ljli
Copy link
Contributor

ljli commented Aug 23, 2015

When the stack.yaml includes a dependency from github and doing stack build twice, the second command isn't a no-op despite the first command being successful. If doing stack build a third time it is, however. During the second stack build run, stack unregisters the github package saying "local file changes" and reinstalls them again. Diffing the package folder in .stack-work/downloaded before and after the second command reveals that only one file changed: downloaded/<package>/.stack-work/dist/x86_64-osx/Cabal-1.22.4.0/stack-build-cache

I can reproduce this on OS X with the following .cabal and stack.yaml content:
foobar.cabal:

name:                foobar
version:             0.1.0.0
cabal-version:       >=1.10
build-type:          Simple

library
  build-depends:
    base,
    HStringTemplate
  default-language: Haskell2010

stack.yaml:

packages:
- '.'
- location:
    git: https://github.com/factisresearch/HStringTemplate.git
    commit: 033328ce078426733020a2c0d943332f37ffee05
  extra-dep: true

extra-deps: []

resolver: lts-3.0
@borsboom
Copy link
Contributor

This isn't related to the package being a dependency from github. I made a local clone of HStringTemplate and changed the stack.yaml to use

packages:
- '.'
- 'HStringTemplate'

and the same thing happened. The reason it's happening is that, because of the way we detect TemplateHaskell dependent files, it sees them as new files in the second build and so runs the build again. Fixing that would introduce a race condition, and we'd rather have occasional redundant builds than a race condition.

@borsboom
Copy link
Contributor

(it's not obvious where the TH dependency is introduced, but Text/StringTemplate/Instances.hs adds a dependence on autogen/cabal_macros.h -- you can discover that by examining the generated .dump-hi files)

@snoyberg
Copy link
Contributor

Perhaps that file should be special cased and not trigger dirtiness. The
only way it should ever change is when we're reconfiguring.

On Mon, Aug 24, 2015, 12:58 AM Emanuel Borsboom notifications@github.com
wrote:

(it's not obvious where the TH dependency is introduced, but
Text/StringTemplate/Instances.hs adds a dependence on
autogen/cabal_macros.h -- you can discover that by examining the
generated .dump-hi files)


Reply to this email directly or view it on GitHub
#838 (comment)
.

@loisch
Copy link

loisch commented Aug 24, 2015

I am disappointed that you are closing this issue. It's not very reassuring having a build system that claims to be done but on the next call realises that it's it has still work to do. The whole point of having a build system is for tracking dependencies and not having to rebuild everything. I don't really mind how often stack builds stuff internally - but I want it to end in a clean state and correctly detect if a rebuild is needed. If you really need to build packages twice then it should be done in the the one 'stack build' call so that this call can end in a clean state.
We're a company and building our >150.000 LOC Haskell with all it's dependencies is time consuming. We just switched to stack and we love it. I can change my stack.yml or my Cabal file and the right things will be rebuilt. It stop's being fun if I can't trust it to correctly track dependencies. I often call "stack build" just to be sure everything is consistent. With this behavior I start wondering: what the heck changed?
Please consider reopening and fixing this issue - possibly by immediately rebuilding the "dirty" packages.

@borsboom
Copy link
Contributor

In this case, the second build truly is redundant. The first build did build all the files, and along the way it discovered that some TemplateHaskell code called addDependentFile to tell GHC to track that file as a dependency from now on. Now GHC (and stack) know about that file, but they can't reliably know what its contents were at the time it was read by the TH code (there's no rule about the TH having to call addDependentFile before actually reading the file). Therefore, GHC and stack play it safe. For the next build, they assume that the file could possibly have changed since it was actually read by the TH, and so treat it as being dirty. Note this only happens the very first time a TH dependent file is seen, after that everything knows about it and will track its changes.

So for a case like this, we could run another build automatically right after the first, but there wouldn't be much point. It would definitely make the first build take longer for no reason, whereas by being "lazy" there's a good chance something else changed next time you build anyway, so we're avoiding more redundant building than necessary.

Now, if you're changing source files during the build, you of course can't be guaranteed that the build will pick up your changes while it's building, but that's hardly unique to TH dependencies. GHC doesn't go back and check for changed source files during a build and then rebuild them either. I don't know of any build system that does. But our choices here mean the next build will definitely pick up on the changes you made during the first. If you do want it to handle files that change during the build, the --file-watch argument comes close. It will automatically trigger a new build whenever a source file changes. Currently it runs in an infinite loop, but it probably wouldn't be that hard to add an option that makes it stop the first time it didn't notice any files change. Really not sure about the actual utility of that, though.

@borsboom
Copy link
Contributor

Perhaps that file should be special cased and not trigger dirtiness. The
only way it should ever change is when we're reconfiguring.

Is it common for that file to be used from TH code? I don't recall noticing it before, and I haven't actually found where addDependentFile is called (presumably from a dependency package). Or would this special case also apply elsewhere?

In this particular case, the TH also adds a dependency for ~/.stack/programs/x86_64-osx/ghc-7.10.2/lib/ghc-7.10.2/include/ghcversion.h, so would we also special case such files?

@borsboom
Copy link
Contributor

After some more thought: with a multi-package project, doing the second build automatically after discovering a new dependency might have an advantage: by doing that build immediately for the package itself (before building any packages that depend on it), it means the packages that depend on it won't have to be rebuilt a second time as well. That could save a lot of time.

@borsboom borsboom reopened this Aug 24, 2015
@borsboom borsboom added this to the Later improvements milestone Aug 24, 2015
@loisch
Copy link

loisch commented Aug 24, 2015

Thank you @borsboom for the explanation. In our situation many packages have to be rebuilt. As you pointed out these packages would not have to be rebuilt if haskell-src-exts wouldn't have been left in the dirty state directly after its build or if it had been immediately built twice. Would it be possible to add that behavior as an option?
I don't expect stack to be able to reach a clean state if files are changed during the build. From a user perspective I didn't change anything and stack still thinks something has changed. This happens during "normal operation" without any interference from the user or automated tools. It probably wouldn't be a big deal if it would only affect this single package but it affects all packages that depend on haskell-src-exts.
For us this behavior is inconvenient because we're just switching from building directly on our developer workstations to building inside of a docker container which has all dependencies and tools we require installed. The images for these docker containers already contain the result of the "stack build". But because the stack build doesn't end in a clean state, I still have to rebuild if I want to use that docker image. (As a workaround we'll be always calling stack twice, now.)

@snoyberg snoyberg changed the title stack build unnecessarily reinstalls packages from github Dirtiness checking triggers unnecessary reinstalls WAS: stack build unnecessarily reinstalls packages from github Aug 25, 2015
@borsboom borsboom modified the milestones: 0.3.0.0, Later improvements Aug 25, 2015
@borsboom
Copy link
Contributor

While we were brainstorming about this, @snoyberg came up with a great solution:

  • Before starting a package build, record the current system time. Do this after the configure step, so that cabal_macros.h has already been created.
  • After building the package, check for any newly found dependent files.
  • If the new dependent file's timestamp is from before the build started, add it to the cache used to determine when a package's files have changed.

This way the second redundant build will not be needed in the vast majority of cases, but we'll still do the safe thing if a file is changed during the build.

@borsboom borsboom self-assigned this Aug 25, 2015
@borsboom borsboom modified the milestones: 0.2.0.0, 0.3.0.0 Aug 25, 2015
@loisch
Copy link

loisch commented Aug 25, 2015

Sounds great! Thank you very much looking into this issue. I'm already looking forward to the next stack release. :-)

@borsboom
Copy link
Contributor

This should now be fixed in master. Please re-open if you still experience trouble.

@snoyberg
Copy link
Contributor

Yay!

On Sun, Sep 20, 2015, 3:43 AM Emanuel Borsboom notifications@github.com
wrote:

Closed #838 #838.


Reply to this email directly or view it on GitHub
#838 (comment).

@loisch
Copy link

loisch commented Sep 20, 2015

Thank's a lot, @borsboom! It's working like a charm! :-)

dysinger added a commit that referenced this issue Nov 13, 2015
* master: (106 commits)
  Fix ghci -package-id for lib dep from test #1222
  Fix passing of include dirs to GHCI #1222
  Minor change to some local var naming
  Fix a copy-o which breaks ghci test targets #1222
  Only ask for commit count once when compiling
  Move Windows installers above manual download
  Import all modules after loading them (#995)
  replace if then else with case matching (multiway if)
  Control/Concurrent/Execute.hs : redundant do
  Use stack setup so you can ignore most of the output
  Don't cache the GHC folder
  Strip -static before passing to GHCi (#1094)
  Fix: unlisted files in tests and benchmarks trigger extraneous second build (fixes #838)
  Don't fail when registering/looking up library for executable-only packages #1232
  Fix help for default true bool flag's enable hint
  Rebuild when cabal file is changed
  watched command: show files, not directories
  Update changelog
  Fix: Haddocks not copied for dependencies (fixes #1105)
  Remove redundant Version from InstalledMap
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants