Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecRunner's getWorkingDir should not be cached #40

Closed
jfbibeau opened this issue Sep 30, 2019 · 12 comments
Closed

ExecRunner's getWorkingDir should not be cached #40

jfbibeau opened this issue Sep 30, 2019 · 12 comments

Comments

@jfbibeau
Copy link

(Apologies for not submitting a PR, my company doesn't allow me to do open source contribution easily yet)

Looks like this PR https://github.com/node-gradle/gradle-node-plugin/pull/32/files introduced a small bug:

On this line: https://github.com/node-gradle/gradle-node-plugin/blob/master/src/main/groovy/com/moowork/gradle/node/exec/ExecRunner.groovy#L35

The working directory should not be a caching input. Currently the way this is set up, it will take into account the contents of the working directory, and also because the PathSensitive annotation is not set, it means the entry will not be cached correctly on different machines/different checkout directories.

Therefore I believe workingDir should not be taken into account in the cache, and be annotated simply with @Internal:

    @Internal
    File getWorkingDir()
    {
        return workingDir
    }

Thanks for the consideration!

@bsautel
Copy link
Contributor

bsautel commented Oct 1, 2019

Thanks @jfbibeau for this feedback.

The current configuration does not take into consideration the contents of the working directory, it only cares about its path. I just come check that by reproducing this use case in an integration test (not committed).

We would have the behavior you describe if we used the @InputDirectory annotation. The @Input annotation makes Gradle only care about the value (path) and not the contents. Note that if I replace the @Input annotation by @InputDirectory, it breaks some integration tests because the contents of the working directory changes between two task runs.

Note that Gradle prints a warning when building the plugin every time there is a File annotated with @Input, for instance:

  • Warning: Type 'com.moowork.gradle.node.yarn.YarnSetupTask': property 'runner.workingDir' has @input annotation used on property of type java.io.File.

The documentation seems to tell that it is valid and it is exactly what we need:

This will cause the task to be considered out-of-date when the property has changed. When used on a java.io.File object that refers to a file or directory, the up-to-date check is only dependent on the path and not the contents of the file or directory. To make it depend on the contents, use InputFile or InputDirectory respectively.

Don't know really why Gradle prints this warning.

The documentation also tells that by default an @Input file has the ABSOLUTE path sensitivity, which seems to be appropriate in our case.

Is there something I misunderstood in Gradle's behavior or in your issue report?

You seem to think that considering the working directory as a task input is not relevant, I believe I don't agree with that. Do we have to run a task again when its working directory change? I would answer yes because it can completely change the result. What do you think about that?

@jfbibeau
Copy link
Author

jfbibeau commented Oct 1, 2019

Hi @bsautel, thanks for your reply!

I've been debating back and forth as to whether the working directory path matters. In my opinion, specifically given that these are yarn commands, the working directory should not matter as yarn commands will generally run from the context of the yarn binary, or at least the args (which are part of caching input) will capture variations. Specifically workingDir is really just an argument that allows yarn to work - yarn will either work, or not work, depending on workingDir. I can't see any examples where you'd be setting workingDir differently between different yarn invocations - it should be where your package.json is, where node_modules are, etc..

Even more importantly, when you consider that Gradle caching artifacts can be re-used between different developers (using Gradle Enterprise), the path should almost never be ABSOLUTE.

For instance a developer cloning and building in /Users/userA/clone/gradlew ... and a developer cloning and building in /Users/userB/clone/gradlew... should be able to share cache artifacts. This would definitely need to be set to RELATIVE to be able to leverage this, there is almost no reason to set it to ABSOLUTE (and it will indeed generate warnings in Gradle 6).

I've forked the plugin locally doing these changes and am able to confirm that it resolves all caching issues. I'd love to keep using upstream and not a forked version so these would be my suggestions.

I'm open to any further thoughts you have!

@bsautel
Copy link
Contributor

bsautel commented Oct 2, 2019

Thanks for these explanations!

It sounds like npm and yarn have the same behavior regarding the working directory: whatever you set they override it to the directory that contains the package.json file. In this case, you are right, overriding it does not really make sense (though it is can be read in the INIT_CWD). But this is not true with the node command (NodeTask). In this case I still think that changing the working directory can really change the output.

Regarding the path sensitivity, I did not know that the ABSOLUTE one can break caching. I do not use Gradle caching yet so I did not already face this issue.

When you say that your changes fixed this issue, what exactly did you change? You set the path sensitivity to RELATIVE or you removed the @Input annotation?

Note that I tried to set the path sensitivity to relative and execute tests. They all pass but I have yet another warning that I still don't understand:

  • Warning: Type 'com.moowork.gradle.node.yarn.YarnTask': property 'execRunner.workingDir' is annotated with @PathSensitive that is not allowed for @input properties.

It sounds like we do what Gradle suggests in the documentation and it displays some warnings. I probably misunderstood something. Maybe you know why?

Sorry for all my questions but I would like to ensure I understand correctly the issue to fix it correctly for the different plugin tasks (removing the working directory from the npm and yarn tasks input is probably relevant, but not sure it is also the case for the NodeTask).

And thanks, your feedback makes me learn some new thinks about Gradle!

@jfbibeau
Copy link
Author

jfbibeau commented Oct 2, 2019

No problem, happy to help!

Yes, @Input should really never be used on a file, and PathSensitive never makes sense for @Input. These are all signs that this is not really the right thing to do :)

I really believe that setting the workingDir to @Internal is the right way to go. Let me explain why. I can see where there may be corner case where NodeTask is concerned, although I would argue the build is not properly written if the workingDir path needs to be cached. As a quick and dirty example, imagine you have 2 tasks in which workingDir:

tasks.register('myTaskA', NodeTask) {
   args = ['myTask']
   workingDir file('/path/to/a')
}
tasks.register('myTaskB', NodeTask) {
   args = ['myTask']
   workingDir file('/path/to/b')
}

Here I would argue that yes, the only thing differentiating these tasks from the inputs declared in NodeTask is the workingDir. However, the proper way of doing this in Gradle is to add the directory as InputDirectory, because the task's caching artifact is only safe to re-use if the contents of the folder are the same. You can then see here that the workingDir is not the truly relevant part, but the actual contents of the directory that the task operates on. And there is no way to know that without knowing the implementation detail of the command.

Do remember that any task can add inputs on the fly, so, the proper way to write this would be:

tasks.register('myTask', NodeTask) {
   args = ['myTask']
   // set workingDir if needed, but it doesn't really matter, as long as `node` works
   inputs.dir file('/path/to/a')
   outputs.dir file('build/myOutputs')
}

So in conclusion (sorry that was a bit long):

  • The workingDir is never an appropriate input for ExecRunner, either as @Input or as @InputDirectory
  • It should be changed to @Internal
  • Users are free to determine the inputs of NodeTask (or YarnTask, etc..) as it will be entirely specific to what the task actually does. Those inputs are entirely dependent on what the command does and should not be a concern of this plugin :)

I hope this helps, if not do let me know where I can clarify!

@jfbibeau jfbibeau changed the title Missing PathSensitivity annotation breaks caching ExecRunner's getWorkingDir should not be cached Oct 2, 2019
@bsautel
Copy link
Contributor

bsautel commented Oct 3, 2019

Ok, thanks for these explanations.

I had a look to the Gradle Exec task code and it works as you say, the working directory is ignored. Note that there is a comment that says it should be a content-less @InputDirectory, the kind of thing I was trying to do.

I am ok for integrating these changes.

But I wonder what we should do with the other properties of the ExecRunner. Do you think ignoreExitValue and environment should be part of the task inputs? We should probably do the same thing as Gradle did for the Exec task. They consider the ignoreExitValue property as an input but ignore the environment. On our side, for now, we also consider the ignoreExitValue and also the overridden environment variables as task inputs.

As far as I understand, the Gradle philosophy (and you seem to agree with that) is to consider that if an environment or a working directory change can change the task behavior, the user is responsible of telling it to Gradle by manually adding the value to the task input properties, right?

I'll be on vacations until next monday so I will not be able to do the changes before. Changing the annotations is fast but it requires to change all the integration tests that are going to be broken.

@jfbibeau
Copy link
Author

jfbibeau commented Oct 3, 2019

Thanks @bsautel, I think we are in agreement! I look forward to deleting my fork and syncing back to using upstream!

I agree with your assessment and with Gradle's design:

  • ignoreExitValue should indeed be taken into consideration, as it can fail/pass the task, so it should be considered as caching input.
  • environment should not, as it should not affect the output of the task, therefore it doesn't make sense as a caching input. For example, if I compile JS on mac or linux, it should not matter, I should be able to re-use the outputs of compilation, so environment should not be an input. If this is not the case, for example building native binaries, then the build should declare so and add it as an input manually.

I appreciate your time responding to these issues!

@deepy
Copy link
Member

deepy commented Oct 4, 2019

We're probably still gonna need parts of environment in npm things like NPM_CONFIG_OPTIONAL, NPM_CONFIG_PRODUCTION, NPM_CONFIG_SHRINKWRAP feels like they should have an impact on the cache.

@jfbibeau
Copy link
Author

jfbibeau commented Oct 4, 2019

If you'd like to keep environment as a caching input for now, I'm ok with it, I haven't yet found any caching bugs caused by having it as an input.

@bsautel
Copy link
Contributor

bsautel commented Oct 7, 2019

I just come to fix the issue regarding the working directory. I did not change the input declaration regarding the environment.

@bsautel
Copy link
Contributor

bsautel commented Oct 9, 2019

I have a question for you @jfbibeau.

It remains one property in the ExecRunner that is neither declared as an input nor declared as internal: execOverrides.

This is a closure that, if set, can configure (and override since it is called at the end of the configuration) the Gradle exec action configuration.

It can be used to configure some properties that are supported directly in ExecRunner (such as the working directory or the environment) but other properties that cannot be directly configured in ExecRunner such as the standard output to use.

According to what it does, it may or may not change the task behavior. According to what you said and what Gradle does, I understand that it does not really make sense to consider it as a task input. Do you confirm that?

Note that I tried to declare it as a task input and it is not simple. As it is a closure, Gradle executes it when it checks whether it is up-to-date or not (it seems to detect changes on its result). This closure assumes it is called with a parameter (the exec action) and Gradle provides no parameters, so its execution does not work. As far as I understand, to get this work, we should use an Action and not a closure, in this case Gradle is able to detect changes on its behavior instead of on its result. This also makes me think that is way easier to declare it as internal.

@jfbibeau
Copy link
Author

jfbibeau commented Oct 9, 2019

@bsautel agreed!

bsautel added a commit that referenced this issue Oct 9, 2019
…t it is not part of the up-to-date checking. Issue #40.
@bsautel
Copy link
Contributor

bsautel commented Oct 9, 2019

Ok, that's done. This way we no longer have any warning regarding task input/output declarations when building the plugin. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants