`experimental_shell_command`'s sandbox and invalidation includes transitive dependencies of `experimental_shell_command` dependencies, not just outputs #17345

huonw · 2022-10-26T02:02:44Z

Describe the bug

When a experimental_shell_command (:second) depends on another experimental_shell_command (:first), the sandbox for :second include the dependencies of :first, rather than just the output of :first.

https://gist.github.com/huonw/c55d0b387ed6030cda611898ee2d0361 provides a reproducer, where :first generates output.txt by 'consuming' input.txt (but does nothing with it), and :second includes a sleep to demonstrate when it is running. The archive package includes all the .txt files that ended in the sandbox for :second: I'd expect it to only be the direct output of :first (output.txt), not the input.txt dependency of :first.

BUILD file from that gist for convenience:

file(name="input", source="input.txt")
experimental_shell_command(
    name="first",
    command="""
    echo "doesn't affect output"
    echo contents > output.txt
    """,
    tools=["echo"],
    dependencies=[":input"],
    outputs=["output.txt"]
)

experimental_shell_command(
    name="second",
    command="""
    sleep 3 # make 'actually running' obvious
    """,
    dependencies=[":first"],
    tools=["sleep"],
    outputs=["*.txt"],
)
archive(name="archive", files=[":second"], format="zip")

git clone git@gist.github.com:c55d0b387ed6030cda611898ee2d0361.git
cd c55d0b387ed6030cda611898ee2d0361
./pants version # 2.13.0

# initial build (takes about 3 seconds to run :second, due to sleep)
./pants package ::

# check package output:
unzip -l dist/archive.zip # two files: input.txt output.txt

# validating the cache works as expected (~instantaneous)
./pants package ::

# BUG: change to :first dependencies, that doesn't affect output... 
echo 'new contents' > input.txt
# ... reruns :second, and thus takes ~3 seconds
./pants package ::

# EXPECTED: change to :first script that doesn't affect output...
sed -i '' 's/affect output/affect output still/' BUILD
# .... doesn't rerun :second, only :first
./pants package ::

AFAICT, there's no way to break :second's importing of :input, e.g. adjusting to dependencies=[":first", "!:input"] doesn't do anything: the behaviour is the same (and using !!:input isn't supported).

The :input being file is just for convenience. This appears to apply to all other target types, e.g. a pex_binary or yet another experimental_shell_command.

I haven't checked how this behaves when depending on explicit codegen targets, only these adhoc experimental_shell_command ones.

Pants version
2.13.0

OS
macOS

Additional info

Background: how are we using experimental_shell_command to hit this?

We're using experimental_shell_command to try to bridge the gap between Pants supported code and unsupported code (JS/TS), as well as for adhoc codegen tasks (to avoid avoid having to write and maintain a plugin). This can result in 'long' chains of experimental_shell_commands that depend on each other (and other resources), e.g.:

# some app:
python_sources()
pex_binary(name="app", ...)

# codegen the schema itself:
experimental_shell_command(name="schema", dependencies=[":app"], outputs=["schema.json"], command="./app.pex export-schema > schema.json", tools=["python3.9"])

# set up the NPM dependencies:
experimental_shell_command(name="node_modules", outputs=["node_modules/**"], command="npm ci", ...)

# use the schema to do codegen or generate docs or whatever:
experimental_shell_command(name="codegen-from-schema", dependencies=[":node_modules", ":schema"], outputs=["codegen/**"], command="npm run do-codegen < schema.json", ...)
experimental_shell_command(name="docs-from-schema", dependencies=[":node_modules", ":schema"], outputs=["docs/**"], command="npm run generate-docs < schema.json", ...)

# continues with targets using :codegen-from-schema and :docs-from-schema... (`archive`, `experimental_shell_command` and `experimental_run_shell_command`)

The Python code that goes into pex_binary changes regularly and thus the PEX itself changes too, but the exported schema.json doesn't change so much (i.e. we're often refactoring/adding features/fixing bugs without changing the schema). In theory, if :app changes but the schema doesn't, :schema's dependees (:codegen-from-schema, :docs-from-schema) don't need to run, but those dependees pull in the app.pex file from :app, and thus do rerun. Rerunning can take significant time.

(See also: https://pantsbuild.slack.com/archives/C046T6T9U/p1666315386420799)

The text was updated successfully, but these errors were encountered:

benjyw · 2022-10-27T05:29:26Z

Started a discussion here: #17368

thejcannon · 2022-11-07T16:27:49Z

FWIW This is part of my "re-architect dependencies" proposal I'm hoping to work on late this year or Q1 next year 😄

chrisjrn · 2022-11-29T22:45:20Z

#17680 and follow-up work will have some impact on this

chrisjrn · 2022-12-07T20:57:24Z

@huonw Would love your input on #17743 -- I think this should solve your problem somewhat.

…_command` Replaces the `outputs` field, which has a usability issue demonstrated in pantsbuild#17345

…tal_shell_command` (#17744) Replaces the `outputs` field, which has a usability issue demonstrated in #17345. Extraculated from #17743.

As seen in #17345, `experimental_shell_command` has a confused notion of dependencies. There are the "runtime dependencies": those needed to successfully run the command (essentially input data or runnable binaries), and then there are the dependencies that are necessary to use the side-effects generated by the command. `esc` currently treats these identically, which means that the sandboxes for chains of commands get overly large, and their cache invalidations overly fine (due to so many dependencies). This adds a `SpecialCasedDependencies` field to `experimental_shell_command`, called `execution_dependencies`. `execution_dependencies` are used to populate the sandbox, but are not fetched when `TransitiveTargetsRequests` are made by subsequent targets; `dependencies` is now deprecated and replaced with `output_dependencies`, which are the dependencies necessary for successfully consuming the outputs of the `experimental_shell_command`. To make it easier to migrate, if `dependencies` are specified _without_ declaring `execution_dependencies`, a deprecation warning will be raised explaining how to upgrade. Closes #17345

huonw added the bug label Oct 26, 2022

benjyw self-assigned this Oct 26, 2022

huonw mentioned this issue Oct 31, 2022

Running Python code in experimental_shell_command requires building a PEX and specifying an appropriate interpreter as a tool #17405

Closed

chrisjrn self-assigned this Nov 16, 2022

chrisjrn mentioned this issue Nov 22, 2022

experimental_shell_command stabilisation tracking ticket #17616

Closed

chrisjrn mentioned this issue Dec 7, 2022

Adds execution_dependencies for experimental_shell_command #17743

Merged

chrisjrn pushed a commit to chrisjrn/pants that referenced this issue Dec 7, 2022

output_files and output_directories fields to `experimental_shell…

3611b47

…_command` Replaces the `outputs` field, which has a usability issue demonstrated in pantsbuild#17345

chrisjrn mentioned this issue Dec 7, 2022

Renames output_files and output_directories fields for experimental_shell_command #17744

Merged

chrisjrn pushed a commit that referenced this issue Dec 9, 2022

Renames output_files and output_directories fields for `experimen…

ea6863d

…tal_shell_command` (#17744) Replaces the `outputs` field, which has a usability issue demonstrated in #17345. Extraculated from #17743.

chrisjrn closed this as completed in #17743 Dec 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`experimental_shell_command`'s sandbox and invalidation includes transitive dependencies of `experimental_shell_command` dependencies, not just outputs #17345

`experimental_shell_command`'s sandbox and invalidation includes transitive dependencies of `experimental_shell_command` dependencies, not just outputs #17345

huonw commented Oct 26, 2022

benjyw commented Oct 27, 2022

thejcannon commented Nov 7, 2022

chrisjrn commented Nov 29, 2022

chrisjrn commented Dec 7, 2022

experimental_shell_command's sandbox and invalidation includes transitive dependencies of experimental_shell_command dependencies, not just outputs #17345

experimental_shell_command's sandbox and invalidation includes transitive dependencies of experimental_shell_command dependencies, not just outputs #17345

Comments

huonw commented Oct 26, 2022

benjyw commented Oct 27, 2022

thejcannon commented Nov 7, 2022

chrisjrn commented Nov 29, 2022

chrisjrn commented Dec 7, 2022

`experimental_shell_command`'s sandbox and invalidation includes transitive dependencies of `experimental_shell_command` dependencies, not just outputs #17345

`experimental_shell_command`'s sandbox and invalidation includes transitive dependencies of `experimental_shell_command` dependencies, not just outputs #17345