Digest does not exist when switching repos with pantsd off in one, on in the other #10719

Closed
asherf opened this issue Sep 1, 2020 · 11 comments · Fixed by #10789

asherf (Member) commented Sep 1, 2020

18:08:49.83 [WARN] <unknown>:882: DeprecationWarning: invalid escape sequence \d
18:08:52.21 [WARN] Completed: Find PEX Python - No bootstrap Python executable could be found from the option `interpreter_search_paths` in the `[python-setup]` scope. Will attempt to run PEXes directly.
18:08:53.11 [WARN] /data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/base/exception_sink.py:359: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  process_title=setproctitle.getproctitle(),

18:08:53.11 [ERROR] 1 Exception encountered:

Engine traceback:
  in select
  in `binary` goal
  in pants.backend.python.rules.create_python_binary.create_python_binary
  in pants.backend.python.rules.pex.two_step_create_pex
  in pants.backend.python.rules.pex.create_pex
Traceback (no traceback):
  <pants native internals>
Exception: String("Digest Digest(Fingerprint<97adba2ad1bfef3ba1b37d6b119e15498ed2a60392241b5ea7c28c602826dd6c>, 97) did not exist in the Store.")
Traceback (most recent call last):
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 255, in run
    engine_result = self._run_v2()
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 166, in _run_v2
    return self._maybe_run_v2_body(goals, poll=False)
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 183, in _maybe_run_v2_body
    return self.graph_session.run_goal_rules(
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/init/engine_initializer.py", line 130, in run_goal_rules
    exit_code = self.scheduler_session.run_goal_rule(
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/engine/internals/scheduler.py", line 561, in run_goal_rule
    self._raise_on_error([t for _, t in throws])
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/engine/internals/scheduler.py", line 520, in _raise_on_error
    raise ExecutionError(
pants.engine.internals.scheduler.ExecutionError: 1 Exception encountered:

Engine traceback:
  in select
  in `binary` goal
  in pants.backend.python.rules.create_python_binary.create_python_binary
  in pants.backend.python.rules.pex.two_step_create_pex
  in pants.backend.python.rules.pex.create_pex
Traceback (no traceback):
  <pants native internals>
Exception: String("Digest Digest(Fingerprint<97adba2ad1bfef3ba1b37d6b119e15498ed2a60392241b5ea7c28c602826dd6c>, 97) did not exist in the Store.")

asherf (Member, Author) commented Sep 2, 2020

Sometimes I see this extra error info:

  in pants.backend.python.rules.create_python_binary.create_python_binary
  in pants.backend.python.rules.pex.two_step_create_pex
  in pants.backend.python.rules.pex.create_pex
Traceback (no traceback):
  <pants native internals>
Exception: String("Digest Digest(Fingerprint<3dcdb1a3bd62e17cef0250301a85627cdd844605c1e1a514076b429cd39e6870>, 97) did not exist in the Store.")
Fatal Python error: This thread state must be current when releasing
Python runtime state: finalizing (tstate=0x556a5a059f10)

Thread 0x00007f9609246700 (most recent call first):
<no Python frame>

Thread 0x00007f9603fff700 (most recent call first):
<no Python frame>

Current thread 0x00007f9609045700 (most recent call first):
<no Python frame>

Thread 0x00007f9608e44700 (most recent call first):
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 4269 in postParse
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 1408 in _parseNoCache
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 3552 in parseImpl
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 1402 in _parseNoCache
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 4005 in parseImpl
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 1402 in _parseNoCache
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 3417 in parseImpl
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 1402 in _parseNoCache
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 3400 in parseImpl
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 1402 in _parseNoCache
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 3552 in parseImpl
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 1402 in _parseNoCache
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 3417 in parseImpl
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 1402 in _parseNoCache
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/pyparsing.py", line 1644 in parseString
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/requirements.py", line 98 in __init__
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3119 in __init__
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3109 in parse_requirements
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/backend/python/rules/pex_from_targets.py", line 209 in pex_from_targets
  File "/data/home/asher/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0a1_py38/lib/python3.8/site-packages/pants/engine/internals/native.py", line 67 in generator_send

Thread 0x00007f9611881080 (most recent call first):
<no Python frame>
Aborted (core dumped)

@stuhood @gshuflin

stuhood added the bug label Sep 4, 2020
stuhood added this to the 2.0.x milestone Sep 4, 2020

stuhood (Member) commented Sep 4, 2020

@asherf mentioned that this occurs on a machine where some repositories use pantsd and some don't. That case should be accounted for in the lease extension code, but there might still be a gap. It would be useful to determine whether the missing digest is an "inner" node (below a "root" digest that was kept alive).
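A toy model may help frame the "inner node" question. The sketch below is hypothetical Python, not Pants's actual store: `ToyStore`, `extend_lease`, `gc`, and the `transitive` flag are invented for illustration. It shows how extending a lease only on a root digest lets a garbage collection remove a digest that the root still references, so the root "exists" but one of its inner nodes does not.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    data: bytes
    children: tuple[str, ...] = ()  # digests this entry references, e.g. files in a directory
    lease_until: float = 0.0  # the entry may be collected once `now` passes this time


@dataclass
class ToyStore:
    """Hypothetical content-addressed store with time-based leases (not the Pants API)."""

    entries: dict[str, Entry] = field(default_factory=dict)

    def put(self, digest: str, data: bytes, children: tuple[str, ...] = ()) -> None:
        self.entries[digest] = Entry(data, tuple(children))

    def extend_lease(self, digest: str, until: float, transitive: bool) -> None:
        entry = self.entries.get(digest)
        if entry is None:
            return
        entry.lease_until = max(entry.lease_until, until)
        if transitive:
            # Keep everything the root references alive for at least as long.
            for child in entry.children:
                self.extend_lease(child, until, transitive=True)

    def gc(self, now: float) -> None:
        # Collect expired entries without checking whether anything references them.
        for digest in [d for d, e in self.entries.items() if e.lease_until < now]:
            del self.entries[digest]

    def missing_inner_nodes(self, root: str) -> list[str]:
        """Digests reachable from `root` (including root) that are no longer stored."""
        missing: list[str] = []
        stack = [root]
        while stack:
            digest = stack.pop()
            entry = self.entries.get(digest)
            if entry is None:
                missing.append(digest)
            else:
                stack.extend(entry.children)
        return missing


store = ToyStore()
store.put("file", b"hello")
store.put("root", b"dir", children=("file",))
store.extend_lease("root", until=100.0, transitive=False)  # only the root is kept alive
store.gc(now=50.0)
print(store.missing_inner_nodes("root"))  # ['file']
```

With `transitive=True` the same sequence leaves nothing missing; in this model, the difference between the two calls is exactly the kind of gap the comment suggests looking for.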

Added to the 2.0.x milestone.

Eric-Arellano (Contributor) commented:

I got this when running in the TC (Toolchain) codebase (no pantsd) right after having run in Pants's codebase (with pantsd):

Engine traceback:
  in select
  in `typecheck` goal
  in Typecheck using MyPy
  in pants.backend.python.util_rules.pex.create_pex
  in pants.backend.python.util_rules.pex_cli.setup_pex_cli_process
  in Find PEX Python
  in Find binary path
  in pants.engine.process.remove_platform_information
Traceback (no traceback):
  <pants native internals>
Exception: Bytes from stdout Digest Digest(Fingerprint<55fd2022c440089e0812a6a9dc5affeba6ba3d30715595dfde9092125d93909b>, 93) not found in store

stuhood modified the milestones: 2.0.x, 2.0.0rc0 Sep 9, 2020
Eric-Arellano changed the title "Digest did not exist error when running binary goal in multiple targets (~50)" to "Digest does not exist when switching repos with pantsd off in one, on in the other" Sep 9, 2020
stuhood assigned stuhood and unassigned gshuflin Sep 15, 2020
stuhood added a commit that referenced this issue Sep 16, 2020
…ng (#10789)

### Problem

#10719 likely describes two different variants of "we hit the local process cache, but then failed to actually use the result because it had been garbage collected". In one of the cases it is crystal clear that the result was collected, because it is the stdout of the process that is missing. In the other case, the failure occurs while attempting to merge directories produced by process runs.

### Solution

When hitting the local process cache, ensure that all of the process outputs exist (and, as a side effect, that they are downloaded locally if a remote cache is configured). Added and fixed a test for this case.
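The read-side check can be sketched as follows. This is a hypothetical Python model, not the Rust code from #10789: `CachedResult`, `lookup_validated`, and the `digest_exists` callback are invented names, and the sketch assumes that a hit whose outputs are gone is simply treated as a cache miss.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CachedResult:
    """Hypothetical cache entry: digests of everything a process produced."""

    stdout_digest: str
    stderr_digest: str
    output_directory_digest: str


def lookup_validated(
    cache: dict[str, CachedResult],
    key: str,
    digest_exists: Callable[[str], bool],
) -> Optional[CachedResult]:
    """Return the cached result only if all of its outputs are still stored.

    `digest_exists` is an assumed callback; for a directory digest a real check
    would have to walk the whole tree (and could fetch from a remote cache).
    A hit with missing outputs is reported as a miss, so the caller re-runs the
    process instead of failing later with "did not exist in the Store".
    """
    result = cache.get(key)
    if result is None:
        return None
    outputs = (result.stdout_digest, result.stderr_digest, result.output_directory_digest)
    if all(digest_exists(d) for d in outputs):
        return result
    return None


present = {"stdout", "stderr"}  # the output directory was garbage collected
cache = {"pytest run": CachedResult("stdout", "stderr", "outdir")}
print(lookup_validated(cache, "pytest run", present.__contains__))  # None
```

Checking on read means a stale entry costs one extra process execution rather than a hard failure.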

An alternative implementation would have been to guarantee that a cache entry must exist only if all of the digests it requires are transitively reachable. But the local cache and the filesystem store use two different LMDB stores, which means that we cannot transactionally update them in a way that would rule out a cache entry existing even though its file content had been garbage collected... and it's not clear that merging those stores is desirable.
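Why two separate stores rule out the alternative can be illustrated with a toy model. Both classes below are hypothetical stand-ins (plain dicts instead of LMDB databases), and the "collect everything not leased" policy is an assumption made to keep the example short. Because the cache entry and the blob are written and collected independently, nothing stops a collection from removing a blob that a cache entry still points at.

```python
from __future__ import annotations


class FileStore:
    """Hypothetical blob store whose GC only knows about its own leases."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.leased: set[str] = set()

    def put(self, digest: str, data: bytes) -> None:
        self.blobs[digest] = data

    def gc(self) -> None:
        # Cannot see the process cache, so it cannot tell which blobs are still referenced.
        for digest in list(self.blobs):
            if digest not in self.leased:
                del self.blobs[digest]


class ProcessCache:
    """Hypothetical action cache kept in a separate database from FileStore."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}  # process key -> stdout digest


def dangling_entries(cache: ProcessCache, files: FileStore) -> list[str]:
    """Cache keys whose stdout digest no longer exists in the file store."""
    return [key for key, digest in cache.entries.items() if digest not in files.blobs]


files, cache = FileStore(), ProcessCache()
files.put("d1", b"stdout bytes")
cache.entries["run pytest"] = "d1"  # two separate writes, no shared transaction
files.gc()  # nothing leased "d1", so it is collected
print(dangling_entries(cache, files))  # ['run pytest']
```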

### Result

Fixes #10719. In addition to the test, I lowered the lease time and garbage collection times and validated that the case described on #10719 is no longer reproducible.

asherf (Member, Author) commented Sep 18, 2020

Still happens on latest master (in the pants repo).

410 files would be left unchanged.

17:37:11.01 [INFO] Completed: Lint with Flake8 - succeeded.

✓ Black succeeded.
✓ Docformatter succeeded.
✓ Flake8 succeeded.
✓ isort succeeded.
17:37:12.29 [ERROR] 1 Exception encountered:

Engine traceback:
  in select
  in `test` goal
  in Run tests (src/python/pants/backend/python/lint/python_fmt_integration_test.py:integration)
  in Run Pytest (src/python/pants/backend/python/lint/python_fmt_integration_test.py:integration)
  in pants.backend.python.goals.pytest_runner.setup_pytest_for_target
Traceback (no traceback):
  <pants native internals>
Exception: String("Digest Digest(Fingerprint<7c6268635e6a89f6dea22a9e9e93559dde02743de41224168b679ae3290e6a58>, 85) did not exist in the Store.")
Traceback (most recent call last):
  File "/data/home/asher/projects/pants/src/python/pants/bin/local_pants_runner.py", line 255, in run
    engine_result = self._run_v2()
  File "/data/home/asher/projects/pants/src/python/pants/bin/local_pants_runner.py", line 166, in _run_v2
    return self._maybe_run_v2_body(goals, poll=False)
  File "/data/home/asher/projects/pants/src/python/pants/bin/local_pants_runner.py", line 189, in _maybe_run_v2_body
    poll_delay=(0.1 if poll else None),
  File "/data/home/asher/projects/pants/src/python/pants/init/engine_initializer.py", line 139, in run_goal_rules
    goal_product, params, poll=poll, poll_delay=poll_delay
  File "/data/home/asher/projects/pants/src/python/pants/engine/internals/scheduler.py", line 561, in run_goal_rule
    self._raise_on_error([t for _, t in throws])
  File "/data/home/asher/projects/pants/src/python/pants/engine/internals/scheduler.py", line 525, in _raise_on_error
    wrapped_exceptions=tuple(t.exc for t in throws),
pants.engine.internals.scheduler.ExecutionError: 1 Exception encountered:

Engine traceback:
  in select
  in `test` goal
  in Run tests (src/python/pants/backend/python/lint/python_fmt_integration_test.py:integration)
  in Run Pytest (src/python/pants/backend/python/lint/python_fmt_integration_test.py:integration)
  in pants.backend.python.goals.pytest_runner.setup_pytest_for_target
Traceback (no traceback):
  <pants native internals>
Exception: String("Digest Digest(Fingerprint<7c6268635e6a89f6dea22a9e9e93559dde02743de41224168b679ae3290e6a58>, 85) did not exist in the Store.")
asher@ip-10-1-201-229 ~/projects/pants (master)$

@stuhood

asherf reopened this Sep 18, 2020

Eric-Arellano (Contributor) commented:

2.0.0b0 does not include #10789, meaning that the Toolchain codebase is still "wrong", even if Pants's is correct. Is this expected? Do both need to have the fix?

asherf (Member, Author) commented Sep 18, 2020

This is on latest master in the pants repo.

asherf (Member, Author) commented Sep 18, 2020

[Two screenshots from 2020-09-18, 10:42 AM; images not available]

Eric-Arellano (Contributor) commented:

Yes, but the hypothesis for this issue is that it occurs when you have two repos, one using pantsd and the other not. My point is that you're using the right version of the pantsbuild/pants repo, but Toolchain is still "broken", so it might be expected for this to still be an issue.

It will be more informative once we do the beta1 release. If things are still broken once Toolchain upgrades to beta1, then #10798 did not fix things.

stuhood (Member) commented Sep 18, 2020

> It will be more informative once we do the beta1 release. If things are still broken once Toolchain upgrades to beta1, then #10798 did not fix things.

Right. The consuming ("read side") repo needs to be consuming the patch. I expect that we should re-close this unless we observe it after TC is actually using the patch.

stuhood (Member) commented Sep 24, 2020

I'm going to re-close this until/unless we observe it in 2.0.0b1.

stuhood closed this as completed Sep 24, 2020
stuhood added a commit to stuhood/pants that referenced this issue Sep 29, 2020
…ng (pantsbuild#10789)


stuhood (Member) commented Sep 29, 2020

Cherrypicking to 1.30.x via #10879.

stuhood added a commit that referenced this issue Sep 30, 2020
…ng (cherrypick of #10789) (#10879)
