Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repro: check for hash mismatch between deleted dependencies and upstream outputs. #9533

Merged

Conversation

daavoo
Copy link
Contributor

@daavoo daavoo commented Jun 2, 2023

Only if --allow-missing is passed.

  • Create tests/func/repro and extract pull and allow_missing to separate test files.

Closes #9530

@daavoo daavoo self-assigned this Jun 2, 2023
@daavoo daavoo linked an issue Jun 2, 2023 that may be closed by this pull request
@daavoo daavoo requested review from dberenbaum and a team June 2, 2023 15:03
Comment on lines +317 to +320
if upstream and any(
dep.fs_path == out.fs_path and dep.hash_info != out.hash_info
for stage in upstream
for out in stage.outs
):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not pretty but I don't see any alternative

@daavoo daavoo added bugfix fixes bug A: pipelines Related to the pipelines feature labels Jun 2, 2023
@codecov
Copy link

codecov bot commented Jun 2, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 🎉

Comparison is base (5c1487e) 90.55% compared to head (03869f4) 90.57%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9533      +/-   ##
==========================================
+ Coverage   90.55%   90.57%   +0.01%     
==========================================
  Files         470      472       +2     
  Lines       35939    36013      +74     
  Branches     5180     5182       +2     
==========================================
+ Hits        32545    32619      +74     
  Misses       2800     2800              
  Partials      594      594              
Impacted Files Coverage Δ
tests/func/repro/test_repro.py 100.00% <ø> (ø)
dvc/repo/reproduce.py 95.49% <100.00%> (+0.04%) ⬆️
dvc/stage/__init__.py 91.66% <100.00%> (+0.03%) ⬆️
tests/func/repro/test_repro_allow_missing.py 100.00% <100.00%> (ø)
tests/func/repro/test_repro_pull.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@dberenbaum
Copy link
Collaborator

This works, so not a blocker, but is there a way we can actually have dvc status --allow-missing return modified here like discussed in the issue?

It sounds like something I would expect to be possible, especially with the index changes from @efiop that I thought should make it easy to compare against virtual outputs.

@daavoo
Copy link
Contributor Author

daavoo commented Jun 5, 2023

This works, so not a blocker, but is there a way we can actually have dvc status --allow-missing return modified here like discussed in the issue?

I think this P.R. should be merged and backported whereas the status change still needs discussion and would be a 3.0 change.

especially with the index changes from @efiop that I thought should make it easy to compare against virtual outputs.

I am not sure I follow how the index changes have anything to do with this, as this just involves checking upstream stages in the pipeline

@dberenbaum
Copy link
Collaborator

I think this P.R. should be merged and backported whereas the status change still needs discussion and would be a 3.0 change.

Okay, I'm fine to do it this way.

I am not sure I follow how the index changes have anything to do with this, as this just involves checking upstream stages in the pipeline

I wasn't that familiar with how status works and thought it would be possible. In theory, we could build a virtual index based on the outputs of the dvc.lock and the .dvc files and do a status check against that rather than against the workspace. From a quick look, it looks like it would basically do the same as this method except not against the workspace (which I don't see an easy way to do):

dvc/dvc/output.py

Lines 590 to 600 in 3e0f2ba

def workspace_status(self) -> Dict[str, str]:
if not self.exists:
return {str(self): "deleted"}
if self.changed_checksum():
return {str(self): "modified"}
if not self.hash_info:
return {str(self): "new"}
return {}

It would return a modified status when an upstream stage changed (since the hash of the out will change) without manually comparing upstream stages and changing the status inside repro.

@efiop
Copy link
Contributor

efiop commented Jun 7, 2023

Maybe we can assemble an index from deps and one from outs and then do an index diff?

@dberenbaum
Copy link
Collaborator

@daavoo Do you plan to merge as is? If so, can we get someone to take a look so we can unblock it please?

@daavoo
Copy link
Contributor Author

daavoo commented Jun 12, 2023

@daavoo Do you plan to merge as is? If so, can we get someone to take a look so we can unblock it please?

I did not plan to implement a different solution for the release if this one works. cc @iterative/dvc could you review this?

@efiop efiop changed the title repro: Check for hash mismatch bewteen deleted dependencies and upstream outputs. repro: check for hash mismatch between deleted dependencies and upstream outputs. Jun 13, 2023
…eam outputs.

Only if `--allow-missing` is passed.

- Create `tests/func/repro` and extract `pull` and `allow_missing` to separate test files.

Closes #9530
@daavoo daavoo force-pushed the 9530-allow-missing-skips-stages-even-if-deps-have-changed branch from d15e48c to 03869f4 Compare June 13, 2023 12:12
@daavoo daavoo enabled auto-merge (rebase) June 13, 2023 12:17
@skshetry skshetry disabled auto-merge June 13, 2023 12:38
@efiop efiop merged commit d0aa1ce into main Jun 13, 2023
@efiop efiop deleted the 9530-allow-missing-skips-stages-even-if-deps-have-changed branch June 13, 2023 12:46
@efiop
Copy link
Contributor

efiop commented Jun 13, 2023

@skshetry I see you've disabled auto-merge, was that intentional? Sorry, clicked and only then noticed.

@skshetry
Copy link
Member

I was trying to merge, but noticed that some tests were failing. I have rerun them, waiting for them to be green. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: pipelines Related to the pipelines feature bugfix fixes bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--allow-missing: skips stages even if deps have changed
4 participants