Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some love for update #2859

Merged
merged 13 commits into from
Sep 24, 2018
Merged

Some love for update #2859

merged 13 commits into from
Sep 24, 2018

Conversation

mih
Copy link
Member

@mih mih commented Sep 21, 2018

Fixes #2858, and makes the state of update and particularly its tests a bit nicer.

@codecov
Copy link

codecov bot commented Sep 21, 2018

Codecov Report

Merging #2859 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2859      +/-   ##
==========================================
+ Coverage   90.19%   90.26%   +0.07%     
==========================================
  Files         246      246              
  Lines       31786    31828      +42     
==========================================
+ Hits        28668    28731      +63     
+ Misses       3118     3097      -21
Impacted Files Coverage Δ
datalad/distribution/tests/test_update.py 100% <100%> (ø) ⬆️
datalad/distribution/update.py 98.13% <100%> (+20%) ⬆️
datalad/support/annexrepo.py 88.15% <0%> (ø) ⬆️
datalad/support/gitrepo.py 88.57% <0%> (+0.09%) ⬆️
datalad/support/repo.py 91.3% <0%> (+3.8%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3432419...aba7afc. Read the comment docs.

@mih
Copy link
Member Author

mih commented Sep 21, 2018

@yarikoptic @bpoldrack @kyleam I would appreciate some input on how to deal with this situation:

It happens after a tiny modification of datalad/distribution/tests/test_update.py:test_update_volatile_subds

The starting point is a repo with a submodule, and the origin repo of the submodule has been removed from the origin repo. Now we are in the root of the repo clone that still has the submodule

# submodule known
mih@meiner ...temp_test_update_volatile_subdsEcqfyR (git)-[master] % git submodule
 c948beb9ec443add41ac4e26f54d30dc84ee41af subm 1 (heads/master)
# submodule has origin
mih@meiner ...temp_test_update_volatile_subdsEcqfyR (git)-[master] % git -C subm\ 1 remote -v
origin  /tmp/datalad_temp_test_update_volatile_subdsiLnX25/subm 1 (fetch)
origin  /tmp/datalad_temp_test_update_volatile_subdsiLnX25/subm 1 (push)
# but origin of the submodule is gone
mih@meiner ...temp_test_update_volatile_subdsEcqfyR (git)-[master] % ls /tmp/datalad_temp_test_update_volatile_subdsiLnX25/subm\ 1 
ls: cannot access '/tmp/datalad_temp_test_update_volatile_subdsiLnX25/subm 1': No such file or directory

# and now the bummer!
mih@meiner ...temp_test_update_volatile_subdsEcqfyR (git)-[master] % git fetch
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 5 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (5/5), done.
From /tmp/datalad_temp_test_update_volatile_subdsiLnX25
   d1f26da..ef6ee29  master     -> origin/master
Fetching submodule subm 1
fatal: '/tmp/datalad_temp_test_update_volatile_subdsiLnX25/subm 1' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
1 mih@meiner ...temp_test_update_volatile_subdsEcqfyR (git)-[master] % git fetch

So a plain git fetch fails, but only the first time, the second attempt is just fine. I am not even sure, why it would touch the submodule in the clone at all!? I also cannot see a difference in the repos before and after the fetch.

Moreover, if I do further changes to the origin repo, and fetch those, the problem doesn't reappear -- all is splendid.

Q is now how to deal with this situation? Going through a hierarchy of subdataset and disabling remotes just because they did not response at the time of a fetch is not really an option, but running things twice when they fail also doesn't sound good.

Any input is much appreciated!

@kyleam
Copy link
Contributor

kyleam commented Sep 21, 2018

So a plain git fetch fails, but only the first time, the second attempt is just fine. I am not even sure, why it would touch the submodule in the clone at all!?

Explicitly disabling submodule recursion seems to fix it on my end.

diff --git a/datalad/distribution/update.py b/datalad/distribution/update.py
index 0f6eddab8..8abca67d3 100644
--- a/datalad/distribution/update.py
+++ b/datalad/distribution/update.py
@@ -168,6 +168,7 @@ def __call__(
                 # test against user-provided value!
                 remote=None if sibling is None else sibling_,
                 all_=sibling is None,
+                recurse_submodules="no",
                 prune=True)  # prune to not accumulate a mess over time
             repo.fetch(**fetch_kwargs)
             # NOTE if any further acces to `repo` is needed, reevaluate

@mih
Copy link
Member Author

mih commented Sep 22, 2018

Wonderful, thx!

@mih
Copy link
Member Author

mih commented Sep 23, 2018

Any further remarks?

@mih
Copy link
Member Author

mih commented Sep 24, 2018

Alrighty...

@mih mih merged commit bfaf586 into datalad:master Sep 24, 2018
@mih mih deleted the bf-update branch September 24, 2018 05:01
@yarikoptic yarikoptic modified the milestone: Release 0.10.4 Oct 22, 2018
yarikoptic added a commit that referenced this pull request Oct 24, 2018
 Upgrade of [git-annex] to the most recent available to your release is
 advisable since a number of issues were resolved at that level.

 ### Major refactoring and deprecations

 - `datalad.consts.LOCAL_CENTRAL_PATH` constant was deprecated in favor
   of `datalad.locations.default-dataset` [configuration] variable
   ([#2835])

 ### Minor refactoring

 - `"notneeded"` messages are no longer reported by default results
   renderer
 - [run] no longer shows commit instructions upon command failure when
   `explicit` is true and no outputs are specified ([#2922])
 - `get_git_dir` moved into GitRepo ([#2886])
 - `_gitpy_custom_call` removed from GitRepo ([#2894])
 - `GitRepo.get_merge_base` argument is now called `commitishes` instead
   of `treeishes` ([#2903])

 ### Fixes

 - [update] should not leave the dataset in non-clean state ([#2858])
   and some other enhancements ([#2859])
 - Fixed chunking of the long command lines to account for decorators
   and other arguments ([#2864])
 - Progress bar should not crash the process on some missing progress
   information ([#2891])
 - Default value for `jobs` set to be `"auto"` (not `None`) to take
   advantage of possible parallel get if in `-g` mode ([#2861])
 - [wtf] must not crash if `git-annex` is not installed etc ([#2865]),
   ([#2865]), ([#2918]), ([#2917])
 - Fixed paths (with spaces etc) handling while reporting annex error
   output ([#2892]), ([#2893])
 - `__del__` should not access `.repo` but `._repo` to avoid attempts
   for reinstantiation etc ([#2901])
 - Fix up submodule `.git` right in `GitRepo.add_submodule` to avoid
   added submodules being non git-annex friendly ([#2909]), ([#2904])
 - [run-procedure] ([#2905])
   - now will provide dataset into the procedure if called within dataset
   - will not crash if procedure is an executable without `.py` or `.sh`
     suffixes
 - Use centralized `.gitattributes` handling while setting annex backend
   ([#2912])
 - `GlobbedPaths.expand(..., full=True)` incorrectly returned relative
    paths when called more than once ([#2921])

 ### Enhancements and new features

 - Report progress on [clone] when installing from "smart" git servers
   ([#2876])
 - Stale/unused `sth_like_file_has_content` was removed ([#2860])
 - Enhancements to [search] to operate on "improved" metadata layouts
   ([#2878])
 - Output of `git annex init` operation is now logged ([#2881])
 - New
   - `GitRepo.cherry_pick` ([#2900])
   - `GitRepo.format_commit` ([#2902])
 - [run-procedure] ([#2905])
   - procedures can now recursively be discovered in subdatasets as well.
     The uppermost has highest priority
   - Procedures in user and system locations now take precedence over
     those in datasets.

* tag '0.11.0':
  Make it a 0.11.0 release since there were some API RFings
  REL: slight tune up to Changelog following the advices
  DOC: v0.10.4: Mention change in procedure precedence (a0cbcba)
  DOC: v0.10.4: Fix description of db715b7
  DOC: v0.10.4: Improve description of 6f615a4
  DOC: v0.10.4: Remove duplicate word
yarikoptic added a commit that referenced this pull request Oct 24, 2018
 ## 0.11.0 (Oct 23, 2018) -- Soon-to-be-perfect

 [git-annex] 6.20180913 (or later) is now required - provides a number of
 fixes for v6 mode operations etc.

 ### Major refactoring and deprecations

 - `datalad.consts.LOCAL_CENTRAL_PATH` constant was deprecated in favor
   of `datalad.locations.default-dataset` [configuration] variable
   ([#2835])

 ### Minor refactoring

 - `"notneeded"` messages are no longer reported by default results
   renderer
 - [run] no longer shows commit instructions upon command failure when
   `explicit` is true and no outputs are specified ([#2922])
 - `get_git_dir` moved into GitRepo ([#2886])
 - `_gitpy_custom_call` removed from GitRepo ([#2894])
 - `GitRepo.get_merge_base` argument is now called `commitishes` instead
   of `treeishes` ([#2903])

 ### Fixes

 - [update] should not leave the dataset in non-clean state ([#2858])
   and some other enhancements ([#2859])
 - Fixed chunking of the long command lines to account for decorators
   and other arguments ([#2864])
 - Progress bar should not crash the process on some missing progress
   information ([#2891])
 - Default value for `jobs` set to be `"auto"` (not `None`) to take
   advantage of possible parallel get if in `-g` mode ([#2861])
 - [wtf] must not crash if `git-annex` is not installed etc ([#2865]),
   ([#2865]), ([#2918]), ([#2917])
 - Fixed paths (with spaces etc) handling while reporting annex error
   output ([#2892]), ([#2893])
 - `__del__` should not access `.repo` but `._repo` to avoid attempts
   for reinstantiation etc ([#2901])
 - Fix up submodule `.git` right in `GitRepo.add_submodule` to avoid
   added submodules being non git-annex friendly ([#2909]), ([#2904])
 - [run-procedure] ([#2905])
   - now will provide dataset into the procedure if called within dataset
   - will not crash if procedure is an executable without `.py` or `.sh`
     suffixes
 - Use centralized `.gitattributes` handling while setting annex backend
   ([#2912])
 - `GlobbedPaths.expand(..., full=True)` incorrectly returned relative
    paths when called more than once ([#2921])

 ### Enhancements and new features

 - Report progress on [clone] when installing from "smart" git servers
   ([#2876])
 - Stale/unused `sth_like_file_has_content` was removed ([#2860])
 - Enhancements to [search] to operate on "improved" metadata layouts
   ([#2878])
 - Output of `git annex init` operation is now logged ([#2881])
 - New
   - `GitRepo.cherry_pick` ([#2900])
   - `GitRepo.format_commit` ([#2902])
 - [run-procedure] ([#2905])
   - procedures can now recursively be discovered in subdatasets as well.
     The uppermost has highest priority
   - Procedures in user and system locations now take precedence over
     those in datasets.

* tag '0.11.0':
  CHANGELOG adjusted to reflect new minimal version of git-annex
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants