install recursive into existing dataset #2982

Closed
bpoldrack opened this issue Nov 7, 2018 · 4 comments
Labels
severity-normal standard severity

Comments

@bpoldrack (Member)

install --recursive is broken. It works fine when just calling datalad install -r -s whatever to/wherever.
But it doesn't work when called with an additional -d pointing to an existing dataset: the first level is installed, then it fails with

[ERROR  ] get() got an unexpected keyword argument dataset [dataset.py:apply_func:486] (TypeError)

I haven't looked deeply into it yet, but this suggests that get is called with dataset=... while already being bound to a dataset instance (or something similar). I guess we never really considered this use case, and when calling get there is confusion about which reference dataset to operate on.
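
To illustrate the suspicion, here is a minimal, hypothetical sketch in plain Python (not DataLad's actual apply_func/binding code): once get is bound to a Dataset instance, the reference dataset is already fixed, so forwarding the caller's -d value as dataset= again produces exactly this kind of TypeError.

# Minimal sketch of the suspected failure mode; `Dataset.get` here is a
# stand-in, not DataLad's real implementation.
class Dataset:
    def __init__(self, path):
        self.path = path

    def get(self, path, recursive=False):
        # `self` already is the reference dataset to operate on, so the
        # bound method has no separate `dataset` keyword.
        return 'get %s in %s (recursive=%s)' % (path, self.path, recursive)


ds = Dataset('/tmp/testds')
print(ds.get('.'))                   # works
print(ds.get('.', dataset=ds.path))  # TypeError: get() got an unexpected keyword argument 'dataset'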

FTR: Of course the same thing can still be achieved: running datalad install -d . -s toplevel and only afterwards calling datalad install toplevel/sub works fine.

@bpoldrack bpoldrack added the severity-normal standard severity label Nov 7, 2018
@yarikoptic (Member)

wow, that is quite a blow indeed. FWIW, replicated:

$> datalad create testds
[INFO   ] Creating a new annex repo at /home/yoh/.tmp/testds 
create(ok): /home/yoh/.tmp/testds (dataset)
$> datalad install -d testds -r ///labs/haxby    
[INFO   ] Cloning http://datasets.datalad.org/labs/haxby [1 other candidates] into '/home/yoh/.tmp/testds/haxby' 
install(ok): haxby (dataset)                                                                                                                                                                                                      
[ERROR  ] get() got an unexpected keyword argument dataset [dataset.py:apply_func:434] (TypeError) 

@yarikoptic (Member)

this seems to fix it

$> git diff datalad/distribution/install.py
diff --git a/datalad/distribution/install.py b/datalad/distribution/install.py
index f867cbf5..8180f5df 100644
--- a/datalad/distribution/install.py
+++ b/datalad/distribution/install.py
@@ -362,6 +362,11 @@ class Install(Interface):
 
         # Now, recursive calls:
         if recursive or get_data:
+            if 'dataset' in common_kwargs:
+                common_kwargs_ = common_kwargs.copy()
+                common_kwargs_.pop('dataset')
+            else:
+                common_kwargs_ = common_kwargs
             for r in destination_dataset.get(
                     curdir,
                     description=description,
@@ -374,7 +379,7 @@ class Install(Interface):
                     on_failure='ignore',
                     return_type='generator',
                     result_xfm=None,
-                    **common_kwargs):
+                    **common_kwargs_):
                 yield r
         # at this point no futher post-processing should be necessary,
         # `clone` and `get` must have done that (incl. parent handling)

but I have not yet done a full analysis of whether that is the correct thing to do, or whether to just resort to calling Get.__call__ there instead; and I am not yet 100% sure why we need get instead of install at that point if recursive
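
For reference, the change in the diff above boils down to this generic Python pattern (names here are illustrative only, not DataLad internals): strip the conflicting 'dataset' entry before forwarding the shared keyword arguments to a method that is already bound to a dataset.

# Generic sketch of the pattern in the diff above; names are illustrative only.
common_kwargs = {'recursive': True, 'dataset': '/tmp/testds'}

# Copy so other calls that still rely on the 'dataset' entry are unaffected.
common_kwargs_ = {k: v for k, v in common_kwargs.items() if k != 'dataset'}

# destination_dataset.get(curdir, **common_kwargs_) would then no longer
# receive the conflicting dataset= keyword.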

@yarikoptic (Member)

yeap, this seems to work as well... I will add a test and submit a PR to see if anything barfs

$> git diff datalad/distribution/install.py
diff --git a/datalad/distribution/install.py b/datalad/distribution/install.py
index f867cbf5..f8e7b548 100644
--- a/datalad/distribution/install.py
+++ b/datalad/distribution/install.py
@@ -362,8 +362,8 @@ class Install(Interface):
 
         # Now, recursive calls:
         if recursive or get_data:
-            for r in destination_dataset.get(
-                    curdir,
+            for r in Get.__call__(
+                    destination_dataset.path,
                     description=description,
                     # TODO expose this
                     # yoh: exactly!

@yarikoptic yarikoptic added this to the Release 0.11.1 milestone Nov 7, 2018
@yarikoptic (Member)

eh, not that easy -- apparently it breaks test(s), causing a "double installation"

$> python -m nose -s -v datalad/distribution/tests/test_install.py:test_install_recursive
datalad.distribution.tests.test_install.test_install_recursive ... > /home/yoh/proj/datalad/datalad-master/datalad/distribution/tests/test_install.py(370)test_install_recursive()
-> eq_(len(ds_list), 3)
(Pdb) p ds_list
[<Dataset path=/home/yoh/.tmp/datalad_temp_test_install_recursiveC97OrY>, <Dataset path=/home/yoh/.tmp/datalad_temp_test_install_recursiveC97OrY>, <Dataset path=/home/yoh/.tmp/datalad_temp_test_install_recursiveC97OrY/2>, <Dataset path=/home/yoh/.tmp/datalad_temp_test_install_recursiveC97OrY/subm 1>]
(Pdb) for ds in ds_list: print ds.path
/home/yoh/.tmp/datalad_temp_test_install_recursiveC97OrY
/home/yoh/.tmp/datalad_temp_test_install_recursiveC97OrY
/home/yoh/.tmp/datalad_temp_test_install_recursiveC97OrY/2
/home/yoh/.tmp/datalad_temp_test_install_recursiveC97OrY/subm 1

so either additional checking/filtering to avoid acting twice is needed, or something else
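
One option along the "additional checking/filtering" line would be a small deduplication step, roughly like the following (a hypothetical helper, not existing DataLad code), so the top-level dataset is not reported twice:

# Hypothetical helper: drop datasets whose path was already seen, so the
# result list does not contain the top-level dataset twice.
def unique_by_path(datasets):
    seen = set()
    for ds in datasets:
        if ds.path not in seen:
            seen.add(ds.path)
            yield ds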

yarikoptic added a commit to yarikoptic/datalad that referenced this issue Nov 7, 2018
yarikoptic added a commit that referenced this issue Nov 27, 2018
	## 0.11.1 (Nov 25, 2018) -- v7-better-than-v6

	Rushed out bugfix release to stay fully compatible with recent
	[git-annex] which introduced v7 to replace v6.

	### Fixes

	- [install]: be able to install recursively into a dataset ([#2982])
	- [save]: be able to commit/save changes whenever files potentially
	  could have swapped their storage between git and annex
	  ([#1651]) ([#2752]) ([#3009])
	- [aggregate-metadata]:
	  - the dataset itself is now not "aggregated" if specific paths are
		provided for aggregation ([#3002]). That resolves the issue of
		a `-r` invocation aggregating all subdatasets of the specified dataset
		as well
	  - also compare/verify the actual content checksum of aggregated metadata
		while considering subdataset metadata for re-aggregation ([#3007])
	- `annex` commands are now chunked assuming a 50% "safety margin" on the
	  maximal command line length. Should resolve crashes while operating
	  on too many files at once ([#3001])
	- `run` sidecar config processing ([#2991])
	- no double trailing period in docs ([#2984])
	- correct identification of the repository with symlinks in the paths
	  in the tests ([#2972])
	- re-evaluation of dataset properties in case of dataset changes ([#2946])
	- [text2git] procedure to use `ds.repo.set_gitattributes`
	  ([#2974]) ([#2954])
	- Switch to use plain `os.getcwd()` if inconsistency with env var
	  `$PWD` is detected ([#2914])
	- Make sure that credential defined in env var takes precedence
	  ([#2960]) ([#2950])

	### Enhancements and new features

	- [shub://datalad/datalad:git-annex-dev](https://singularity-hub.org/containers/5663/view)
	  provides a Debian buster Singularity image with build environment for
	  [git-annex]. [tools/bisect-git-annex]() provides a helper for running
	  `git bisect` on git-annex using that Singularity container ([#2995])
	- Added [.zenodo.json]() for better integration with Zenodo for citation
	- [run-procedure] now provides names and help messages via a custom
	  renderer ([#2993])
	- Documentation: point to [datalad-revolution] extension (prototype of
	  the greater DataLad future)
	- [run]
	  - support injecting a detached command ([#2937])
	- `annex` metadata extractor now extracts the `annex.key` metadata record.
	  Should now allow identifying uses of specific files, etc. ([#2952])
	- Test that we can install from http://datasets.datalad.org
	- Proper rendering of `CommandError` (e.g. in case of "out of space"
	  error) ([#2958])

* tag '0.11.1':
  Adjust the date -- 25th fell through due to __version__ fiasco
  BF+ENH(TST): boost hardcoded version + provide a test to guarantee consistency in the future
  This (expensive) approach is not needed in v6+
  small tuneup to changelog