Result hooks #304

Merged
merged 9 commits into from
Dec 18, 2019
5 changes: 4 additions & 1 deletion docs/basics/101-124-procedures.rst
Expand Up @@ -374,7 +374,10 @@ with the help of a procedure.
Especially in the case of trainees and new users, applying procedures
instead of doing relevant routines "by hand" can help to ease
working with the dataset, as the use case :ref:`usecase_student_supervision`
showcases.
showcases. Besides being run by users, procedures can also be triggered
automatically after any command execution if a command result matches a specific
requirement. If you are interested in finding out more about this, read on in
section :ref:`hooks`.

Finally, make a note about running procedures inside of ``notes.txt``:

162 changes: 162 additions & 0 deletions docs/basics/101-145-hooks.rst
@@ -0,0 +1,162 @@
.. _hooks:

DataLad's result hooks
^^^^^^^^^^^^^^^^^^^^^^

If you are particularly keen on automating tasks in your datasets, you may be
interested in running DataLad commands automatically as soon
as previous commands have been executed and resulted in particular outcomes or states.
For example, you may want to automatically :command:`unlock` all dataset contents
right after an installation, in one go. To do so, you would need to automatically
run the :command:`datalad unlock .` command right after the :command:`datalad install`
command, *but only* if the previous :command:`install` command was successful.

Such automation allows for flexible yet automatic responses to the results
of DataLad commands, and can be done with DataLad's *result hooks*.
Generally speaking, `hooks <https://en.wikipedia.org/wiki/Hooking>`__ intercept
function calls or events and make it possible to extend the functionality of a program.
DataLad's result hooks are calls to other DataLad commands that are made after a
command produced a specified result -- such as a successful install.

To understand how hooks can be used and defined, we have to briefly mention
DataLad's *command result evaluations*. Whenever a DataLad
command is executed, an internal evaluation generates a *report* on the status
and result of the command. Internally, this is useful for final result
rendering, error detection, and logging. However, by using hooks, you can
utilize these evaluations for your own purposes and "hook" in more commands
whenever an evaluation fulfills your criteria.

To be able to specify matching criteria, you need to be aware of the potential
criteria you can match against. The evaluation report is a dictionary with
``key:value`` pairs. The following table provides an overview of some of the
available keys and their possible values:

.. list-table::
   :widths: 50 100
   :header-rows: 1

   * - Key name
     - Values
   * - ``action``
     - ``get``, ``install``, ``drop``, ``status``, ... (any command's name)
   * - ``type``
     - ``file``, ``dataset``, ``symlink``, ``directory``
   * - ``status``
     - ``ok``, ``notneeded``, ``impossible``, ``error``
   * - ``path``
     - The path the previous command operated on
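
To make the table more concrete, a single result record might look like the following Python dictionary. This is only an illustrative sketch: all values are invented, and real records typically carry additional keys (the debug output quoted in the review discussion, for instance, also shows ``message`` and ``refds``).

```python
# A hypothetical result record, shaped like the evaluation report described above.
# All values are invented for illustration.
result = {
    "action": "get",
    "type": "file",
    "status": "notneeded",
    "path": "/tmp/mydataset/data.txt",
}

# The keys from the table can be read like any other dictionary entries.
summary = "{action}({status}): {path} ({type})".format(**result)
print(summary)
```
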

These key-value pairs provide the basis to define matching rules that -- once met --
can trigger the execution of custom hooks.
To define a hook based on certain command results, two configuration variables
need to be set in ``.datalad/config``:

.. code-block:: bash

   datalad.result-hook.<name>.match-json

and

.. code-block:: bash

   datalad.result-hook.<name>.call-json

Here is what you need to know about these variables:

- The ``<name>`` part of the configurations is the same for both variables, and can be
  an arbitrarily [#f1]_ chosen name that serves as an identifier for the hook you are
  defining.

- The first configuration variable, ``datalad.result-hook.<name>.match-json``, defines
  the requirements that a result evaluation needs to match in order to trigger the hook.

- The second configuration variable, ``datalad.result-hook.<name>.call-json``, defines
  what the hook execution comprises. It can be any DataLad command of your choice.

And here is how to set the values for these variables:

- The value for ``datalad.result-hook.<name>.match-json`` needs to be specified as
  a JSON-encoded dictionary with any number of keys, such as

  .. code-block:: bash

     {"type": "file", "action": "get", "status": "notneeded"}

  This translates to: "Match a 'notneeded' result after :command:`datalad get` of a file."
  If the values of all keys in this dictionary match the values of the
  same keys in the result evaluation, the hook is executed. Apart from ``==``
  evaluations, ``in``, ``not in``, and ``!=`` are supported. To make use of such
  operations, the test value needs to be wrapped into a list, with the first item
  being the operation and the second item the test value, such as

  .. code-block:: bash

     {"type": ["in", ["file", "directory"]], "action": "get", "status": "notneeded"}

  This translates to: "Match a 'notneeded' result after :command:`datalad get` of a file or directory."
  Another example is this::

     '{"type":"dataset","action":"install","status":["eq", "ok"]}'

  which translates to: "Match a successful installation of a dataset".

- The value for ``datalad.result-hook.<name>.call-json`` starts with the name of
  the DataLad command to call, given in its Python notation (for example,
  ``run_procedure``). The command's options follow as a JSON-encoded dictionary
  with keyword arguments. Conveniently, a number of string substitutions are
  supported: a ``dsarg`` argument expands to the ``dataset`` given to the initial
  command the hook operates on, and any key from the result evaluation can be
  expanded to the respective value in the result dictionary. Curly braces need to
  be escaped by doubling them.
  This is not the easiest specification there is, but it's also not as hard as it
  may sound. Here is what this could look like for a :command:`datalad unlock`::

$ unlock {{"dataset": "{dsarg}", "path": "{path}"}}

This translates to "unlock the path the previous command operated on, in the
dataset the previous command operated on". Another example is this run command::

$ run {{"cmd": "touch {path}_annoyed", "dataset": "{dsarg}", "explicit": true}}
Collaborator


I know this was me, but it is a stupid example ;-)

What about replacing the command with cp ~/templates/standard-readme.txt {path}/README and sell it as automatically populate a dataset with a default README.

Contributor Author


(Personally, I found the example hilarious ;-) Will change it nevertheless)

Contributor Author

@adswa adswa Dec 16, 2019


I fail to get such a hook to run because of an inconvenient mismatch between formatting options and command requirements: I am trying to execute "datalad run --output "README" "cp ~/Templates/standard-readme.txt {path}/README" --explicit" via a hook after successful installation of a dataset. Here are my git config calls:

For matching:

git config --global --add datalad.result-hook.readme.match-json '{"type": "dataset","action":"create","status":"ok"}'

Hook definition:

git config --global --add datalad.result-hook.readme.call-json 'run {{"cmd":"cp ~/Templates/standard-readme.txt {path}/README", "outputs":"["README"]", "dataset":"{path}","explicit":true}}'

The important part here is "outputs":"["README"]". I need to give the output definition as a list with the string. I thought I once PR'ed an assure_list(), but could only find this for create...

This fails with (Debug output):

[WARNING] Invalid argument specification for hook readme (after parameter substitutions): {"cmd":"cp ~/Templates/standard-readme.txt /tmp/ads18/README", "outputs":"["README"]", "dataset":"/tmp/ads18","explicit":true} [Expecting ',' delimiter: line 1 column 77 (char 76) [decoder.py:raw_decode:353]], hook will be skipped 

Because in "outputs":"["README"]", only the "[" part is considered. Switching to single quotes ("outputs":"['README']") or leaving out quotation marks completely (both lead to no quotes in the config file) leads to README being split into its components:

[DEBUG  ] Resolved dataset for saving: /tmp/ads21 
[DEBUG  ] Determined class of decorated function: <class 'datalad.core.local.status.Status'> 
[DEBUG  ] Resolved dataset for status reporting: /tmp/ads21 
[DEBUG  ] Resolved dataset for path resolution: /tmp/ads21 
[ERROR  ] path not underneath this dataset [status(/tmp/A)] 
[DEBUG  ] Resolved dataset for path resolution: /tmp/ads21 
[ERROR  ] path not underneath this dataset [status(/tmp/D)] 
[DEBUG  ] Resolved dataset for path resolution: /tmp/ads21 
[ERROR  ] path not underneath this dataset [status(/tmp/E)] 
[DEBUG  ] Resolved dataset for path resolution: /tmp/ads21 
[ERROR  ] path not underneath this dataset [status(/tmp/E)] 
[DEBUG  ] Resolved dataset for path resolution: /tmp/ads21 
[ERROR  ] path not underneath this dataset [status(/tmp/M)] 
[DEBUG  ] Resolved dataset for path resolution: /tmp/ads21 
[ERROR  ] path not underneath this dataset [status(/tmp/R)] 
[DEBUG  ] Resolved dataset for path resolution: /tmp/ads21 
[ERROR  ] path not underneath this dataset [status(/tmp/[)] 
[DEBUG  ] Resolved dataset for path resolution: /tmp/ads21 
[ERROR  ] path not underneath this dataset [status(/tmp/])] 
[DEBUG  ] Determined 0 datasets for saving from input arguments 
[DEBUG  ] chdir '/tmp' -> '/tmp' (coming back) 
[DEBUG  ] could not perform all requested actions: Command did not complete successfully [{'action': 'status', 'path': PosixPath('/tmp/A'), 'refds': '/tmp/ads21', 'status': 'error', 'message': 'path not underneath this dataset'}, {'action': 'status', 'path': PosixPath('/tmp/D'), 'refds': '/tmp/ads21', 'status': 'error', 'message': 'path not underneath this dataset'}, {'action': 'status', 'path': PosixPath('/tmp/E'), 'refds': '/tmp/ads21', 'status': 'error', 'message': 'path not underneath this dataset'}, {'action': 'status', 'path': PosixPath('/tmp/E'), 'refds': '/tmp/ads21', 'status': 'error', 'message': 'path not underneath this dataset'}, {'action': 'status', 'path': PosixPath('/tmp/M'), 'refds': '/tmp/ads21', 'status': 'error', 'message': 'path not underneath this dataset'}, {'action': 'status', 'path': PosixPath('/tmp/R'), 'refds': '/tmp/ads21', 'status': 'error', 'message': 'path not underneath this dataset'}, {'action': 'status', 'path': PosixPath('/tmp/['), 'refds': '/tmp/ads21', 'status': 'error', 'message': 'path not underneath this dataset'}, {'action': 'status', 'path': PosixPath('/tmp/]'), 'refds': '/tmp/ads21', 'status': 'error', 'message': 'path not underneath this dataset'}] [utils.py:generator_func:495] 

I've unsuccessfully tried escaping single or double quotes in the config call as well (e.g., git config --global --replace-all datalad.result-hook.readme.call-json 'run {{"cmd":"cp ~/Templates/standard-readme.txt {path}/README", "outputs":"[\'README\']", "dataset":"{path}","explicit":true}}'). Do you see a way of giving a list with a string to the hook definition @mih?

Collaborator


I will try it myself, but why "["README"]" instead of ["README"]? The latter is valid JSON for a list.

Collaborator


Seems to work fine without the outer quotes.

% git config --add datalad.result-hook.readme.match-json '{"type": "dataset","action":"create","status":"ok"}'
% git config --add datalad.result-hook.readme.call-json 'run {{"cmd":"cp ~/Templates/standard-readme.txt {path}/README", "outputs":["README"], "dataset":"{path}","explicit":true}}'
% datalad create -d . subds1
[INFO   ] Creating a new annex repo at /tmp/testhook/subds1 
add(ok): subds1 (file)                                                                                                       
add(ok): .gitmodules (file)
save(ok): . (dataset)
create(ok): subds1 (dataset)
[INFO   ] == Command start (output follows) ===== 
[INFO   ] == Command exit (modification check follows) ===== 
(datalad3-dev) 1 mih@meiner /tmp/testhook (git)-[master] % ls subds1
README
(datalad3-dev) mih@meiner /tmp/testhook (git)-[master] % cat subds1/README
dummy
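
The quoting problem discussed in this thread can be reproduced with Python's standard ``json`` module alone. The following is a minimal sketch, not DataLad's code: the shortened ``cmd`` string is made up, and the assumption that a plain string handed over as ``outputs`` is iterated character by character is only an inference from the debug output above.

```python
import json

# Argument string with the list wrapped in quotes, as in the failing attempt.
quoted = '{"cmd": "cp template README", "outputs": "["README"]", "explicit": true}'
# Argument string with a plain JSON list, as in the working configuration.
plain = '{"cmd": "cp template README", "outputs": ["README"], "explicit": true}'

try:
    json.loads(quoted)
    quoted_ok = True
except json.JSONDecodeError as exc:
    # The string value ends right after "[", so the decoder trips over README.
    quoted_ok = False
    print("invalid JSON:", exc.msg)

args = json.loads(plain)
print(args["outputs"])

# If the quotes around README get lost, the value is a plain string, and
# iterating over a string yields single characters rather than one file name.
as_string = json.loads('{"outputs": "[README]"}')["outputs"]
chars = sorted(as_string)
print(chars)
```

The sorted characters line up with the paths in the error messages above (``/tmp/A``, ``/tmp/D``, ..., ``/tmp/]``), which is consistent with the value having been treated as a string rather than a list.
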


This translates to "execute a run command in the dataset the previous command operated
on. It should create an empty file under the same path the previous command
operated on, with an added '_annoyed' in the file name." A final example is this::

$ run_procedure {{"dataset":"{path}","spec":"cfg_metadatatypes bids"}}

This hook will run the procedure ``cfg_metadatatypes`` with the argument ``bids``
and thus set the standard metadata extractor to be bids.
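
The matching rules described for ``match-json`` above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DataLad's implementation: the function name ``rule_matches`` and the ``OPERATORS`` table are hypothetical, and only the two operator names that appear in the examples above (``in`` and ``eq``) are covered.

```python
import json

# Operators as used in the examples above; this mapping is an assumption
# made for illustration, not DataLad's actual implementation.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "in": lambda actual, expected: actual in expected,
}


def rule_matches(match_json, result):
    """Return True if every key in the JSON rule matches the result record."""
    rule = json.loads(match_json)
    for key, test in rule.items():
        actual = result.get(key)
        # A two-item list that starts with a known operator name is an
        # explicit test; anything else is a plain equality check.
        if (
            isinstance(test, list)
            and len(test) == 2
            and isinstance(test[0], str)
            and test[0] in OPERATORS
        ):
            op, expected = test
            if not OPERATORS[op](actual, expected):
                return False
        elif actual != test:
            return False
    return True


# A hypothetical result record of a `datalad get` on an already present file.
result = {"type": "file", "action": "get", "status": "notneeded", "path": "/tmp/ds/a.txt"}

get_rule = '{"type": ["in", ["file", "directory"]], "action": "get", "status": "notneeded"}'
install_rule = '{"type":"dataset","action":"install","status":["eq", "ok"]}'

print(rule_matches(get_rule, result))
print(rule_matches(install_rule, result))
```

The first rule matches the record, while the second does not, because the record is neither of type ``dataset`` nor the result of an ``install``.
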


As these variables are configuration variables, they can be set via :command:`git config` [#f2]_::

$ git config -f .datalad/config --add datalad.result-hook.annoy.call-json 'run {{"cmd":"touch {path}_annoyed", "dataset":"{dsarg}","explicit":true}}'
$ git config -f .datalad/config --add datalad.result-hook.annoy.match-json '{"type":["in", ["file"]],"action":"get","status":"notneeded"}'

Here is what this writes to the ``.datalad/config`` file::

[datalad "result-hook.annoy"]
call-json = run {{\"cmd\":\"touch {path}_annoyed\", \"dataset\":\"{dsarg}\",\"explicit\":true}}
match-json = {\"type\":[\"in\", [\"file\"]],\"action\":\"get\",\"status\":\"notneeded\"}

Given this configuration in the ``.datalad/config`` file of your dataset, the
"annoy" hook would be executed whenever you run :command:`datalad get` on a file
and the command evaluates to "notneeded". The annoy hook would then automatically
create an empty file with the same name as the one you attempted to get, but with
an appended ``_annoyed`` in the file name [#f3]_.
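
How the placeholders and doubled braces of the "annoy" hook resolve can be illustrated with Python's built-in ``str.format``. This is a sketch under assumptions: the result record and the ``dsarg`` value are invented, and ``str.format`` is used here as a stand-in for the substitution step rather than as a statement about DataLad's internals.

```python
import json

# The call-json value of the "annoy" hook: command name, then a JSON template.
call_json = 'run {{"cmd":"touch {path}_annoyed", "dataset":"{dsarg}","explicit":true}}'

# Hypothetical values: a matched result record and the dataset argument.
result = {"action": "get", "type": "file", "status": "notneeded", "path": "/tmp/ds/a.txt"}
dsarg = "/tmp/ds"

# Split the command name from its argument template.
command, template = call_json.split(" ", 1)

# str.format fills in the placeholders and turns doubled braces into single ones.
filled = template.format(dsarg=dsarg, **result)

# What remains is ordinary JSON holding the keyword arguments for the command.
kwargs = json.loads(filled)

print(command)
print(kwargs)
```

The doubled braces matter because single braces mark placeholders: without the doubling, the outer braces of the JSON dictionary would themselves be read as a substitution.
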



.. rubric:: Footnotes

.. [#f1] It only needs to be compatible with :command:`git config`. This means that
         it should not, for example, contain any dots (``.``).

.. [#f2] To re-read about the :command:`git config` command and other configurations
of DataLad and its underlying tools, go back to the chapter on Configurations,
starting with :ref:`config`.

.. [#f3] It's a toy example, but supposedly highly effective in training yourself
         (or others) to refrain from using :command:`datalad get`. There is generally
         no reason to do that, but why miss a chance for classical conditioning?
         `B.F. Skinner <https://en.wikipedia.org/wiki/B._F._Skinner>`_ would be
         `proud <https://xkcd.com/1156/>`_.
2 changes: 1 addition & 1 deletion docs/contents.rst.inc
Expand Up @@ -134,7 +134,7 @@

basics/101-179-gitignore
basics/101-144-intro_extensions

basics/101-145-hooks

#############
**Use Cases**