DF/091: improve error handling #320

Merged
merged 13 commits into from
Apr 13, 2020

Commits on Feb 17, 2020

  1. 57da4f2
  2. DF/091: refactoring (more DRY code).

    The same operations were performed in `process_input_ds()` and
    `get_output_ds_info()`; now they are collected under the name
    `get_ds_info()`, while all operations specific to the input/output DS
    type are left to `process_*_ds()`.
    mgolosova committed Feb 17, 2020
    f0c3705
  3. c671115
  4. DF/091: fix output datasets processing.

    If a single dataset record does not contain enough information to
    construct service fields for ES indexing, it does not mean that all the
    datasets should be skipped.
    mgolosova committed Feb 17, 2020
    bfc096f
  5. DF/091: improve error handling ('_incomplete' messages).

    Changes in error handling:
    1. If Rucio raises `DataIdentifierNotFound`, set `deleted` to
       `True` (previously it was treated as any other `RucioException`).
    2. In case of `RucioException`, mark the message as "incomplete": this
       record will be written to ES in "update" mode, not "insert"
       (previously `deleted` and `bytes` would be set to `True` and `-1`).
    3. If some of the required fields cannot be extracted from Rucio (or
       have the value `null`), they are left unset (previously they would
       be set to `None`).
    4. If the resulting message lacks any of the fields that were supposed
       to be added at this stage, it is marked as `_incomplete`.
    
    The drawback of this logic: if a dataset was removed from Rucio, it
    will always be marked as "requiring update". But for now this is the
    only option we have: to use "update" instead of "insert", we need to
    know that the message is incomplete; and when a message coming to
    Stage 019 is incomplete, it should be marked as "requiring update" for
    further investigation.
    
    Maybe the logic could be extended by splitting "_incomplete" into two
    different markers: "update since the original source removed useful
    data" and "update since we failed to connect to the original source",
    but... not now.
    mgolosova committed Feb 17, 2020
    5be0448
  6. 67a80d8
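
The error handling rules listed in 5be0448 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the stage's actual code: `add_ds_info`, `fetch_metadata` and `REQUIRED_FIELDS` are hypothetical names, and the two exception classes are local stand-ins so that the example runs without Rucio installed.

```python
class RucioException(Exception):
    """Stand-in for the Rucio base exception (assumption for this sketch)."""


class DataIdentifierNotFound(RucioException):
    """Stand-in for the 'dataset is not known to Rucio' exception."""


# Hypothetical list of fields this stage is expected to add.
REQUIRED_FIELDS = ('deleted', 'bytes', 'events')


def add_ds_info(msg, fetch_metadata):
    """Return a copy of `msg` extended with dataset information.

    `fetch_metadata` is a hypothetical callable that takes a dataset
    name and returns a dict of metadata or raises a Rucio exception.
    """
    data = dict(msg)
    try:
        mdata = fetch_metadata(data['name'])
    except DataIdentifierNotFound:
        # Rule 1: the dataset is gone from Rucio.
        data['deleted'] = True
    except RucioException:
        # Rule 2: nothing is known for sure, so no field is set.
        pass
    else:
        data['deleted'] = False
        for field in ('bytes', 'events'):
            # Rule 3: missing or null values are left unset.
            if mdata.get(field) is not None:
                data[field] = mdata[field]
    # Rule 4: any expected field that is still missing marks the message.
    if any(field not in data for field in REQUIRED_FIELDS):
        data['_incomplete'] = True
    return data
```

Note that the more specific exception has to be caught first, since in this sketch it is a subclass of `RucioException`.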

Commits on Mar 16, 2020

  1. DF/091: fix issue with undefined variable ds (after 5be0448).

    ```
    2020-03-05 19:28:04 (DEBUG) (ProcessorStage) Traceback (most recent call last):
    (==)   File "<...>/ProcessorStage.py", line 231, in run
    (==)     if msg and process(self, msg):
    (==)   File "<...>/091_datasetsRucio/datasets_processing.py", line 233, in process_input_ds
    (==)     data.update(ds)
    (==) UnboundLocalError: local variable 'ds' referenced before assignment
    ```
    mgolosova committed Mar 16, 2020
    41c9ffb
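
A traceback like the one in 41c9ffb is typical for a variable that is assigned only inside a `try` block whose handler lets execution continue. The sketch below reproduces the pattern and shows one possible fix; the function names, `lookup` and the `KeyError` are hypothetical and do not come from the stage's code.

```python
def process_broken(data, lookup):
    """Reproduce the failure: `ds` is bound only when `lookup` succeeds."""
    try:
        ds = lookup(data['name'])
    except KeyError:
        pass  # execution continues, but `ds` was never assigned
    data.update(ds)  # raises UnboundLocalError if `lookup` failed
    return data


def process_fixed(data, lookup):
    """One possible fix: give `ds` a default value before the `try`."""
    ds = {}
    try:
        ds = lookup(data['name'])
    except KeyError:
        data['_incomplete'] = True
    data.update(ds)
    return data
```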

Commits on Mar 17, 2020

  1. DF/091: update samples.

    `/Utils/Dataflow/test/utils/compare_ndjson_files.py` shows changes like these:
    
    ```
    Record seem to differ for uid=mc15_13TeV.387007.MGPy8EG_A14N_BB_direct_300_295_MET100.merge.AOD.e3994_a766_a821_r7676_tid08124406_00
    key = deleted:
    Items missed in (2):
    (1) False
    Items missed in (1):
    (2) True
    Item missed in (2): 'bytes'
    Item missed in (2): 'events'
    Key missed in (1): '_incomplete'
    ```
    
    ...which means:
    * DS with given uid was removed ('deleted: True');
    * fields 'bytes' and 'events' are not present in the new record;
    * key '_incomplete' is added.
    
    So when this record gets to the 019 (esFormat) and 069 (upload2es), it
    will be written in the "update" mode: 'bytes' and 'events' won't be
    changed, 'deleted' will be set to True, and '_update_required' will be
    set to True.
    
    The latter may not be the best option, but it is the only flag we have
    right now to say that there's something wrong with the given record...
    and here "wrong" means that Rucio did not provide us with the
    information we needed.
    mgolosova committed Mar 17, 2020
    2c1b055
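
To make the comparison in 2c1b055 easier to read, here is a small sketch of a key-level diff between an old and a new record. It is not the logic of `compare_ndjson_files.py`; `diff_records` and the sample values are hypothetical and only mirror the kind of change reported above.

```python
def diff_records(old, new):
    """Summarise key-level differences between two records (dicts)."""
    common = set(old) & set(new)
    return {
        'only_in_old': sorted(set(old) - set(new)),
        'only_in_new': sorted(set(new) - set(old)),
        'changed': {key: (old[key], new[key])
                    for key in sorted(common)
                    if old[key] != new[key]},
    }


# Hypothetical records: the dataset was removed from Rucio in between.
old = {'uid': 'ds1', 'deleted': False, 'bytes': 100, 'events': 10}
new = {'uid': 'ds1', 'deleted': True, '_incomplete': True}
result = diff_records(old, new)
```

Here `result` lists 'bytes' and 'events' as present only in the old record, '_incomplete' as present only in the new one, and 'deleted' as changed from `False` to `True`.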

Commits on Mar 31, 2020

  1. DF/091: fix issue with undefined DataIdentifierNotFound.

    When the stage fails to load the `rucio.client` module, it does not stop
    the execution, because when the `--skip` option is specified we do not
    actually need the module to go on.
    
    The script exits only when it tries to initialize the client. But since
    the initialization occurs within a `try/except` clause, it first stumbles
    over the expected exception name (`DataIdentifierNotFound`), which it
    failed to load from `rucio.client`. We had a similar issue with
    `RucioException` once, so now the new exception is handled similarly.
    mgolosova committed Mar 31, 2020
    72f374d
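
One common way to keep `except` clauses valid when an optional module cannot be imported is to define placeholder exception classes in the `ImportError` handler. The sketch below illustrates the idea; it is an assumption about the general approach, not a copy of the fix in 72f374d, and the import path `rucio.common.exception` and the `RuntimeError` are choices made for this example only.

```python
import sys

try:
    import rucio.client
    from rucio.common.exception import (RucioException,
                                        DataIdentifierNotFound)
except ImportError as err:
    sys.stderr.write("(WARN) Failed to import Rucio client: %s\n" % err)
    rucio = None

    # Placeholders keep `except RucioException:` clauses valid even
    # though the real classes could not be imported.
    class RucioException(Exception):
        pass

    class DataIdentifierNotFound(RucioException):
        pass


def init_rucio_client():
    """Create a client, failing clearly if the module is unavailable."""
    if rucio is None:
        raise RuntimeError("Rucio client module is not loaded.")
    return rucio.client.Client()
```

With this layout the import failure is reported once, and the stage can still run in a mode that never calls `init_rucio_client()`.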

Commits on Apr 2, 2020

  1. DF/091: improve client initialisation errors handling.

    The Rucio client module may raise not only `RucioException`, but also
    some other exceptions: e.g. `IOError` (when the files provided in the
    configuration as user certificate/key cannot be read).
    
    What we need to do in case of an error in `init_rucio_client()` is to
    make sure that this error can be distinguished from any other error
    that occurred during the Rucio client usage. In other words, when we
    fail to initialise the client, it is not a "Rucio error" (indicating
    that we have some problems with Rucio), but a "dataflow error"
    (indicating that we have some problems with the stage, which is unable
    to do its job).
    
    Calling `sys.exit()` is a bit severe: yes, we need to stop the stage
    execution if we can't initialise the client; but it would also be
    correct to say that we need to stop the dataflow. And if
    `DataflowException` is raised during the `process()` execution, it is
    up to the common library to take care of the problem (output the
    necessary details and interrupt the stage, or do whatever is the
    default action when a stage says "I have a problem operating as a part
    of the dataflow process").
    mgolosova committed Apr 2, 2020
    6f9e2ab
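
The idea from 6f9e2ab, turning client initialisation failures into a "dataflow error" and leaving the reaction to the caller, might look roughly like this. Everything here is a simplified sketch: both exception classes are local stand-ins, and `client_factory` and `run_stage` are hypothetical names that do not reflect the pyDKB API.

```python
class DataflowException(Exception):
    """Stand-in for the pyDKB exception class (assumption)."""


class RucioException(Exception):
    """Stand-in for the Rucio base exception (assumption)."""


def init_rucio_client(client_factory):
    """Create a client; report any failure as a dataflow error."""
    try:
        return client_factory()
    except (RucioException, IOError) as err:
        raise DataflowException("Failed to initialise Rucio client: %s"
                                % err)


def run_stage(messages, process):
    """Hypothetical driver loop standing in for the common library."""
    try:
        for msg in messages:
            process(msg)
    except DataflowException as err:
        # The caller, not the stage, decides how to stop.
        return "stage interrupted: %s" % err
    return "ok"
```

Compared to calling `sys.exit()` inside the stage, this keeps the decision about how to stop in one place.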

Commits on Apr 3, 2020

  1. pyDKB: add 'reason' property to DataflowException.

    Might be useful if we raise it instead of any other exception (and want
    to specify our own exception message, but still would like to keep the
    original exception information as well).
    mgolosova committed Apr 3, 2020
    c61fe9f
  2. DF/091: add special treatment for IOErrors from rucio.client.

    ...to make the traceback shorter, and the topmost error message -- more
    informative.
    mgolosova committed Apr 3, 2020
    e4953d1
  3. Merge remote-tracking branch 'origin/master' into 091-error-handling

    Conflicts:
    	Utils/Dataflow/pyDKB/VERSION
    mgolosova committed Apr 3, 2020
    bf2a1a9
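
An exception with a `reason` property, as described in c61fe9f, could be sketched as below. This is a guess at the general shape, not the pyDKB implementation: the constructor signature, the `__str__` format and the `read_user_key` helper are all assumptions made for illustration.

```python
class DataflowException(Exception):
    """Hypothetical exception that keeps the original cause in `reason`."""

    def __init__(self, message='', reason=None):
        super(DataflowException, self).__init__(message)
        self.reason = reason

    def __str__(self):
        msg = super(DataflowException, self).__str__()
        if self.reason is not None:
            msg += "\nReason: %s" % self.reason
        return msg


def read_user_key(path):
    """Read a key file, replacing IOError with a more informative error."""
    try:
        with open(path) as key_file:
            return key_file.read()
    except IOError as err:
        raise DataflowException("Failed to read user key file.",
                                reason=err)
```

The topmost message then states what the stage was trying to do, while the original `IOError` details remain available through `reason`.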