Elasticsearch self repair and handling of "valid" error responses #1471

svenoe · 2021-12-01T18:29:07Z

If the creation of a ticket in Elasticsearch fails for some reason, all subsequent article creations and other ticket updates will fail. We do receive a clear error response, though, and should use it to try to create the ticket a second time. While at it, we could filter out errors from the ingest plugin that stem from files it cannot handle (encrypted files, etc.), and maybe write an info message to the log, but nothing more.

For both to work, first the invokers must be extended to handle errors (task five of #772). The important places for this would be:

Kernel/GenericInterface/Requester.pm line 324: $FunctionResult = $TransportObject->RequesterPerformRequest(
Kernel/GenericInterface/Transport/HTTP/REST.pm line 872: $RestClient->$RestCommand(@RequestParam);
I'm not sure how to do this in a nice way, without looking further into this. In principle we don't want the error handling right after the second, but the communication with the invoker usually is done in the first (e.g. line 206: my $FunctionResult = $InvokerObject->PrepareRequest( and especially interesting since we would probably need something similar line 226: elsif ( $FunctionResult->{StopCommunication}).

For the self repair of the ticket index, lines 539-552 of Kernel/System/Console/Command/Maint/Elasticsearch/Migration.pm (# create the ticket) could be reused. That part is quite simple in itself, but some measures have to be taken to avoid infinite loops! :)


bschmalhofer · 2021-12-16T13:15:34Z

@svenoe
I started looking into this. My understanding is that there already is generic support for error handling. That is line 619 in Kernel/GenericInterface/Requester.pm:

my $InvokerHandleErrorOutput = $Param{InvokerObject}->HandleError(
    Data => $HandleErrorData,
);

So the first line of investigation would be adding a HandleError() method in Kernel/GenericInterface/Invoker/Elasticsearch/TicketManagement.pm.
I was wondering how to provoke an error when submitting the ticket to ES. The simplest idea I had is to shut down ES temporarily. Is there a better approach?

svenoe · 2021-12-16T17:50:44Z

The problem is that, if I understand correctly, the real error message does not reach this point because of

otobo/Kernel/GenericInterface/Transport/HTTP/REST.pm

Lines 880 to 883 in b6845c9

    return {
        Success      => 0,
        ErrorMessage => $ResponseError,
    };

where $ResponseError contains some not so useful information. But probably(?) you are right and we should use this method, but give it useful data.

As to provoking errors, yes, shutting down the ES container is one option. In the end you probably should delete a ticket from the index, too, and update its queue, or so, anyways. I propose copying a normal request out of the Webservice debugger, altering it and sending it via curl. Maybe the copying part is not even needed - it looks relatively easy: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html (Or the more complicated version: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html).

bschmalhofer · 2021-12-21T14:06:50Z

Investigated which clients can be used for manipulating Elasticsearch. Curator, https://www.elastic.co/guide/en/elasticsearch/client/curator/current/about.html, does not do the job, as it is intended for the maintenance of indices. Fell back to using a small script based on Search::Elasticsearch, https://metacpan.org/pod/Search::Elasticsearch.

Checked what happens when updating the queue of a ticket that has been deleted in Elasticsearch.

Response content: '{"error":{"root_cause":[{"type":"document_missing_exception","reason":"[_doc][4]: document missing","index_uuid":"8iUJUa2zR3eaxx_mJjzo3A","shard":"0","index":"ticket"}],"type":"document_missing_exception","reason":"[_doc][4]: document missing","index_uuid":"8iUJUa2zR3eaxx_mJjzo3A","shard":"0","index":"ticket"},"status":404}'

The error message is useful for re-adding the relevant ticket to Elasticsearch.
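To illustrate how such a response could be recognised, here is a minimal Python sketch. It is not the OTOBO implementation: the function name `missing_document` is hypothetical, and the assumption that the second bracket in the reason string holds the document ID is taken only from the single response shown above.

```python
import json
import re


def missing_document(response_content):
    """Return (index, doc_id) if the body reports a missing document, else None.

    Hypothetical helper; the field names follow the sample response only.
    """
    try:
        body = json.loads(response_content)
    except (TypeError, ValueError):
        # Not a JSON string at all, so nothing can be concluded.
        return None

    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        return None
    if error.get("type") != "document_missing_exception":
        return None

    # Assumption: the reason looks like "[_doc][4]: document missing",
    # where the second bracket pair holds the document ID.
    match = re.match(r"\[[^\]]*\]\[([^\]]+)\]", str(error.get("reason", "")))
    if match is None:
        return None

    return error.get("index"), match.group(1)
```

A caller would trigger the re-add only when the function returns a pair, and treat every other response as an ordinary error.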

bschmalhofer · 2021-12-22T14:54:32Z

While looking into this, I noticed that there is a mix-up of messages for the generic interface debugger. When a mapping module can't be initialized, the error message of a previous function result is written into the debug log.
The underlying cause is the reusing of the variable $FunctionResult.

Creating an extra issue for this, as this is a bug.

bschmalhofer · 2021-12-23T14:04:42Z

Digging deep into the Generic Interface code, I find that there are at least three opportunities for error handling:

1. Error handling modules in Kernel/GenericInterface/ErrorHandling that were registered in the webservice configuration
2. HandleError() methods in the Invoker backend
3. HandleResponse() with the parameter ResponseSuccess => 0 in the Invoker backend

For case 1 there is currently only Kernel::GenericInterface::ErrorHandling::RequestRetry, which isn't used in the default configuration. The RequestRetry.pm mechanism works like this:

RequestRetry.pm creates ReSchedule and PastExecution data and returns that data to the requester Kernel::GenericInterface::Requester.
The Requester creates a scheduled task with the same invoker.
The scheduled task is eventually run by the OTOBO Daemon.
The Invoker backend can in turn react to the PastExecutionData.

Case 2 is used for dynamic fields. In this case HandleError() is mapped to HandleResponse(). My impression is that this is intended as a more straightforward callback mechanism.

The case 3 is a bit confusing. It seems to be redundant with regard to the case 2.

The goal of this issue seems to be more like case 1. It is mostly a retry with a twist. But there can be some nasty cases. When Elasticsearch is down and then restarted, then a lot of tasks can be queued up. When there are many independent changes to a ticket, then each change would cause a reindexing of the ticket in Elasticsearch.

An alternative approach would be:

whenever there is an indexing error, the ticket is marked as dirty in the database
there is a cronjob that reindexes the dirty tickets
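The alternative approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `DirtyTicketTracker` and its methods are hypothetical, and an in-memory set stands in for whatever database flag would actually hold the dirty mark.

```python
class DirtyTicketTracker:
    """In-memory stand-in for a 'dirty' flag stored in the database (hypothetical)."""

    def __init__(self):
        self._dirty = set()

    def mark_dirty(self, ticket_id):
        # Marking is idempotent: many failed updates of one ticket
        # still result in a single pending reindex.
        self._dirty.add(ticket_id)

    def reindex_dirty(self, reindex):
        """Call reindex(ticket_id) for every dirty ticket.

        A ticket keeps its dirty mark when reindex() returns a false value,
        so it is picked up again on the next run.
        """
        done = []
        for ticket_id in sorted(self._dirty):
            if reindex(ticket_id):
                done.append(ticket_id)
        self._dirty.difference_update(done)
        return done
```

Because the mark is a set membership rather than a queue entry, a burst of independent changes to one ticket while Elasticsearch is down leads to one reindexing, not one per change.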

TODO:

investigate the feasibility of the alternative approach, where tickets are marked as dirty

svenoe · 2021-12-23T15:24:58Z

As a quick comment: I do not see it as a pure "redo with a twist". You are right that rebuilding the complete ticket definitely is one (and yes, care has to be taken not to run into loops), but it has other purposes, too. For example, if the ingest plugin, which extracts data from an attachment, reports that it cannot handle encrypted attachments, I don't want an error message in the logs; an info or debug statement at most, since the plugin complaining is totally fine. So in the end I see it more as a way to exit silently and potentially start some additional action, such as the rebuild.

bschmalhofer · 2022-01-03T14:26:34Z

It looks like the search in Elasticsearch and the search in the database are pretty much independent of each other. The database search, especially the article word index, is active by default. The Elasticsearch indexing must be set up as a web service.

There is the table attribute article.search_index_needs_rebuild, which acts as a dirty mark. Currently this is used only in the console commands Maint::Ticket::FulltextIndex.pm and Maint::Ticket::FulltextIndexRebuildWorker. In this setting only the values 0 and 1 are used. This means that this attribute could be abused to also mark the Elasticsearch index as dirty. Of course, adding a new table attribute is also a feasible option.

bschmalhofer · 2022-01-07T11:23:21Z

Discussed this issue with @svenoe and we concluded:

Webservices are used predominantly in extensions of OTOBO. Care must be taken that the interface is not changed.
Do an early check for errors which are expected; those shouldn't be logged as errors. Example: encrypted mails.
Maybe implement a general mechanism for error handling that marks errors as resolved.
In some error cases, reindex the article via a scheduled task. It is not decided yet where that task is created.

bschmalhofer · 2022-01-07T14:24:07Z

Back to the drawing board as this is fairly confusing.
It looks like the annoying message is the

$DebuggerObject->Error(
    Summary => $Param{Summary},
    Data    => $Param{Data},
);

in Kernel::GenericInterface::ErrorHandling::HandleError(). Currently this method is called in Kernel::GenericInterface::Requester::_HandleError() whenever the transport layer thinks that something is fishy. This can be changed by allowing error handling modules to specify whether the error message should be written to the communication log. The message is then written only when:

there are no error handling modules, which is up to now the most common case
not all of the error handling modules that have run requested suppression of the message

It is expected that error handling modules that suppress the message also set StopAfterMatch.
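The rule above boils down to a small predicate. The following Python sketch only illustrates the decision; the function name `should_log_error` and the result key `SuppressError` are hypothetical and not taken from the OTOBO code.

```python
def should_log_error(module_results):
    """Decide whether the error message goes to the communication log.

    module_results holds one dict per error handling module that has run.
    The boolean key 'SuppressError' is a hypothetical name for the request
    to suppress the message.
    """
    # No error handling modules at all: log as before.
    if not module_results:
        return True

    # Log unless every module that ran asked for suppression.
    return not all(result.get("SuppressError", False) for result in module_results)
```

With this rule a single module that does not request suppression is enough to keep the message in the log, which is why a suppressing module would also be expected to set StopAfterMatch.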

bschmalhofer · 2022-01-10T15:06:28Z

The tentative names for the new error handling modules are:

FailureAccept
ArticleReindex

bschmalhofer · 2022-01-13T15:05:45Z

Started working on the FailureAccept error handling module. Found that adding a new module means a lot of duplication, especially in the admin interface. I suppose it makes sense to make the error handling modules more generic.
TODO:

Factor out common HTML snippets
Factor out common functions in the frontend modules

Edit: the TODO items won't be done

svenoe · 2022-01-13T15:40:30Z

Please have a quick look at my message... ;)


bschmalhofer · 2022-01-21T14:46:15Z

Discussed this with @svenoe. We found that using error handling modules is not worth it, especially as it would not affect the first message in HTTP::REST anyway. The new approach is to add an AssessResponse() method in TicketManagement.pm. The reindexing of the ticket will be done in HandleResponse().


bschmalhofer · 2022-01-26T14:02:41Z

The current state of the implementation appears to be stable and working. It is expected, though, that further instances of unhelpful errors will crop up. These should be handled in new issues. Closing this issue.

svenoe added the enhancement New feature or request label Dec 1, 2021

svenoe added this to the OTOBO 10.1.1 milestone Dec 1, 2021

svenoe assigned bschmalhofer Dec 1, 2021

bschmalhofer mentioned this issue Dec 22, 2021

Incorrect message when mapping modules can't be initialized. #1524

Closed

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: fix and enhance POD and code comments

553a9a3

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: check the response code in a tighter scope

aec5dc0

just for better readability

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: make use of the hash ref from the Error() method

b4781d6

of the debugger object.

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: tidying

6dbd532

- a more concise constructor - using 'unless' in end of line control flow - empty line before return

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: enhance POD of the method _HandleError()

2b5e5f0

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: tidying: neater headers

fd1d268

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: tidying: a more concise constructor

7e6f3b5

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: tidying: postfix dereferencing

56ed951

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: tidying: enhance code comments

0429062

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: tidying: use 'unless' for negative postfix control flow

58cbbaf

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: avoid duplicate 'use'

8fd62f0

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: tidying: code layout

f9d839b

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: reduce the scope of some variables

fda4cb0

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: tidying: 'unless' and postfix dereferencing

b48aa2a

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: be more precise about bytes and kilo bytes

25b1d4b

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: enhance code comments

a0d7e27

bschmalhofer added a commit that referenced this issue Jan 20, 2022

Issue #1471: provide infrastructure

2497225

for custom callback that assess the validty of a request response. For now this is only available for HTTP::REST transport objects. There should be no functional changes yet.

bschmalhofer added a commit that referenced this issue Jan 21, 2022

Issue #1471: enhance the POD for Decode()

8371654

bschmalhofer added a commit that referenced this issue Jan 21, 2022

Issue #1471: fix some code comments

5068f6d

bschmalhofer added a commit that referenced this issue Jan 21, 2022

Issue #1471: reindex a ticket upon documemt_missing_exception

ec2ee73

bschmalhofer mentioned this issue Jan 22, 2022

Sorting of entries in Webservice Debugger #1551

Closed

1 task

bschmalhofer added a commit that referenced this issue Jan 23, 2022

Issue #1471: explicitly import 'encode_base64()'

6bd101f

bschmalhofer added a commit that referenced this issue Jan 23, 2022

Issue #1471: refactoring, initialize variables only when needed

402a4c2

bschmalhofer added a commit that referenced this issue Jan 23, 2022

Issue #1471: abort the request when there are no attachments

0061cee

bschmalhofer added a commit that referenced this issue Jan 23, 2022

Issue #1471: tidying: \d is already contained in \w

1763990

bschmalhofer added a commit that referenced this issue Jan 24, 2022

Issue #1471: there is no $Param{ESObject}

b10a9ee

Using the $ESObject instead.

bschmalhofer added a commit that referenced this issue Jan 24, 2022

Issue #1471: remove an unused variable

6a59838

bschmalhofer added a commit that referenced this issue Jan 24, 2022

Issue #1471: no need to import Kernel::System::VariableCheck

f46131b

bschmalhofer added a commit that referenced this issue Jan 24, 2022

Issue #1471: enhance code comments

0b08e1a

bschmalhofer added a commit that referenced this issue Jan 24, 2022

Issue #1471: add INFO message when an error is ignored

65f22a7

bschmalhofer added a commit that referenced this issue Jan 24, 2022

Issue #1471: accept the encrypted docs are not indexed

e318b53

Index attachment by attachment, so that error handling is easier

bschmalhofer added a commit that referenced this issue Jan 24, 2022

Merge branch 'rel-10_1' into issue-#1471-webservice_error_handler

4a8f281

bschmalhofer added a commit that referenced this issue Jan 24, 2022

Merge pull request #1556 from RotherOSS/issue-#1471-webservice_error_…

4e6b852

…handler Issue #1471 webservice error handler

bschmalhofer mentioned this issue Jan 24, 2022

Webservice debug: show the HTTP Method used by the invoker #1555

Closed

bschmalhofer closed this as completed Jan 26, 2022

svenoe pushed a commit that referenced this issue Mar 2, 2022

Issue #1471: Avoid uninitialized value warning.

926130d
