Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add elastic logging for reference data logs #3566

Merged
merged 2 commits into from
Sep 3, 2021

Conversation

michaelcheah
Copy link
Contributor

@michaelcheah michaelcheah commented Sep 3, 2021

What this PR does / why we need it:
This PR adds elasticsearch logging for reference data in the request logger component.

Reference data can be sent as cloud events to the request logger, which will parse and insert the data exactly like it would with live predictions, with the only differences being that:

  1. it inserts it into an index with pattern reference-log-<deployment_type>-<deployment_namespace>-<deployment_name>
  2. it does not insert the entire payload into every instances document in elasticsearch

This PR also introduces a fix for catching bad payloads i.e. if metadata can be found for a request, but the payload does not match the metadata, this throws a BadPayloadException, which is caught, logs a warning, and does not insert the data into elasticsearch.

Previously, if the size of payload columns were larger than the number of metadata features, no errors are thrown and it would be inserted. If payload columns were less than metadata features, it would throw a runtime error (here or here)

Special notes for your reviewer:
A new index pattern has been introduced, so the elastic token needs the required permissions

Does this PR introduce a user-facing change?:

Changed seldon-request-logger component to catch and not insert requests whose metadata do not match the payload dimensions

An example of data inserted into elasticsearch using a reference dataset from the tabular income dataset in this example is shown here:
image

* Add BadPayloadException
* Validate shape of payload against metadata
* do not insert payloads. Payloads are duplicated for every instance (this is especially important since reference data is sent in batches and batches can be very big)
* index name has "reference" prefix and does not use endpoint
Copy link
Contributor

@SachinVarghese SachinVarghese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@seldondev
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SachinVarghese

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@seldondev seldondev merged commit 46e6685 into SeldonIO:master Sep 3, 2021
@seldondev
Copy link
Collaborator

Failed to merge this PR due to:

failed merging [3566]: [Method Not Allowed]

stephen37 pushed a commit to stephen37/seldon-core that referenced this pull request Dec 21, 2021
* do not insert payloads that do not match metadata

* Add BadPayloadException
* Validate shape of payload against metadata

* add reference data message types

* do not insert payloads. Payloads are duplicated for every instance (this is especially important since reference data is sent in batches and batches can be very big)
* index name has "reference" prefix and does not use endpoint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants