Allow delete and index actions with a document ID #12606

Merged

bestpath-gb
Contributor

Our use case requires that our Beat be able to re-index and delete documents. We always know the document ID, but libbeat currently only allows us to create a document when the ID is known.

I have implemented the ability to specify whether an event should index or delete documents with a specified ID by setting an op_type key in the event metadata. Possible values for the key are create, index or delete.

The behaviour currently in master hasn't changed, so if no op_type is specified, an event will result in a create if the ID is given, or an index otherwise.
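
The selection rule described above can be sketched roughly as follows. This is only an illustration of the described behaviour, not the code from this PR: `bulkAction` and its signature are hypothetical, and rejecting unknown `op_type` values is an assumption. The error for a delete without an ID mirrors the "Error when trying to delete with no _id" commit listed further down.

```go
package main

import "fmt"

// bulkAction picks the bulk action for an event (hypothetical helper, not
// libbeat's API). An explicit op_type wins; without one, an event with an ID
// results in "create" and an event without an ID results in "index".
func bulkAction(opType, id string) (string, error) {
	switch opType {
	case "create", "index":
		return opType, nil
	case "delete":
		// A delete needs to know which document to remove.
		if id == "" {
			return "", fmt.Errorf("op_type %q requires a document ID", opType)
		}
		return opType, nil
	case "":
		// No op_type given: keep the pre-existing default behaviour.
		if id != "" {
			return "create", nil
		}
		return "index", nil
	default:
		// Assumption: values outside create/index/delete are rejected.
		return "", fmt.Errorf("unsupported op_type %q", opType)
	}
}

func main() {
	cases := []struct{ opType, id string }{
		{"", ""}, {"", "doc-1"}, {"index", "doc-1"}, {"delete", "doc-1"}, {"delete", ""},
	}
	for _, c := range cases {
		action, err := bulkAction(c.opType, c.id)
		fmt.Printf("op_type=%q id=%q -> action=%q err=%v\n", c.opType, c.id, action, err)
	}
}
```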

I found issue #8534, where the author seems to be asking for update actions when the ID is known, which I could support as part of this PR by allowing an update value for the op_type key.

I'd love to see this functionality in libbeat and would welcome feedback on this PR. Does anyone have any thoughts about a better way to go about this?

@bestpath-gb bestpath-gb requested a review from a team as a code owner June 18, 2019 20:11
@elasticmachine
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@bestpath-gb bestpath-gb force-pushed the feature/allow-index-and-delete-with-id branch 2 times, most recently from a7ca417 to 0cf34d4 Compare March 30, 2020 21:46
if s, ok := tmp.(string); ok {
    val = s
} else {
    logp.Err("Event[%s] '%v' is no string value", key, val)

I believe what you want here is tmp instead of an empty val.

i.e.

Suggested change
- logp.Err("Event[%s] '%v' is no string value", key, val)
+ logp.Err("Event[%s] '%v' is no string value", key, tmp)

This is also wrong in the current code.

@ycombinator ycombinator self-requested a review April 2, 2020 13:12
@ycombinator ycombinator self-assigned this Apr 2, 2020
@ycombinator ycombinator added the Team:Services (Deprecated) label Apr 2, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@@ -251,6 +251,20 @@ func (client *Client) publishEvents(
return nil, nil
}

func eventMetaValue(event *beat.Event, key string) string {
Contributor

Since there's nothing in this function that's specific to the Elasticsearch output, how about making the following changes to it?

@@ -270,7 +284,13 @@ func bulkEncodePublishRequest(
log.Errorf("Failed to encode event meta data: %+v", err)
continue
}
bulkItems = append(bulkItems, meta, event)
opType := eventMetaValue(event, "op_type")
Contributor

Since we are using the bare string "op_type" in a couple of places, let's make it a const at the top of this file and add a godoc comment above it.

@@ -270,7 +284,13 @@ func bulkEncodePublishRequest(
log.Errorf("Failed to encode event meta data: %+v", err)
continue
}
bulkItems = append(bulkItems, meta, event)
opType := eventMetaValue(event, "op_type")
if opType == "delete" {
Contributor

Similar to making "op_type" a const, let's also make a const for this "delete" bare string so we can give it a godoc comment. This will allow developers reading godocs for this package to discover that it's possible to delete events.
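
A minimal sketch of what the two suggestions above might look like. The constant names and the `isDelete` helper are hypothetical (the names chosen in the PR are not shown in this thread); only the string values "op_type", "create", "index" and "delete" come from the discussion.

```go
package main

import "fmt"

const (
	// metaKeyOpType is the event metadata key used to select the bulk
	// operation for an event. (Hypothetical name.)
	metaKeyOpType = "op_type"

	// opTypeCreate creates a document and fails if the ID already exists.
	opTypeCreate = "create"

	// opTypeIndex creates a document or replaces an existing one.
	opTypeIndex = "index"

	// opTypeDelete deletes the document with the event's ID.
	opTypeDelete = "delete"
)

// isDelete reports whether the metadata requests a delete. A missing key or
// a non-string value yields false because the type assertion fails.
func isDelete(meta map[string]interface{}) bool {
	s, _ := meta[metaKeyOpType].(string)
	return s == opTypeDelete
}

func main() {
	fmt.Println(isDelete(map[string]interface{}{metaKeyOpType: opTypeDelete}))
	fmt.Println(isDelete(map[string]interface{}{metaKeyOpType: opTypeIndex}))
	fmt.Println(isDelete(nil))
}
```

With the strings named once, call sites compare against `opTypeDelete` rather than repeating the literal, and the godoc comments make the supported values discoverable.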

if err := b.AddRaw(obj); err != nil {
b.buf.Truncate(pos)
return err
if obj != nil {
Contributor

@ycombinator ycombinator Apr 2, 2020

Maybe add a comment here for when obj might be nil (in the delete case)?
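
To illustrate why `obj` can be nil: in the Elasticsearch bulk format a delete action is a single metadata line with no document source after it. The sketch below is a simplified stand-in for the encoder touched by this hunk, not its actual code; `bulkItem` and `encodeBulk` are hypothetical names, and the standard library JSON encoder replaces libbeat's own.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// bulkItem pairs a bulk action line with its optional document source.
// It is a hypothetical type, not part of libbeat.
type bulkItem struct {
	meta map[string]map[string]string // e.g. {"delete": {"_id": "doc-2"}}
	body map[string]interface{}       // nil for delete actions
}

// encodeBulk renders items as newline-delimited JSON for a bulk request.
func encodeBulk(items []bulkItem) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes one JSON value plus "\n"
	for _, it := range items {
		if err := enc.Encode(it.meta); err != nil {
			return nil, err
		}
		// A delete action carries no source document. Encoding the nil
		// body anyway would put a literal "null" line into the request.
		if it.body == nil {
			continue
		}
		if err := enc.Encode(it.body); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := encodeBulk([]bulkItem{
		{
			meta: map[string]map[string]string{"index": {"_id": "doc-1"}},
			body: map[string]interface{}{"name": "tenant-a"},
		},
		{meta: map[string]map[string]string{"delete": {"_id": "doc-2"}}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
	// Prints:
	// {"index":{"_id":"doc-1"}}
	// {"name":"tenant-a"}
	// {"delete":{"_id":"doc-2"}}
}
```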

Contributor

@ycombinator ycombinator left a comment

Hi @bestpath-gb, thanks for this PR. I've requested a few changes to the code. Additionally, it would be really nice to have unit tests for the bulkEncodePublishRequest and createEventBulkMeta functions that test the new (and existing) code paths.

@bestpath-gb
Contributor Author

Thanks so much for the feedback, everyone. I'll make the changes and report back.

@bestpath-gb bestpath-gb force-pushed the feature/allow-index-and-delete-with-id branch from 1ab70f3 to b385a8f Compare April 16, 2020 21:50
@ycombinator
Contributor

Hi @bestpath-gb, I see you have recently added some commits to this PR. Is it ready for review again or do you need some more time?

@bestpath-gb
Contributor Author

Hi @ycombinator. I've just got some tests to write to finish it up. I'm hoping to get them completed this weekend, so I'll ask for another review once I've pushed those.

@kvch
Contributor

kvch commented Apr 24, 2020

@bestpath-gb May I ask what your Beat does exactly? It seems interesting to me that you are literally shipping updates to ES, not events. I am just curious. :)

@bestpath-gb
Contributor Author

@kvch Of course! We're bringing data from Cisco ACI into Elasticsearch.

Some of it is time series log/telemetry data and we are enriching that with management data (using Logstash).

The management data is indexed using a beat but has to be kept in sync with configuration in ACI, which is where this PR comes in. When objects in ACI are created, modified or deleted, our custom Beat gets notified and it can propagate that to Elasticsearch.

I originally implemented this in Python but would have needed to implement much of what a Beat gets for free, like coping with back pressure (I used Celery to mitigate this), authentication, output to Logstash, configuring indices and pipelines, etc.

@kvch
Contributor

kvch commented Apr 24, 2020

Wow, nice! Thanks for sharing.

@elasticmachine
Collaborator

elasticmachine commented Apr 29, 2020

💔 Build Failed

Build stats

  • Build Cause: [Branch indexing]

  • Start Time: 2020-04-30T19:24:43.680+0000

  • Duration: 10 min 37 sec (576645)

  • Commit: 99c2bf2

Steps errors

  • Name: Error signal
    • Description:

    • Result: FAILURE

    • Duration: 0 min 0 sec

    • Start Time: 2020-04-30T19:33:51.719+0000

Log output

[2020-04-30T19:33:34.737Z]  * [new tag]         v7.4.1     -> v7.4.1
[2020-04-30T19:33:34.737Z]  * [new tag]         v7.4.2     -> v7.4.2
[2020-04-30T19:33:34.737Z]  * [new tag]         v7.5.0     -> v7.5.0
[2020-04-30T19:33:34.737Z]  * [new tag]         v7.5.1     -> v7.5.1
[2020-04-30T19:33:34.737Z]  * [new tag]         v7.5.2     -> v7.5.2
[2020-04-30T19:33:34.737Z]  * [new tag]         v7.6.0     -> v7.6.0
[2020-04-30T19:33:34.737Z]  * [new tag]         v7.6.1     -> v7.6.1
[2020-04-30T19:33:34.737Z]  * [new tag]         v7.6.2     -> v7.6.2
[2020-04-30T19:33:34.737Z]  * [new tag]         v5.0.0-alpha5 -> v5.0.0-alpha5
[2020-04-30T19:33:39.602Z] + git rev-parse HEAD
[2020-04-30T19:33:41.886Z] + git rev-parse HEAD
[2020-04-30T19:33:43.755Z] + git rev-parse origin/pr/12606
[2020-04-30T19:33:44.567Z] [INFO] githubEnv: Found Git Build Cause: pr
[2020-04-30T19:33:47.036Z] Masking supported pattern matches of $GITHUB_TOKEN
[2020-04-30T19:33:50.708Z] [INFO] githubPrCheckApproved: Title: Allow delete and index actions with a document ID - User: bestpath-gb - Author Association: CONTRIBUTOR
[2020-04-30T19:33:53.921Z] Stage "Lint" skipped due to earlier failure(s)
[2020-04-30T19:33:54.470Z] Stage "Build and Test" skipped due to earlier failure(s)
[2020-04-30T19:33:55.567Z] Stage "Elastic Agent x-pack" skipped due to earlier failure(s)
[2020-04-30T19:33:55.572Z] Stage "Elastic Agent x-pack Windows" skipped due to earlier failure(s)
[2020-04-30T19:33:55.573Z] Stage "Elastic Agent Mac OS X" skipped due to earlier failure(s)
[2020-04-30T19:33:55.604Z] Stage "Filebeat oss" skipped due to earlier failure(s)
[2020-04-30T19:33:55.645Z] Stage "Filebeat x-pack" skipped due to earlier failure(s)
[2020-04-30T19:33:55.647Z] Stage "Filebeat Mac OS X" skipped due to earlier failure(s)
[2020-04-30T19:33:55.648Z] Stage "Filebeat Windows" skipped due to earlier failure(s)
[2020-04-30T19:33:55.649Z] Stage "Heartbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:55.674Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-04-30T19:33:55.675Z] Stage "Auditbeat x-pack" skipped due to earlier failure(s)
[2020-04-30T19:33:55.676Z] Stage "Libbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:55.730Z] Stage "Libbeat x-pack" skipped due to earlier failure(s)
[2020-04-30T19:33:55.731Z] Stage "Metricbeat OSS Unit tests" skipped due to earlier failure(s)
[2020-04-30T19:33:55.733Z] Stage "Metricbeat OSS Integration tests" skipped due to earlier failure(s)
[2020-04-30T19:33:55.766Z] Stage "Metricbeat Python integration tests" skipped due to earlier failure(s)
[2020-04-30T19:33:55.800Z] Stage "Metricbeat x-pack" skipped due to earlier failure(s)
[2020-04-30T19:33:55.809Z] Stage "Metricbeat crosscompile" skipped due to earlier failure(s)
[2020-04-30T19:33:55.874Z] Stage "Metricbeat Mac OS X" skipped due to earlier failure(s)
[2020-04-30T19:33:55.880Z] Stage "Metricbeat Windows" skipped due to earlier failure(s)
[2020-04-30T19:33:55.881Z] Stage "Packetbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:55.882Z] Stage "dockerlogbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:55.883Z] Stage "Winlogbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:55.929Z] Stage "Winlogbeat Windows x-pack" skipped due to earlier failure(s)
[2020-04-30T19:33:55.935Z] Stage "Functionbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:55.940Z] Stage "Journalbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:55.941Z] Stage "Generators" skipped due to earlier failure(s)
[2020-04-30T19:33:55.945Z] Stage "Kubernetes" skipped due to earlier failure(s)
[2020-04-30T19:33:56.933Z] Stage "Heartbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:56.976Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-04-30T19:33:56.977Z] Stage "Libbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:56.978Z] Stage "Packetbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:56.980Z] Stage "dockerlogbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:57.071Z] Stage "Winlogbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:57.074Z] Stage "Functionbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:57.080Z] Stage "Journalbeat" skipped due to earlier failure(s)
[2020-04-30T19:33:57.081Z] Stage "Generators" skipped due to earlier failure(s)
[2020-04-30T19:34:00.511Z] Failed in branch Elastic Agent x-pack
[2020-04-30T19:34:00.512Z] Failed in branch Elastic Agent x-pack Windows
[2020-04-30T19:34:00.513Z] Failed in branch Elastic Agent Mac OS X
[2020-04-30T19:34:00.514Z] Failed in branch Filebeat oss
[2020-04-30T19:34:00.531Z] Failed in branch Filebeat x-pack
[2020-04-30T19:34:00.532Z] Failed in branch Filebeat Mac OS X
[2020-04-30T19:34:00.554Z] Failed in branch Filebeat Windows
[2020-04-30T19:34:00.561Z] Failed in branch Auditbeat x-pack
[2020-04-30T19:34:00.579Z] Failed in branch Libbeat x-pack
[2020-04-30T19:34:00.580Z] Failed in branch Metricbeat OSS Unit tests
[2020-04-30T19:34:00.581Z] Failed in branch Metricbeat OSS Integration tests
[2020-04-30T19:34:00.582Z] Failed in branch Metricbeat Python integration tests
[2020-04-30T19:34:00.604Z] Failed in branch Metricbeat x-pack
[2020-04-30T19:34:00.605Z] Failed in branch Metricbeat crosscompile
[2020-04-30T19:34:00.630Z] Failed in branch Metricbeat Mac OS X
[2020-04-30T19:34:00.631Z] Failed in branch Metricbeat Windows
[2020-04-30T19:34:00.644Z] Failed in branch Winlogbeat Windows x-pack
[2020-04-30T19:34:00.645Z] Failed in branch Kubernetes
[2020-04-30T19:34:03.702Z] Stage "Heartbeat" skipped due to earlier failure(s)
[2020-04-30T19:34:03.705Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-04-30T19:34:03.707Z] Stage "Libbeat" skipped due to earlier failure(s)
[2020-04-30T19:34:03.752Z] Stage "Winlogbeat" skipped due to earlier failure(s)
[2020-04-30T19:34:03.762Z] Stage "Functionbeat" skipped due to earlier failure(s)
[2020-04-30T19:34:03.802Z] Stage "Generators" skipped due to earlier failure(s)
[2020-04-30T19:34:04.251Z] Failed in branch Packetbeat
[2020-04-30T19:34:04.252Z] Failed in branch dockerlogbeat
[2020-04-30T19:34:04.256Z] Failed in branch Journalbeat
[2020-04-30T19:34:05.790Z] Stage "Heartbeat" skipped due to earlier failure(s)
[2020-04-30T19:34:05.792Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-04-30T19:34:05.794Z] Stage "Libbeat" skipped due to earlier failure(s)
[2020-04-30T19:34:05.800Z] Stage "Functionbeat" skipped due to earlier failure(s)
[2020-04-30T19:34:05.819Z] Stage "Generators" skipped due to earlier failure(s)
[2020-04-30T19:34:06.071Z] Failed in branch Winlogbeat
[2020-04-30T19:34:07.529Z] Failed in branch Heartbeat
[2020-04-30T19:34:07.530Z] Failed in branch Libbeat
[2020-04-30T19:34:07.532Z] Failed in branch Functionbeat
[2020-04-30T19:34:07.532Z] Stage "Auditbeat oss" skipped due to earlier failure(s)
[2020-04-30T19:34:07.534Z] Stage "Generators" skipped due to earlier failure(s)
[2020-04-30T19:34:08.698Z] Failed in branch Auditbeat oss
[2020-04-30T19:34:08.701Z] Failed in branch Generators
[2020-04-30T19:34:13.860Z] Running on worker-395930 in /var/lib/jenkins/workspace/Beats_beats-beats-mbp_PR-12606
[2020-04-30T19:34:15.966Z] [INFO] getVaultSecret: Getting secrets
[2020-04-30T19:34:16.678Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-04-30T19:34:23.058Z] + chmod 755 generate-build-data.sh
[2020-04-30T19:34:23.058Z] + ./generate-build-data.sh https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats-beats-mbp/PR-12606/ https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats-beats-mbp/PR-12606/runs/114 FAILURE 576645
[2020-04-30T19:34:23.058Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats-beats-mbp/PR-12606/runs/114/steps/?limit=10000 -o steps-info.json
[2020-04-30T19:34:23.058Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats-beats-mbp/PR-12606/runs/114/tests/?status=FAILED -o tests-errors.json

@elasticmachine
Collaborator

elasticmachine commented May 4, 2020

💚 Build Succeeded

Build stats

Test stats 🧪

Test Results
Failed 0
Passed 7759
Skipped 1236
Total 8995

Steps errors

  • Name: Report to Codecov
    • Description:
        curl -sSLo codecov https://codecov.io/bash
        for i in auditbeat filebeat heartbeat libbeat metricbeat packetbeat winlogbeat journalbeat
        do
          FILE="${i}/build/coverage/full.cov"
          if [ -f "${FILE}" ]; then
            bash codecov -f "${FILE}"
          fi
        done

    • Result: FAILURE

    • Duration: 2 min 22 sec

    • Start Time: 2020-05-05T21:46:01.081+0000


encoded, bulkItems := bulkEncodePublishRequest(logp.L(), *common.MustNewVersion(version.GetDefaultVersion()), index, pipeline, events)
assert.Equal(t, len(events)-1, len(encoded), "all events should have been encoded")
assert.Equal(t, 9, len(bulkItems), "incomplete bulk")
Contributor

Missed a couple assert -> require places. 🙂

Contributor Author

D'oh! I've corrected those.

@@ -54,6 +54,19 @@ func (e *Event) SetID(id string) {
e.Meta["_id"] = id
}

func (e *Event) GetMetaStringValue(key string) (string, error) {
Contributor

Nit: The function can be simplified to this:

tmp, err := e.GetValue("@metadata." + key)
if err == nil {
    if s, ok := tmp.(string); ok {
        return s, nil
    }
}
return "", nil


how about (requires no alloc):

tmp, err := e.Meta.GetValue(key)
if err != nil {
  return "", err
}

if s, ok := tmp.(string); ok {
  return s, nil
}

return "", nil

All in all I'd prefer to keep the interface of Event as small as possible and provide helpers as functions instead of methods.
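
A self-contained sketch of the helper-as-function idea, for readers without the libbeat sources at hand. The `event` type here is a simplified stand-in (a plain map instead of libbeat's metadata map type), and `metaStringValue` and `errKeyNotFound` are hypothetical names. As in the snippet above, a missing key returns an error while a non-string value returns an empty string with no error, so the caller decides how to treat each case.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a simplified stand-in for a Beat event; only Meta is modeled.
type event struct {
	Meta map[string]interface{}
}

// errKeyNotFound is returned when the metadata key is absent.
var errKeyNotFound = errors.New("key not found")

// metaStringValue is a free function rather than a method, which keeps the
// event type's own interface small. It looks the key up directly in Meta,
// so no "@metadata."-prefixed key string has to be built.
func metaStringValue(e *event, key string) (string, error) {
	v, ok := e.Meta[key]
	if !ok {
		return "", errKeyNotFound
	}
	if s, ok := v.(string); ok {
		return s, nil
	}
	// Present but not a string: empty value, no error.
	return "", nil
}

func main() {
	e := &event{Meta: map[string]interface{}{"op_type": "delete", "retries": 3}}
	for _, key := range []string{"op_type", "retries", "missing"} {
		s, err := metaStringValue(e, key)
		fmt.Printf("%s: value=%q err=%v\n", key, s, err)
	}
}
```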

Contributor

This is even better. \o/

Contributor Author

Thanks for the feedback. I've used the version from @urso. Nice to see this level of attention to detail!

Contributor

@kvch kvch left a comment

Minor nits, but nothing which blocks the merging of the PR.

Contributor

@ycombinator ycombinator left a comment

LGTM.

@ycombinator
Contributor

Jenkins CI is green and Travis CI failures are unrelated to the changes in this PR. Merging.

Thanks for your contribution, @bestpath-gb! <3

@ycombinator ycombinator merged commit a141e88 into elastic:master May 6, 2020
@ycombinator ycombinator mentioned this pull request May 6, 2020
6 tasks
@bestpath-gb bestpath-gb deleted the feature/allow-index-and-delete-with-id branch May 6, 2020 19:42
ycombinator added a commit that referenced this pull request May 6, 2020
* Add `op_type` meta key for delete and index operations with a document ID.

* Add note on why `obj` can be nil.

* Refactor and extract consts.

* Don't include metadata prefix in key.

* Stop appending `nil` else it ends up in the body.

* Error when trying to delete with no _id.

* Remove incorrect detail in comment.

* Add unit test for new op_type meta key.

* No longer required.

* Return error to caller.

* Fail event if op_type is no string.

* Use consts in error.

* Replace assert with require.

* Fail instead of panic.

* Change missed assert calls to require.

* Simplify GetMetaStringValue.

* Ignore err as key may not exist.

Co-authored-by: George Bridgeman <49999150+bestpath-gb@users.noreply.github.com>
@jdmcalee

Sorry to add to this so late but I've been unable to find any reference to this functionality anywhere else. What is the proper way to set the op_type metadata field?

I want to use this with Filebeat so that it uses index instead of create. The only way I've gotten it to work is a roundabout three-step processor arrangement: use add_fields to create a normal field with the desired value, convert that field into @metadata.op_type, then use drop_fields to remove the field added in step 1.

Labels
libbeat, review, Team:Services (Deprecated), v7.8.0, v7.9.0, v8.0.0