-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x-pack/filebeat/input/entityanalytics/provider/activedirectory: don't publish unmodified updates #41179
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
|
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
This pull request is now in conflicts. Could you fix it? 🙏
|
73f01b5
to
e87a06f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this to avoid duplicates or to avoid redundant work?
Does this mean that a computer added after the last full sync won't be ingested until the next full sync?
To avoid redundant work.
No, it should be added with an update. I will reassess. |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed elsewhere, the way forward here may be to:
- do all the additions from incremental changes
- do the deletes only on full sync
Redundant additions could be skipped during full sync but I think it's better to do them anyway, as a way to periodically recover from missed updates.
p.logger.Errorw("Error running incremental update", "error", err) | ||
p.metrics.updateError.Inc() | ||
} | ||
p.metrics.updateTotal.Inc() | ||
p.metrics.updateProcessingTime.Update(time.Since(start).Nanoseconds()) | ||
updateTimer.Reset(p.cfg.UpdateInterval) | ||
p.logger.Debugf("Next update expected at: %v", time.Now().Add(p.cfg.UpdateInterval)) | ||
last = start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If time was used I think it would be better to get it from state.whenChanged
in order to minimize issues from disagreements between client and server time (specifically, when the server doesn't provide all data up to the time of a request).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little more subtle than that, but I think I have got what you want in my next change.
@chrisberkhout PTAL Requires careful review. |
x-pack/filebeat/input/entityanalytics/provider/activedirectory/activedirectory.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Activedirectory could use a final tweak of the time value.
Jamf I'm not sure whether the changes are necessary. Also it seems to not handle deletes.
x-pack/filebeat/input/entityanalytics/provider/activedirectory/activedirectory.go
Outdated
Show resolved
Hide resolved
x-pack/filebeat/input/entityanalytics/provider/activedirectory/activedirectory.go
Show resolved
Hide resolved
@@ -350,7 +351,7 @@ func (p *jamfInput) runFullSync(inputCtx v2.Context, store *kvstore.Store, clien | |||
// runIncrementalUpdate will run an incremental update. The process is similar | |||
// to full synchronization, except only users which have changed (newly | |||
// discovered, modified, or deleted) will be published. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only users which have changed (newly discovered, modified, or deleted) will be published
I don't see deletes being handled anywhere.
In doFetchComputers
it will always fetch everything and return whatever is new or modified compared to the store (unless its a fullSync, in which case it returns nothing).
I don't see any mention of deletes in the GET /api/preview/computers
documentation.
ddd9e66
to
9e7674c
Compare
This pull request is now in conflicts. Could you fix it? 🙏
|
…: don't publish unmodified updates
This leaves a small window when it is possible to send redundant data, but it is simpler and safer than tightening the bounds.
Only mark users as deleted on the basis of full sync since this is the only time that we can know that a user is absent. Also remove users from the store on close if they have been marked as deleted.
9e7674c
to
3709f8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
There's a duplicate line in the changelog.
Also the PR description and changelog could say more clearly what the changed behavior is:
- I think "don't publish unmodified updates" is only happening in that the timestamp handling is improved. It's not actually skipping publication of things received from the server but already known (because the incremental sync doesn't get them and the full sync should to do a full publish anyway).
- The other, more important change is that the old version was deleting anything that didn't appear in the new incremental sync response, and now it will only do the deletions during the full sync and only of things that the server actually doesn't have anymore.
CHANGELOG.next.asciidoc
Outdated
@@ -164,6 +164,8 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff] | |||
- Fixed failed job handling and removed false-positive error logs in the GCS input. {pull}41142[41142] | |||
- Bump github.com/elastic/go-sfdc dependency used by x-pack/filebeat/input/salesforce. {pull}41192[41192] | |||
- Log bad handshake details when websocket connection fails {pull}41300[41300] | |||
- Don't send redundant documents for non-modified entities in the Jamf and Active Directory entityanalytics input. {pull}41179[41179] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Don't send redundant documents for non-modified entities in the Jamf and Active Directory entityanalytics input. {pull}41179[41179] |
…ve modification time and deletion logic (#41179) This improves the update time stamps of modified events by using the documents' whenChanged fields in the case of returned documents, and the current time when a document is identified as having been deleted. The latest of these is used to determine the time filter for the next Active Directory query. Documents are marked as deleted only when they are found to not exist in full sync collection, and are removed from the state store when they are identified as deleted. The change in behaviour to not use updates to identify corrects behaviour that would cause older but not deleted entities to be deleted from the index. (cherry picked from commit 6b54074) # Conflicts: # x-pack/filebeat/input/entityanalytics/provider/activedirectory/activedirectory.go
…ve modification time and deletion logic (#41179) This improves the update time stamps of modified events by using the documents' whenChanged fields in the case of returned documents, and the current time when a document is identified as having been deleted. The latest of these is used to determine the time filter for the next Active Directory query. Documents are marked as deleted only when they are found to not exist in full sync collection, and are removed from the state store when they are identified as deleted. The change in behaviour to not use updates to identify corrects behaviour that would cause older but not deleted entities to be deleted from the index. (cherry picked from commit 6b54074)
…/activedirectory: don't publish unmodified updates (#41386) * x-pack/filebeat/input/entityanalytics/provider/activedirectory: improve modification time and deletion logic (#41179) This improves the update time stamps of modified events by using the documents' whenChanged fields in the case of returned documents, and the current time when a document is identified as having been deleted. The latest of these is used to determine the time filter for the next Active Directory query. Documents are marked as deleted only when they are found to not exist in full sync collection, and are removed from the state store when they are identified as deleted. The change in behaviour to not use updates to identify corrects behaviour that would cause older but not deleted entities to be deleted from the index. (cherry picked from commit 6b54074) * remove irrelevant changelog entry --------- Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
…ve modification time and deletion logic (#41179) This improves the update time stamps of modified events by using the documents' whenChanged fields in the case of returned documents, and the current time when a document is identified as having been deleted. The latest of these is used to determine the time filter for the next Active Directory query. Documents are marked as deleted only when they are found to not exist in full sync collection, and are removed from the state store when they are identified as deleted. The change in behaviour to not use updates to identify corrects behaviour that would cause older but not deleted entities to be deleted from the index. (cherry picked from commit 6b54074) # Conflicts: # x-pack/filebeat/input/entityanalytics/provider/activedirectory/activedirectory.go
…r/activedirectory: improve modification time and deletion logic (#41385) * x-pack/filebeat/input/entityanalytics/provider/activedirectory: improve modification time and deletion logic (#41179) This improves the update time stamps of modified events by using the documents' whenChanged fields in the case of returned documents, and the current time when a document is identified as having been deleted. The latest of these is used to determine the time filter for the next Active Directory query. Documents are marked as deleted only when they are found to not exist in full sync collection, and are removed from the state store when they are identified as deleted. The change in behaviour to not use updates to identify corrects behaviour that would cause older but not deleted entities to be deleted from the index. (cherry picked from commit 6b54074) # Conflicts: # x-pack/filebeat/input/entityanalytics/provider/activedirectory/activedirectory.go * fix cherry pick failures --------- Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
Proposed commit message
See title.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs