[TI_RecordedFuture] Support IoC expiration #5460
/test |
Pinging @elastic/security-external-integrations (Team:Security-External Integrations) |
#
# Fingerprint event: _id = hash(dataset + indicator type + indicator value)
#
- fingerprint:
Every hour (that's the default `interval`) this integration downloads the full CSV file. If a given CSV line has already been indexed, is there a need to index it again? Could we fingerprint the raw CSV to avoid duplicating data? (Update: it looks like retention is implemented based on `event.ingested`, so this does depend on the data being continuously reindexed in order to keep a full threat picture in the "latest" index.)
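A minimal sketch of the fingerprinting idea in the diff comment above, `_id = hash(dataset + indicator type + indicator value)`. The choice of SHA-256, the `|` separator, the function name, and the sample values are assumptions for illustration only; the Elasticsearch `fingerprint` processor computes its hash differently.

```python
import hashlib


def fingerprint_id(dataset: str, indicator_type: str, indicator_value: str) -> str:
    """Return a deterministic ID built from the three fields named in the diff comment."""
    # The separator keeps ("ab", "c") distinct from ("a", "bc"); it is an assumption.
    joined = "|".join([dataset, indicator_type, indicator_value])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical sample values: the same CSV line fetched twice yields the same ID.
first = fingerprint_id("ti_recordedfuture.threat", "ipv4-addr", "203.0.113.7")
again = fingerprint_id("ti_recordedfuture.threat", "ipv4-addr", "203.0.113.7")
other = fingerprint_id("ti_recordedfuture.threat", "ipv4-addr", "203.0.113.8")
```

Because the ID depends only on the indicator, re-downloading the same line produces the same ID, which is why a fingerprint deduplicates and also why it conflicts with retention that relies on fresh `event.ingested` values.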
Another step we could take to minimize retaining a bunch of duplicate data is to institute a custom ILM policy for this data such that it keeps a limited amount of the raw data (like a few hours worth). After the data has been read by the latest transforms there is not much use for it. In other words, as long as we have the output of the transform then the original data stream does not need much retention.
Got it, yeah, that makes sense. I see there is a way to specify an ILM policy in package-spec; I can work on it. In the documentation for the integration, we probably need to tell users how the source polling `interval`, the transform `retention_policy`, and the ILM `delete.min_age` need to be configured.
Added an ILM policy to `rollover` after 2 days and `delete` 3 days after rollover. Added a README section explaining the same.
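As a rough worked example of what those two settings imply for raw-data retention, the sketch below computes how long a document can stay in the source data stream. It assumes rollover is triggered only by age and ignores the delay introduced by ILM's periodic checks; the constant and function names are hypothetical.

```python
from datetime import timedelta

ROLLOVER_MAX_AGE = timedelta(days=2)  # rollover after 2 days (from the comment above)
DELETE_MIN_AGE = timedelta(days=3)    # delete 3 days after rollover


def retention_bounds(rollover=ROLLOVER_MAX_AGE, delete_after=DELETE_MIN_AGE):
    """Return (shortest, longest) time a document stays in the source data stream."""
    shortest = delete_after             # document written just before rollover
    longest = rollover + delete_after   # document written right after index creation
    return shortest, longest


shortest, longest = retention_bounds()
```

Under these assumptions a raw document is kept for between 3 and 5 days.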
retention_policy:
  time:
    field: event.ingested
    max_age: 15m
How was this value determined? Does this need to be longer than the interval at which data is fetched from the API? What happens if we fetch data every hour, but only retain the last 15 minutes of latest data for an IOC? Does that mean we have 45 minutes where `logs-ti_recordedfuture_latest.threat` is empty?
My thinking is that this value needs to be at least as long as the fetch interval plus the sync time delay. Then the
This value can be anything less than the source `interval` at which the API gets polled.
The reason is:
- If it is greater than the interval, expired IOCs still remain in the destination indices.
- If it is less, based on my tests, the documents are not deleted until the transform runs again, i.e., at the same interval at which documents get ingested via source polling. In other words, deletion via `retention_policy.time.max_age` only happens when the transform is triggered. The fields that determine whether to trigger the transform are `frequency` and `latest.sort`.
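The arithmetic behind the question and the reply can be sketched as follows. This is a simplified model, not how Elasticsearch evaluates retention: it ignores the sync delay and the transform `frequency`, and, per the reply above, actual deletion only happens when the transform is triggered. The function names are hypothetical.

```python
from datetime import timedelta


def stale_window(fetch_interval: timedelta, max_age: timedelta) -> timedelta:
    """Time per polling cycle during which the newest copy of an IoC is older than max_age."""
    return max(fetch_interval - max_age, timedelta(0))


def is_expired(age: timedelta, max_age: timedelta) -> bool:
    """True when a document is older than max_age, so it is eligible for deletion on the next transform run."""
    return age > max_age


# Hourly fetch with max_age of 15m: the 45-minute window raised in the question.
gap_15m = stale_window(timedelta(hours=1), timedelta(minutes=15))
# Hourly fetch with a max_age longer than the interval: no window.
gap_24h = stale_window(timedelta(hours=1), timedelta(hours=24))
```

A `max_age` longer than the fetch interval removes the window, at the cost that an IoC dropped from the feed lingers in the destination index for up to roughly `max_age`.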
Changed `max_age` to 24h after discussion with Jamie, as it seems like a reasonable default since this is not a user-configurable parameter.
packages/ti_recordedfuture/elasticsearch/transform/latest_ioc/transform.yml
@@ -16,13 +16,13 @@ An example event for `threat` looks as following:
The docs are going to need to be updated to explain how IoC expiration works and what index to read from when building indicator match rules.
Added README explaining IOC expiration and ILM policy on source indices.
@@ -0,0 +1,21 @@
source:
  index:
    - ".ds-logs-ti_recordedfuture.threat-default*"
What are the downsides to sourcing data from all logs-ti_recordedfuture.threat namespaces?
My understanding was that if the user adds multiple namespaces with the same dataset when setting up the integration, then the transform could read from all namespaces/data streams and deduce the `latest`.
But hardcoding to `default` isn't good either. Also, the transform itself is named like this as per spec. So, in the case of the `default` namespace, our transform is named `logs-ti_recordedfuture.latest_ioc-default-0.1.0`.
Updated the index to `.ds-logs-ti_recordedfuture.threat-*`. So it assumes only one namespace exists for this integration, whatever its value is.
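To see which backing indices such a wildcard would select, here is a small sketch using Python's `fnmatchcase` as a stand-in for index-pattern matching. The index names, the `prod` namespace, and the dates are hypothetical, and `fnmatch` only approximates Elasticsearch wildcard resolution.

```python
from fnmatch import fnmatchcase

PATTERN = ".ds-logs-ti_recordedfuture.threat-*"

# Hypothetical backing index names for two namespaces plus an unrelated dataset.
backing_indices = [
    ".ds-logs-ti_recordedfuture.threat-default-2023.04.01-000001",
    ".ds-logs-ti_recordedfuture.threat-prod-2023.04.01-000001",
    ".ds-logs-ti_other.threat-default-2023.04.01-000001",
]

# Keep only the names the wildcard pattern selects.
matched = [name for name in backing_indices if fnmatchcase(name, PATTERN)]
```

In this sketch the pattern selects the backing indices of every namespace for the dataset, so if more than one namespace exists the transform would read from all of them.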
mapping:
  ignore_above: 1024
  type: keyword
date_detection: true
Why did you enable `date_detection`? Do we have dynamic fields that contain date values that are not mapped as `date`?
Actually, there aren't. Set it to false
/test |
/test |
Testing with package upgrade: The transform isn't getting updated, when package is upgraded with Test 1: No code change, only modified manifest and changelog
Test 2: Modified transform with
Results: |
@kcreddy Thanks for testing this manually. Unless my expectations are wrong, I assume this is a bug in Fleet. Can you create a new elastic/kibana Fleet issue to get some clarity on what to expect from transform upgrades in Fleet packages? If this is the expected behavior for transforms, then what are the steps required by users to upgrade? We need to know this so we can document it and support users.
Hey @andrewkroh Interestingly, the new SNAPSHOT version doesn’t seem to create destination index with package version suffix, as now the destination index name is exactly as given in the transform.yml:
Also these indices are not deleted. This should work for us now. However, the transform update issue/bug during the package upgrade is still present. I have added the issue with details here: elastic/kibana#155982 and requested Kyle Pollich to take a look. |
packages/ti_recordedfuture/elasticsearch/transform/latest_ioc/transform.yml
  index:
    - "logs-ti_recordedfuture.threat-*"
dest:
  index: "logs-ti_recordedfuture_latest.threat-1.0.0"
When would this version be changed? I think we should have a comment here to help maintainers understand when this should happen and what, if any, is the relationship to the transform version.
I have been thinking about whether we should have some versioning convention here. Like perhaps we use a simple one up counter that's not semver and not tied in with the transform version. This value gets incremented only if there is some incompatible (breaking) schema change.
If we did a version change, we would probably need a means to automatically purge the old destination index so that readers using wildcards (`logs-*_latest-*`) are not getting results from the old index.
Added comments for dest index versioning and a `NOTE` for future contributors.
That comment is missing when and why the transform index version should be changed.
IIUC if we ever need some kind of breaking schema change then we would 1) increment the dest index version and 2) increment the transform version. Without (2) the change to the destination index would have no effect.
Is there any issue that you know of that is tracking the ability to clean up an orphaned destination index? We should register our requirement in elastic/package-spec and link to the issue in the comment. If we can't have that feature, then the next time we need to make a change we would need to pursue some other approach, like documenting a manual cleanup.
> Like perhaps we use a simple one up counter that's not semver and not tied in with the transform version

What do you think about that? In addition, I don't want it to appear that the transform's `dest.index` version is related to the package version. That might lead to confusion about why the `dest.index` did not change after an upgrade. Using a simple `-1`, `-2`, ... versioning approach there could accomplish that.
That sounds good. I also agree that transform version has to be different from the dest.index version. Simple versions should also do the trick.
packages/ti_recordedfuture/elasticsearch/transform/latest_ioc/transform.yml
@@ -1,6 +1,9 @@
# Use of "*" to use any namespace defined.
s/any namespace/all namespaces/
# The version suffix on destination index must be explicitly set.
# NOTE: During version change, ensure through some mechanism old destination index automatically purges so queries on wild card such as logs-*_latest-* doesn't result in duplicates
dest:
  index: "logs-ti_recordedfuture_latest.threat-1.0.0"
Here's what I am thinking...
# The version suffix on destination index must be explicitly set.
# NOTE: During version change, ensure through some mechanism old destination index automatically purges so queries on wild card such as logs-*_latest-* doesn't result in duplicates
dest:
  index: "logs-ti_recordedfuture_latest.threat-1.0.0"

# The version suffix on the dest.index should be incremented if a breaking change
# is made to the index mapping. You must also bump the fleet_transform_version
# for any change to this transform configuration to take effect. The old destination
# index is not automatically removed. We are dependent on {issue} to give
# us that ability in order to prevent having duplicate IoC data and prevent query
# time field type conflicts.
dest:
  index: "logs-ti_recordedfuture_latest.threat-1"
Thank you! Updated as suggested. Linked the underlying issue.
👍
What does this PR do?
Supports IoC (Indicators of Compromise) expiration by:
- Removing the `fingerprint` processor from `ti_recordedfuture.threat*` source indices to allow duplicate documents.
- Adding a `latest` transform to create destination indices containing only unexpired/latest IoCs.
Checklist
`changelog.yml` file.
How to test this PR locally
Related issues
Screenshots