
T1499:TA0040 Endpoint DoS Query + Detection #615

Merged: 11 commits, Jan 25, 2023

Conversation

natezpanther
Contributor

Background

Changes

  • PR includes a Scheduled Query + Scheduled Rule that are designed to work together with DynamoDB caching to track event counts over time and measure them against a given anomaly threshold value
  • The hope of this Rule is to handle MITRE T1499 Endpoint Denial of Service - GAP - [TA0040:Impact]

Testing

  • Unit Tests leverage mocks to override DynamoDB
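For illustration, the mocking approach might look roughly like the sketch below. The helper names (`fake_get`/`fake_put`) are hypothetical stand-ins for whatever DynamoDB get/put helpers the rule actually imports; this is not the PR's test code.

```python
# Minimal sketch of overriding DynamoDB in unit tests (helper names are
# illustrative, not the rule's real imports).
from unittest.mock import MagicMock

fake_cache = {}  # in-memory stand-in for the DynamoDB table

def fake_put(key, value):
    fake_cache[key] = value

def fake_get(key):
    return fake_cache.get(key, {})

# In a Panther-style unit test, these mocks would be patched over the
# rule's cache helpers so no real table is ever touched.
mock_put = MagicMock(side_effect=fake_put)
mock_get = MagicMock(side_effect=fake_get)

mock_put("count_ledger", {"rolling_ledger": [10, 20]})
print(mock_get("count_ledger"))  # {'rolling_ledger': [10, 20]}
```

Using `MagicMock(side_effect=...)` keeps call counts and arguments inspectable while the fake dict provides deterministic state between rule invocations.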

@natezpanther natezpanther requested review from a team January 6, 2023 18:45
@calkim-panther
Contributor

I like the idea of this detection. Using just the mean over a rolling 30-minute window may be too FP-prone.
I might suggest a slightly different approach: with each run we save the count, hour of day (1-24), and day of week (1-7). We keep a ledger covering a 30-day period (5-minute blocks == 8,640 entries). Then we can compare the mean within blocks of the same hour and day to account for differences in volume throughout the day and week.
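A sketch of that time-bucketed ledger idea, for illustration only (the structure and names below are mine, not code from the PR):

```python
# Sketch of the proposed time-bucketed ledger: each 5-minute run records
# (count, hour of day, day of week); the anomaly comparison uses the mean
# of past entries in the same hour/day bucket.
from collections import deque
from statistics import mean

LEDGER_SIZE = 8640  # 30 days of 5-minute blocks: 30 * 24 * 12

ledger = deque(maxlen=LEDGER_SIZE)  # (count, hour_1_24, dow_1_7) tuples

def record(count, hour, dow):
    ledger.append((count, hour, dow))

def bucket_mean(hour, dow):
    matches = [c for c, h, d in ledger if h == hour and d == dow]
    return mean(matches) if matches else None

record(100, 9, 3)   # a Wednesday-morning count
record(120, 9, 3)
record(5000, 2, 6)  # an off-hours burst lands in a different bucket
print(bucket_mean(9, 3))  # 110
```

Because comparison happens within a bucket, the off-hours burst does not inflate the Wednesday-morning baseline.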

@natezpanther
Contributor Author

@calkim-panther I have altered the code to do two things:

  • Expand the rolling_ledger to an hour
  • Track historical maximum counts

Then we account for the average of the rolling_ledger (as before) and also test for a new historical maximum. My hope is that this reduces false positives.

The reason why I do not do relative date comparisons (this Wednesday vs. last Wednesday, for example) is that I don't want to assume there will be predictable trends based on days of the week. I wanted something without that kind of dependency. Thoughts?

@calkim-panther
Contributor

An issue with the historical max is that it is an all-time max. If there were a massive DoS attack that recorded a high max, subsequent attacks with a lower count would not be detected. Local maxima would be better, but I would still be concerned that a rolling mean is FP-prone due to natural periods of high/low activity and legitimate bursts. We would need a ledger spanning weeks at minimum to get better accuracy, relying not purely on the mean but on a deviation measure.

@natezpanther
Contributor Author

natezpanther commented Jan 11, 2023

@calkim-panther Your point about subsequent attacks against the historical max values did occur to me, so the maximum counts are committed to alert_context() and then purged from DynamoDB after the alert is generated. The rolling ledger never gets purged (and can be expanded by the user via the global constant).

The new alert condition is:

  • num_logs > 10x the average (or whatever threshold is set in the global constant)
    AND
  • a new maximum value occurs (max values purged after each new alert)
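Restated as code, the combined condition might look like this sketch (`ANOMALY_THRESHOLD` and the ledger layout are illustrative stand-ins, not the rule's actual code):

```python
# Sketch of the two-part alert condition: the count must exceed
# N x the rolling average AND set a new historical maximum.
from statistics import mean

ANOMALY_THRESHOLD = 10  # alert only if num_logs > threshold * rolling average

def should_alert(num_logs, count_ledger):
    rolling = count_ledger["rolling_ledger"]
    highest = count_ledger["highest_counts"]
    avg = mean(rolling) if rolling else 0
    exceeds_avg = avg > 0 and num_logs > ANOMALY_THRESHOLD * avg
    new_max = not highest or num_logs > max(highest.values())
    return exceeds_avg and new_max
```

With the worked example in this thread (rolling average 35, prior max 100), num_logs = 1000 satisfies both clauses and would alert.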

As a simple example, let's say:
count_ledger['rolling_ledger'] = [10, 20, 40, 20, 100, 20]

So the count_ledger average = 35

And:

count_ledger['highest_counts'] = {
    'XX-XX-XXXX timestamp': 10,
    'XX-XX-XXXX timestamp': 20,
    'XX-XX-XXXX timestamp': 40,
    'XX-XX-XXXX timestamp': 100
}

Then: num_logs = 1000

So 1000 is > 10x the average AND it is a new maximum value. This condition would raise an alert. After the alert,

count_ledger = {
    'rolling_ledger': [10, 20, 40, 20, 100, 20, 1000],
    'highest_counts': {}
}

The next run, num_logs = 1200

At that point, count_ledger['rolling_ledger'] = [10, 20, 40, 20, 100, 20, 1000]
The new count_ledger average = 172.86

1200 / 172.86 ≈ 6.9, which is below 10, so no alert. But the rolling_ledger will continue to roll, the average will change, and we will account for any new maximums.
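Replaying the arithmetic from the example (values taken directly from the thread):

```python
# Verifying the worked example's averages and the no-alert follow-up run.
from statistics import mean

before = [10, 20, 40, 20, 100, 20]
after = before + [1000]

print(mean(before))           # 35
print(round(mean(after), 2))  # 172.86
print(1200 / mean(after))     # about 6.94, below the 10x threshold
```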

Of course no DoS detection can be absolutely reliable. But I do not think this detection will result in as many FPs as it might seem on the surface. I would love to test this using some real data, since I believe the threshold values and rolling_ledger size can be tweaked for better results by default.

@calkim-panther
Contributor

Let's merge this as disabled first and we'll enable it in our env and monitor performance.

@natezpanther
Contributor Author

@calkim-panther I've disabled the detection. Should I also disable the scheduled query?

@natezpanther natezpanther merged commit 2f1e460 into master Jan 25, 2023
@natezpanther natezpanther deleted the npz-mitre-TA0040-T1499 branch January 25, 2023 17:48