
T1499:TA0040 Endpoint DoS Query + Detection #615

Merged: 11 commits, Jan 25, 2023

Conversation

natezpanther
Contributor

Background

Changes

  • PR includes a Scheduled Query + Scheduled Rule that are designed to work together with DynamoDB caching to track event counts over time and measure them against a given anomaly threshold value
  • The hope of this Rule is to handle MITRE T1499 Endpoint Denial of Service - GAP - [TA0040:Impact]

Testing

  • Unit Tests leverage mocks to override DynamoDB
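For illustration, the mocking approach might look roughly like the sketch below. The helper names (`fake_get`/`fake_put`) are hypothetical stand-ins for whatever DynamoDB get/put helpers the rule actually imports; this is not the PR's test code.

```python
# Minimal sketch of overriding DynamoDB in unit tests (helper names are
# illustrative, not the rule's real imports).
from unittest.mock import MagicMock

fake_cache = {}  # in-memory stand-in for the DynamoDB table

def fake_put(key, value):
    fake_cache[key] = value

def fake_get(key):
    return fake_cache.get(key, {})

# In a Panther-style unit test, these mocks would be patched over the
# rule's cache helpers so no real table is ever touched.
mock_put = MagicMock(side_effect=fake_put)
mock_get = MagicMock(side_effect=fake_get)

mock_put("count_ledger", {"rolling_ledger": [10, 20]})
print(mock_get("count_ledger"))  # {'rolling_ledger': [10, 20]}
```

Using `MagicMock(side_effect=...)` keeps call counts and arguments inspectable while the fake dict provides deterministic state between rule invocations.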

@natezpanther natezpanther requested review from a team January 6, 2023 18:45
@calkim-panther
Contributor

I like the idea of this detection. Using just the mean over a rolling 30-minute window may be too FP-prone.
I might suggest a slightly different approach: with each run we save the count, hour of day (1-24), and day of week (1-7). We keep a ledger covering a 30-day period (5-minute blocks == 8,640 entries). Then we can compare the mean within blocks of the same hour and day to account for differences in volume throughout the day and week.
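A sketch of that time-bucketed ledger idea, for illustration only (the structure and names below are mine, not code from the PR):

```python
# Sketch of the proposed time-bucketed ledger: each 5-minute run records
# (count, hour of day, day of week); the anomaly comparison uses the mean
# of past entries in the same hour/day bucket.
from collections import deque
from statistics import mean

LEDGER_SIZE = 8640  # 30 days of 5-minute blocks: 30 * 24 * 12

ledger = deque(maxlen=LEDGER_SIZE)  # (count, hour_1_24, dow_1_7) tuples

def record(count, hour, dow):
    ledger.append((count, hour, dow))

def bucket_mean(hour, dow):
    matches = [c for c, h, d in ledger if h == hour and d == dow]
    return mean(matches) if matches else None

record(100, 9, 3)   # a Wednesday-morning count
record(120, 9, 3)
record(5000, 2, 6)  # an off-hours burst lands in a different bucket
print(bucket_mean(9, 3))  # 110
```

Because comparison happens within a bucket, the off-hours burst does not inflate the Wednesday-morning baseline.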

@natezpanther
Contributor Author

@calkim-panther I have altered the code to do two things:

  • Expand the rolling_ledger to an hour
  • Track historical maximum counts

Then we account for the average of the rolling_ledger (as before) and also test for a new historical maximum. My hope is that this reduces false positives.

The reason why I do not do relative date comparisons (this Wednesday vs. last Wednesday, for example) is that I don't want to assume there will be predictable trends based on days of the week. I wanted something without that kind of dependency. Thoughts?

@calkim-panther
Contributor

An issue with the historical max is that it is an all-time max. If there were a massive DoS attack that recorded a high max, subsequent attacks with a lower count would not be detected. Local maxima would be better, but I would still be concerned that a rolling mean is FP-prone due to natural periods of high/low activity and legitimate bursts. We would need a ledger spanning weeks at minimum to get better accuracy, relying not purely on the mean but on a deviation measure.

@natezpanther
Contributor Author

natezpanther commented Jan 11, 2023

@calkim-panther Your point about subsequent attacks against the historical max values did occur to me, so the maximum counts are committed to alert_context() and then purged from DynamoDB after the alert is generated. The rolling ledger never gets purged (and can be expanded by the user via the global constant).

The new alert condition is:

  • num_logs > 10x the average (or whatever threshold is set in the global constant)
    AND
  • a new maximum value occurs (max values purged after each new alert)
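Restated as code, the combined condition might look like this sketch (`ANOMALY_THRESHOLD` and the ledger layout are illustrative stand-ins, not the rule's actual code):

```python
# Sketch of the two-part alert condition: the count must exceed
# N x the rolling average AND set a new historical maximum.
from statistics import mean

ANOMALY_THRESHOLD = 10  # alert only if num_logs > threshold * rolling average

def should_alert(num_logs, count_ledger):
    rolling = count_ledger["rolling_ledger"]
    highest = count_ledger["highest_counts"]
    avg = mean(rolling) if rolling else 0
    exceeds_avg = avg > 0 and num_logs > ANOMALY_THRESHOLD * avg
    new_max = not highest or num_logs > max(highest.values())
    return exceeds_avg and new_max
```

With the worked example in this thread (rolling average 35, prior max 100), num_logs = 1000 satisfies both clauses and would alert.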

As a simple example, let's say:
count_ledger['rolling_ledger'] = [10, 20, 40, 20, 100, 20]

So the count_ledger average = 35

And:

count_ledger['highest_counts'] = {
    'XX-XX-XXXX timestamp': 10,
    'XX-XX-XXXX timestamp': 20,
    'XX-XX-XXXX timestamp': 40,
    'XX-XX-XXXX timestamp': 100
}

Then: num_logs = 1000

So 1000 is > 10x the average AND it is a new maximum value. This condition would raise an alert. After the alert,

count_ledger = {
    'rolling_ledger': [10, 20, 40, 20, 100, 20, 1000],
    'highest_counts': {}
}

The next run, num_logs = 1200

At that point, count_ledger['rolling_ledger'] = [10, 20, 40, 20, 100, 20, 1000]
The new count_ledger average = 172.86

1200 / 172.86 ≈ 6.9, which is below 10, so no alert. But the rolling_ledger will continue to roll, the average will change, and we will account for any new maximums.
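Replaying the arithmetic from the example (values taken directly from the thread):

```python
# Verifying the worked example's averages and the no-alert follow-up run.
from statistics import mean

before = [10, 20, 40, 20, 100, 20]
after = before + [1000]

print(mean(before))           # 35
print(round(mean(after), 2))  # 172.86
print(1200 / mean(after))     # about 6.94, below the 10x threshold
```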

Of course no DoS detection can be absolutely reliable. But I do not think this detection will result in as many FPs as it might seem on the surface. I would love to test this using some real data, since I believe the threshold values and rolling_ledger size can be tweaked for better results by default.

@calkim-panther
Contributor

Let's merge this as disabled first and we'll enable it in our env and monitor performance.

@natezpanther
Contributor Author

@calkim-panther I've disabled the detection. Should I also disable the scheduled query?

@natezpanther natezpanther merged commit 2f1e460 into master Jan 25, 2023
@natezpanther natezpanther deleted the npz-mitre-TA0040-T1499 branch January 25, 2023 17:48