
fix: Add graphql max depth and aliases limits #955

Merged
merged 10 commits into main from sshin/fix/max-depth on Nov 6, 2024

Conversation

suejung-sentry
Contributor

@suejung-sentry suejung-sentry commented Nov 1, 2024

Add protections against Denial of Service attacks on the GraphQL API that use high query depth, high breadth, or high alias counts.

  • For high-depth attacks (deeply nested GraphQL queries) - I added a new validation_rule to ariadne that rejects queries past a max setting. There was no off-the-shelf rule for ariadne, so this is a custom one modeled on the one from the Apollo GraphQL SDK linked by the pentesters.
  • For high-breadth attacks (a ton of fields requested at a given level) - our schema doesn't really offer anything that wide, so the protection against aliases should cover this issue.
  • For high-alias attacks (requesting the same field over and over via aliases) - I added a new validation_rule that counts aliases and rejects queries beyond a max setting. See the same note on high-depth attacks.

Note that we also already have cost validation via ariadne. The additional rules in this PR catch cases where the cost validation is tuned correctly for our day-to-day use cases but may be too lenient against crafted "attacks", such as the ones composed by the pentesters.

We also already have rate limiting on the GraphQL endpoint so that pentest recommendation is already covered.

Closes https://github.com/codecov/internal-issues/issues/918
Closes https://github.com/codecov/internal-issues/issues/917



codecov bot commented Nov 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.25%. Comparing base (0273319) to head (903ed78).
Report is 1 commit behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
```
@@           Coverage Diff           @@
##             main     #955   +/-   ##
=======================================
  Coverage   96.25%   96.25%
=======================================
  Files         826      827    +1
  Lines       19048    19090   +42
=======================================
+ Hits        18334    18376   +42
  Misses        714      714
```

| Flag | Coverage | Δ |
| --- | --- | --- |
| unit | 92.52% <100.00%> | +0.01% ⬆️ |
| unit-latest-uploader | 92.52% <100.00%> | +0.01% ⬆️ |

Flags with carried forward coverage won't be shown.


@suejung-sentry suejung-sentry marked this pull request as ready for review November 1, 2024 22:51
@suejung-sentry suejung-sentry requested a review from a team as a code owner November 1, 2024 22:51
```python
        self.max_depth_reached: bool = False
        self.max_depth: int = max_depth

    def enter_operation_definition(
```
Contributor

Just curious, are these base-class functions whose default behavior you had to override?

Similar story with enter_field, leave_field, and enter_document

Contributor Author

Yup! Though I think of it less as "overriding" and more as "implementing the interface" that is defined here. The function names enter_X and leave_X are dispatched dynamically per here, and any method that's not implemented explicitly behaves as a no-op (here).

This is the stuff I looked at in the Ariadne docs & this example implementation

Contributor

Very cool! Thanks for linking all those docs, that was a fun set of reads

```
@@ -201,7 +202,11 @@ def get_validation_rules(
            maximum_cost=settings.GRAPHQL_QUERY_COST_THRESHOLD,
            default_cost=1,
            variables=data.get("variables"),
        )
    ),
    create_max_depth_rule(max_depth=getattr(settings, "GRAPHQL_MAX_DEPTH", 15)),
```
Contributor

Nit: Do we need to do the getattr() here if the value will always be set in the settings_base file?

Nitpicking since the rest of the codebase is pulling off of settings directly 😅

Contributor Author

Yeah, my intention was to be defensive there, since I've seen a lot of dict accesses where only tribal knowledge tells you whether a missing key is a bug.
But thinking about it again, burying a default at this level is the wrong practice. Since settings is something we control, we'd prefer to fail early and actually throw an error rather than have it "be forgiving" here.
Fixed it!
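A plain-Python illustration of that trade-off (the `Settings` class and attribute names here are stand-ins, not the actual codebase):

```python
class Settings:
    """Stand-in for a settings module we fully control."""

    GRAPHQL_MAX_DEPTH = 15


settings = Settings()

# "Forgiving" access: a typo in the attribute name silently falls back
# to the buried default, hiding the misconfiguration.
depth = getattr(settings, "GRAPHQL_MAX_DEPHT", 15)  # typo goes unnoticed

# Fail-fast access: the same typo raises immediately, surfacing the bug
# at startup instead of burying a default at the call site.
try:
    depth = settings.GRAPHQL_MAX_DEPHT  # deliberate typo
except AttributeError:
    depth = None  # real code would crash loudly here, which is the point
```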

Contributor

Haha yep totally agreed! Thanks for making the update

```
@@ -106,6 +106,10 @@

GRAPHQL_INTROSPECTION_ENABLED = False

GRAPHQL_MAX_DEPTH = 15
```
Contributor

Wanted to hear a little more about how you landed on 15 for both of these values. Do you think stage/production/dev should have the same value for each?

Contributor Author

I skimmed the existing queries and the max depth was around ~10, so I thought 15 would give a good buffer. We could potentially go to 20 if we want more wiggle room. I think keeping it the same in all envs makes sense since it's a set-and-forget kind of setting. Also, we are the main consumers of the GraphQL API, so I'd want any queries that would error in prod to be caught first in the lower envs under the same settings.

For aliases, I don't think we ever use more than a handful (<5?) at a time, so a reasonable use case wouldn't go beyond say 15 anyway (and anything above that is likely a malicious actor).

Open to thoughts on those. If either becomes a problem, it's as simple as bumping the value up, and we would catch the issue during development of a new feature anyway.

Contributor

Totally makes sense! I agree, keeping the depth at ~15 makes a lot of sense. We could probably lower the alias one if we really wanted, though it's not a deal breaker.

Contributor

@ajay-sentry ajay-sentry left a comment

Looks good to me!

@suejung-sentry
Contributor Author

Some other things tested:

  • smoke tested around the staging app to confirm no pages regressed as a result of these rules
  • tested a query in staging with aliases above and below the threshold; it behaved as expected
  • will track any latency delta in our request latency histogram

@suejung-sentry suejung-sentry added this pull request to the merge queue Nov 6, 2024
Merged via the queue into main with commit 63124e2 Nov 6, 2024
30 of 32 checks passed
@suejung-sentry suejung-sentry deleted the sshin/fix/max-depth branch November 6, 2024 18:51