Fix DDOS of Office.com #4872

Merged
merged 2 commits into from
Jan 21, 2021

Contributor

@vladsud vladsud commented Jan 21, 2021

We see a large volume of blob requests to SPO. SPO starts throttling them and eventually throttles the whole app, so Office.com goes down.
We see blob requests (and failures) as far back as Kusto history goes, but the volume has increased over the last 3 days and we are getting a lot of 429s.

Contributor

@wes-carlson wes-carlson left a comment

:shipit:

@msfluid-bot
Collaborator

Could not find a usable baseline build with search starting at CI a252465

Generated by 🚫 dangerJS against c7708b5

@vladsud
Contributor Author

vladsud commented Jan 21, 2021

We see a large volume of blob requests to SPO. SPO starts throttling them and eventually throttles the whole app, so Office.com goes down.
We see blob requests (and failures) as far back as Kusto history goes, but the volume has increased over the last 3 days and we are getting a lot of 429s.
Having debugged one case, it is clear that the cache is empty for the summarizing client, so it fetches all blobs from storage, which brings things down.
Disabling cache eviction to work around the problem.
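
To illustrate the failure mode described above, here is a minimal sketch of a blob cache with an optional eviction limit. It is not the Fluid Framework driver code: `BlobCache`, `countStorageReads`, `maxEntries`, and the in-memory "storage" callback are all hypothetical names chosen for this example, and the eviction policy (drop the oldest entry) is an assumption. It only shows how a second reader, standing in for the summarizer, ends up going back to storage for every blob once eviction is on.

```typescript
// Hypothetical sketch; none of these names come from the Fluid Framework code base.
type FetchBlob = (id: string) => Promise<string>;

class BlobCache {
    private readonly entries = new Map<string, string>();
    // Number of reads that missed the cache and went to storage.
    public storageReads = 0;

    constructor(
        private readonly fetchFromStorage: FetchBlob,
        // Maximum number of cached blobs; leaving it undefined disables eviction.
        private readonly maxEntries?: number,
    ) {}

    public async read(id: string): Promise<string> {
        const cached = this.entries.get(id);
        if (cached !== undefined) {
            return cached;
        }
        this.storageReads++;
        const blob = await this.fetchFromStorage(id);
        this.entries.set(id, blob);
        if (this.maxEntries !== undefined && this.entries.size > this.maxEntries) {
            // Evict the oldest entry (a Map iterates in insertion order).
            const oldest = this.entries.keys().next().value;
            if (oldest !== undefined) {
                this.entries.delete(oldest);
            }
        }
        return blob;
    }
}

async function countStorageReads(maxEntries?: number): Promise<number> {
    // The storage callback is a stand-in for a real network request.
    const cache = new BlobCache(async (id) => `contents of ${id}`, maxEntries);
    const ids = ["a", "b", "c", "d"];
    // First pass: a client loads every blob once.
    for (const id of ids) {
        await cache.read(id);
    }
    // Second pass: the summarizer reads the same blobs again.
    for (const id of ids) {
        await cache.read(id);
    }
    return cache.storageReads;
}
```

With these four blobs, `countStorageReads(2)` resolves to 8 (every read in both passes goes to storage), while `countStorageReads()` with eviction disabled resolves to 4 (the second pass is served entirely from cache).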

@vladsud vladsud merged commit 7409d79 into microsoft:release/0.32 Jan 21, 2021
wes-carlson pushed a commit to wes-carlson/FluidFramework that referenced this pull request Jan 22, 2021
wes-carlson added a commit that referenced this pull request Jan 22, 2021
Co-authored-by: Vlad Sudzilouski <vlad@sudzilouski.com>
wes-carlson pushed a commit to wes-carlson/FluidFramework that referenced this pull request Jan 22, 2021
vladsud added a commit to vladsud/FluidFramework that referenced this pull request Jan 22, 2021
@vladsud vladsud deleted the DDOS_32 branch January 22, 2021 19:22
wes-carlson added a commit that referenced this pull request Jan 22, 2021
Co-authored-by: Vlad Sudzilouski <vlad@sudzilouski.com>
vladsud added a commit that referenced this pull request Feb 1, 2021
Follow up to #4872

Because blobs are evicted from the cache, the summarizer is forced to read most of the blobs from storage without hitting the driver cache.
As a result, SPO starts to throttle both the requests and the app, i.e. we end up with a self-inflicted DDOS attack.

Reworking the cache layer and disabling cache eviction for now.
4 participants