Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: #251 - Update regex library to prevent excessive backtracking #420

Merged
merged 14 commits into from
Mar 1, 2024

Conversation

ajay-sentry
Copy link
Contributor

@ajay-sentry ajay-sentry commented Feb 26, 2024

Purpose/Motivation

  • Aims to fix both places in the Codecov codebase that evaluate a user-supplied regex using Python's re.match.
  • Python's regex implementation allows backtracking
  • Replaces the areas using regex.match for user supplied regexes with google's re2
  • Replaces the areas using regex.match for user supplied regexes with a new regex library

Links to relevant tickets

#251

What does this PR do?

  • re2 is a "drop-in" replacement for Python's re package, so I simply changed the two instances we have user supplied regexes with re2 from re.

  • In order to get the docker image to BUILD though, I had to do some "hacks" which were laid out in Failed build google-re2 on the alpine docker image google/re2#456. Explicitly, they were the following:

    • abseil-cpp-dev
    • py3-pybind11-dev
    • re2-dev
  • Additionally, as called out in the above link, I had to use v1.0 instead of v1.1 of the library to get it to work.

  • UPDATE (2/29/24): Once we got the re2 library working, we realized that there were two "big" issues with the re2 library

    • First, re2 doesn't support forward looking regex (ending wildcards) out the box, which for a few file types in empty_upload.py we had ending wildcards and I didn't want to mess with "removing" functionality that we may have had prior; though it may have been a worthy tradeoff if the benefit is to prevent DDOSers
    • Second, the escape character for re2 is \z vs. \Z which required a little bit of funkiness throughout the codebase requiring "one-off" polyfills today, or a mass conversion to re2. Which seemed a little anti-patterny and like scope creep we didn't need
    • After conversing with @JerrySentry he pointed out another regex library that has a built in timeout feature for pattern matching which seemed like a decent compromise. Plugging in a 5 second timeout for now, I think this solved the "root" issue of these potential long running regex backtracks, while at the same time not being so involved that we needed to change a lot more things than necessary (I'm looking at you docker files!)

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

@ajay-sentry ajay-sentry requested a review from a team as a code owner February 26, 2024 23:59
@ajay-sentry ajay-sentry changed the title Fix: 251 - Update regex library to prevent excessive backtracking Fix: #251 - Update regex library to prevent excessive backtracking Feb 27, 2024
@ajay-sentry ajay-sentry requested a review from a team as a code owner February 28, 2024 21:40
@codecov-qa
Copy link

codecov-qa bot commented Feb 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.95%. Comparing base (568c049) to head (b16301b).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #420   +/-   ##
=======================================
  Coverage   95.95%   95.95%           
=======================================
  Files         642      642           
  Lines       16977    16978    +1     
=======================================
+ Hits        16290    16291    +1     
  Misses        687      687           
Flag Coverage Δ
unit 95.95% <100.00%> (+<0.01%) ⬆️
unit-latest-uploader 95.95% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

codecov-public-qa bot commented Feb 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (568c049) 95.95% compared to head (b16301b) 95.95%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #420   +/-   ##
=======================================
  Coverage   95.95%   95.95%           
=======================================
  Files         642      642           
  Lines       16977    16978    +1     
=======================================
+ Hits        16290    16291    +1     
  Misses        687      687           
Flag Coverage Δ
unit 95.95% <100.00%> (+<0.01%) ⬆️
unit-latest-uploader 95.95% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
services/profiling.py 100.00% <100.00%> (ø)
upload/views/empty_upload.py 91.17% <100.00%> (+0.13%) ⬆️

Impacted file tree graph

Copy link
Contributor

@JerrySentry JerrySentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link

codecov bot commented Mar 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.12%. Comparing base (568c049) to head (f845d1d).

❗ Current head f845d1d differs from pull request most recent head b16301b. Consider uploading reports for the commit b16301b to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##            main    #420     +/-   ##
=======================================
- Coverage   95.67   94.12   -1.55     
=======================================
  Files        764     763      -1     
  Lines      17563   18044    +481     
=======================================
+ Hits       16802   16983    +181     
- Misses       761    1061    +300     
Flag Coverage Δ
unit ?
unit-latest-uploader ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ajay-sentry ajay-sentry merged commit ab757ca into main Mar 1, 2024
18 checks passed
@ajay-sentry ajay-sentry deleted the Ajay/251-vuln branch March 1, 2024 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants