-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Fix TestResults SQL query to not take as long #880
Conversation
this commit replaces the previous django orm code with an invocation to do a raw sql query that performs much better for repos with a large amount of data. one worry with this approach is the risk for sql injection in this specific case, we're passing most of the dynamic user provided values used in the sql through the parameters which the django docs say are sanitized and should be used to protect against sql injection for the user provided values that are not passed through the sql parameters they're checked to be in a strict set of values, so we shouldn't be substituting any unexpected strings into the query
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## main #880 +/- ##
==========================================
- Coverage 96.31% 96.28% -0.03%
==========================================
Files 823 823
Lines 19079 19130 +51
==========================================
+ Hits 18376 18420 +44
- Misses 703 710 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found.
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chiming in on the SQL injection risk: currently looks good. {order_by}
and {order}
are computed based on validated data, and every other string substitution controls the presence/omission of a hardcoded clause, nothing dynamic. the hardcoded clauses use query parameters and any dynamic values are provided separately
https://docs.djangoproject.com/en/5.1/topics/db/sql/#passing-parameters-into-raw it looks like you can name your query parameters like %(repoid)s
and then make your params
list a dict instead where the key is the parameter name. that would probably make this query a lot easier to work with
i imagine Django chose to use Python's % string interpolation syntax for their parameterized query syntax to make switching to safe queries easier, but man i wish the safe code didn't look so much like the dangerous code
do you have a sample of the SQL that was generated with the Django ORM query? i am not a django-optimizing expert but dynamically generating a query like this is fragile enough that it's worth a second look to see if we can't improve the performance in place
this is a really good idea i will implement this
the issue i ran into with Django was that doing arbitrary joins and using derived tables seems to be impossible, and i had been messing around with the SQL for a while trying to avoid using self joins and derived tables but I could not find a performant way to query this information without doing so. If we can come up with SQL that doesn't use derived tables and is performant then I'd be willing to try using Django again but until then I think it's impossible. |
utils/test_results.py
Outdated
params: dict[str, int | str | tuple[str, ...]] | ||
|
||
|
||
def convert_tuple_or_none(value: set[str] | list[str] | None) -> tuple[str, ...] | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert_to_tuple_else_none maybe?
utils/test_results.py
Outdated
return tuple(value) if value else None | ||
|
||
|
||
def encode_after_or_before(value: str | None) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is actually just encode_to_b64 isnt it? Nothing in particular to after or before
utils/test_results.py
Outdated
|
||
term_filter = f"%{term}%" if term else None | ||
|
||
if interval_num_days not in {1, 7, 30}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Im kinda going back and forth with moving all this validation logic into a separate "validate_params" fn or similar to kinda clean up this function a bit
utils/test_results.py
Outdated
select test_id, commits_where_fail as cwf | ||
from base_cte | ||
where array_length(commits_where_fail,1) > 0 | ||
) as foo, unnest(cwf) as unnested_cwf group by test_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foo
|
||
filtered_test_ids = set([test["test_id"] for test in totals]) | ||
|
||
test_ids = test_ids & filtered_test_ids if test_ids else filtered_test_ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this intersection is quicker than doing a filter on term prior to flags?
fail_count_sum=Sum("fail_count"), | ||
pass_count_sum=Sum("pass_count"), | ||
) | ||
.filter(skip_count_sum__gt=0, fail_count_sum=0, pass_count_sum=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are your thoughts on not having the fail_count_sum check here --- signaling tests which were constantly failing that were then skipped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it makes sense to have tests that have never passed here, but there's no way of knowing if it's like interleaved in the way that its failing and skipped, i'm cool with trying it though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more relatable personally
utils/test_results.py
Outdated
.annotate(v=Distinct(Unnest(F("commits_where_fail")))) | ||
.values("v") | ||
if not first and not last: | ||
first = 25 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 20 is our default typically
we change the query to only do ordering and get the entire set of tests then we do the filtering in the application using a binary search, because the result of the query is ordered by the ordering parameter then the name and the binary search takes that into account. The reason we have to do the filtering in the application is so we can get the correct value for the totalCount. Otherwise we would have one query complete the logic for getting totalCount and another query actually doing the cursor filtering.
❌ 1 Tests Failed:
View the top 1 failed tests by shortest run time
To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard |
Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time. ❌ Failed Test Results:Completed 2707 tests with View the full list of failed testspytest
|
@@ -234,7 +234,7 @@ def test_fetch_test_result_last_duration(self) -> None: | |||
result["owner"]["repository"]["testAnalytics"]["testResults"]["edges"][0][ | |||
"node" | |||
]["lastDuration"] | |||
== 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know why this guy changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we stopped automatically setting the lastDuration
value to 0 from when we stopped computing it because we thought it was singlehandedly ruining the perf of the query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhhh okay yeah i remember
term_filter = f"%{term}%" if term else None | ||
|
||
if should_reverse: | ||
ordering_direction = "DESC" if ordering_direction == "ASC" else "ASC" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, we want to reverse the ordering direction when we get last
passed to the query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see, yeah that makes sense
return (row_value_str > cursor_value_str) - (row_value_str < cursor_value_str) | ||
|
||
left, right = 0, len(rows) - 1 | ||
while left <= right: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
holy smokes is this binary search in the wild?
utils/test_results.py
Outdated
|
||
left, right = 0, len(rows) - 1 | ||
while left <= right: | ||
print(left, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return TestResultsQuery(query=base_query, params=filtered_params) | ||
|
||
|
||
def search_base_query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing all this? A function description would go a long way for this guy
utils/test_results.py
Outdated
@@ -76,131 +298,152 @@ def generate_test_results( | |||
:param branch: optional name of the branch we want to filter on, if this is provided the aggregates calculated will only take into account | |||
test instances generated on that branch. By default branches will not be filtered and test instances on all branches wil be taken into | |||
account. | |||
:param history: timedelta for filtering test instances used to calculated the aggregates by time, the test instances used will be | |||
those with a created at larger than now - history. | |||
:param interval: timedelta for filtering test instances used to calculated the aggregates by time, the test instances used will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used to calculate*
utils/test_results.py
Outdated
:param history: timedelta for filtering test instances used to calculated the aggregates by time, the test instances used will be | ||
those with a created at larger than now - history. | ||
:param interval: timedelta for filtering test instances used to calculated the aggregates by time, the test instances used will be | ||
those with a created at larger than now - interval. | ||
:param testsuites: optional list of testsuite names to filter by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test suites filtering is also union right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed i will add that as a comment
|
||
rows = [TestResultsRow(*row) for row in aggregation_of_test_results] | ||
|
||
page_size: int = first or last or 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at this point first or last will always have a value thanks to your check on 399 right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the typing is broken unless i do this, but you are correct
this commit replaces the previous django orm code with an invocation
to do a raw sql query that performs much better for repos with a large
amount of data.
one worry with this approach is the risk for sql injection
in this specific case, we're passing most of the dynamic user provided
values used in the sql through the parameters which the django docs say
are sanitized and should be used to protect against sql injection
for the user provided values that are not passed through the sql parameters
they're checked to be in a strict set of values, so we shouldn't be substituting
any unexpected strings into the query