feat: pre risky migration flaky test changes #524

Merged
merged 6 commits into main on Jul 2, 2024

Conversation

joseph-sentry (Contributor):

This PR makes the changes that are possible before the risky migrations in the supporting shared PR land:

  • update shared version
  • update test results parser version
  • add time-machine as a dependency
  • add Flake and ReducedError sqlalchemy models
  • modify TestResultsNotificationPayload to contain a set of flaky test ids instead of a dict[str, TestResultsNotificationFlake] (see the sketch after this list)
  • change flaky test results comment format
  • change flake detection in test results finisher to gather the Flake objects for a given repo and compare their test ids to the test ids of the failures relevant to the test results comment
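
A minimal sketch of that payload change, assuming a dataclass-style definition; every field other than flaky_tests is an illustrative placeholder, not the PR's actual shape:

```python
from dataclasses import dataclass

@dataclass
class TestResultsNotificationPayload:
    failed: int    # placeholder field, assumed
    passed: int    # placeholder field, assumed
    skipped: int   # placeholder field, assumed
    # Before: flaky_tests: dict[str, TestResultsNotificationFlake]
    # After: the comment only needs the ids to mark which failures are flaky.
    flaky_tests: set[str]
```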

codecov bot commented Jun 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.51%. Comparing base (c3bcddf) to head (6b2507c).

✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #524   +/-   ##
=======================================
  Coverage   97.50%   97.51%           
=======================================
  Files         449      449           
  Lines       35739    35731    -8     
=======================================
- Hits        34848    34843    -5     
+ Misses        891      888    -3     
Flag Coverage Δ
integration 97.49% <100.00%> (+<0.01%) ⬆️
latest-uploader-overall 97.49% <100.00%> (+<0.01%) ⬆️
unit 97.49% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
NonTestCode 94.60% <100.00%> (+0.01%) ⬆️
OutsideTasks 97.74% <100.00%> (-0.01%) ⬇️
Files Coverage Δ
database/models/reports.py 99.45% <100.00%> (+0.05%) ⬆️
services/test_results.py 91.71% <100.00%> (-0.18%) ⬇️
services/tests/test_test_results.py 100.00% <100.00%> (ø)
tasks/test_results_finisher.py 97.69% <100.00%> (+1.69%) ⬆️
tasks/tests/unit/test_test_results_finisher.py 100.00% <100.00%> (ø)
...sks/tests/unit/test_test_results_processor_task.py 100.00% <ø> (ø)

... and 1 file with indirect coverage changes

This change has been scanned for critical changes.


message = Column(types.Text)


class Flake(CodecovBaseModel, MixinBaseClass):
Contributor:

Is the idea to then replace these w/ their django counterparts after you run the risky migration?

Contributor (author):

No, these will remain; these changes just don't depend on the risky migrations to run.

Contributor:

Mmm is there a reason we don't want to use the django models instead?

Contributor (author):

It's possible the Flake model is not needed, but the ReducedError model is, because we will be writing the reduced_error_id in the test results processor in future changes.
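
For orientation, a rough sketch of the two models under discussion. This is a guess, not the PR's actual definitions: the table names, the start_date column, the foreign key, and the plain declarative base (standing in for CodecovBaseModel and MixinBaseClass) are all assumptions; only repoid, testid, reduced_error_id, end_date, and message are attested in this thread.

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # stand-in for CodecovBaseModel/MixinBaseClass

class ReducedError(Base):
    __tablename__ = "reports_reducederror"  # assumed name
    id = sa.Column(sa.Integer, primary_key=True)
    message = sa.Column(sa.Text)

class Flake(Base):
    __tablename__ = "reports_flake"  # assumed name
    id = sa.Column(sa.Integer, primary_key=True)
    repoid = sa.Column(sa.Integer)
    testid = sa.Column(sa.Text)
    # To be written by the test results processor in future changes.
    reduced_error_id = sa.Column(
        sa.Integer, sa.ForeignKey("reports_reducederror.id"), nullable=True
    )
    start_date = sa.Column(sa.DateTime)               # assumed
    end_date = sa.Column(sa.DateTime, nullable=True)  # NULL while still flaky
```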

@adrian-codecov (Contributor), Jun 26, 2024:

After talking, we'll defer the use of Django models, as it poses some difficulties with testing at the moment. Happy to approve once you have addressed @michelletran-codecov's feedback 👌

@michelletran-codecov (Contributor) left a comment:

Generally LGTM, modulo the comment below and the added SQLAlchemy models. The current query to get flakes feels pretty isolated, and I'm wondering if there's a way for us to use the Django models instead. If there are no writes in this task, transaction contention is unlikely. The query is pretty isolated today, but I'm guessing we'll need to tie it to TestInstance at some point? If that's the only place where we'd interact with the existing SQLAlchemy models, would it be doable by referencing the ids?

Of course, keeping two different database models and connections for this task also adds unnecessary complexity, so I'm not opposed to just adding the SQLAlchemy models.

reason=reason,
),
)
def get_flaky_tests(self, db_session, commit_yaml, repoid, commit, failures):
Contributor:

Can we add some type annotations?
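
A hedged sketch of what the annotated signature might look like; the parameter and return types here are inferred from how the values are used in this thread, not taken from the PR:

```python
from sqlalchemy.orm import Session

def get_flaky_tests(
    self,
    db_session: Session,
    commit_yaml: dict,  # assumption; the codebase may use a dedicated yaml type
    repoid: int,
    commit: "Commit",   # assumption; forward reference to the commit model
    failures: list,     # assumption; the failures relevant to the comment
) -> set[str]:          # ids of failing tests currently considered flaky
    ...
```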

@joseph-sentry (Contributor, author):

I'm having a lot of trouble getting the tests to work with both the Django ORM and SQLAlchemy. I'm trying to get them both to connect to the same DB, but there are complications: Django wants to create its own test database, and SQLAlchemy tries to create one as well. I don't think the complexity is worth the benefits.

@joseph-sentry force-pushed the joseph/flakes-pre-risky branch 3 times, most recently from 846827a to 1326fb1, on June 28, 2024 at 15:25
@michelletran-codecov (Contributor) left a comment:

A few more comments about query performance.

Flake.repoid == repoid,
Flake.end_date.is_(None),
)
.all()
Contributor:

Hmm... so usually, to make the query more predictable, I would suggest adding a limit to the returned results (this will help us with IO and memory usage in the app). I see that you use this mainly to 1. count and 2. retrieve flakes from failed tests.

What do you think about splitting those into 2 separate queries? One to do the count (the queryset count method, https://docs.djangoproject.com/en/5.0/ref/models/querysets/#django.db.models.query.QuerySet.count, issues a select count(*) ...), and another to query flaky tests from the failed test ids. This will involve 2 DB calls rather than one, but each should be relatively fast, and it means less data processing in the app itself.
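
A sketch of that split, written against this PR's SQLAlchemy Flake model rather than the Django queryset API; failure_test_ids is an assumed variable holding the ids of the failed tests:

```python
from sqlalchemy import func

# 1. A bare count of active flakes; no Flake rows are materialized in the app.
flaky_count = (
    db_session.query(func.count(Flake.testid))
    .filter(Flake.repoid == repoid, Flake.end_date.is_(None))
    .scalar()
)

# 2. Fetch only the test ids of active flakes matching this run's failures.
flaky_test_ids = {
    testid
    for (testid,) in db_session.query(Flake.testid)
    .filter(
        Flake.repoid == repoid,
        Flake.end_date.is_(None),
        Flake.testid.in_(failure_test_ids),  # assumed list of failed test ids
    )
    .all()
}
```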

Contributor (author):

I agree that we could limit and offset this query to reduce memory usage, but maybe it'd be better to filter on test ids in the query, like I mentioned in my comment below, and also to select only the test id, which is all we're looking for here. Is there another reason, other than memory usage, that we would limit this query?

Contributor:

> only select the test id from the query which is all we're looking for here.

I'm good with providing an id list. We will probably also want to ensure that the list isn't too long (i.e. we can cap it on the application side). It's probably fine to query all the tests for now, but we will want to keep an eye on the performance of this query (it correlates with the number of failed tests it's trying to retrieve).

> Is there another reason other than memory usage that we would limit this query?

Yes. Having a bound on the number of items returned will also make processing the results more predictable. For example, we don't have to worry (as much) about long-running tasks, because it's only ever going to process, say, 30 items.

Comment on lines 309 to 334
Flake.repoid == repoid,
Flake.end_date.is_(None),
Contributor:

I see that the index is added as a compound of ["repository", "test", "reduced_error", "date"]. This means that for this query it's only going to use the index for "repository" (because Postgres processes index columns from left to right).

I believe we can also add an index targeting null fields specifically to make it more efficient, but I don't have as much experience with this. :p If we want to make this query efficient, we might want to explore a compound index on (repository, end_date = null) or something.
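
A sketch of such a partial index via SQLAlchemy's Postgres dialect support; the index name and the single-column choice are assumptions:

```python
import sqlalchemy as sa

# Partial index restricted to active flakes (end_date IS NULL), so the
# finisher's "active flakes for this repo" lookup can be served by the index.
sa.Index(
    "ix_flake_repoid_active",  # assumed name
    Flake.repoid,
    postgresql_where=Flake.end_date.is_(None),
)
```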

Contributor (author):

Would adding a Flake.testid.in_(list_of_test_ids) filter make this more efficient, then? I plan to eventually add the reduced_error_id to the query as well, so in the end we will be making full use of this index.

Contributor:

> I plan to eventually add the reduced_error_id to the query as well, so in the end we will be making full use of this index.

Ah OK! This is fine then. 👍

@michelletran-codecov (Contributor) left a comment:

LGTM!

@joseph-sentry joseph-sentry added this pull request to the merge queue Jul 2, 2024
Merged via the queue into main with commit b689d22 Jul 2, 2024
25 of 30 checks passed
@joseph-sentry joseph-sentry deleted the joseph/flakes-pre-risky branch July 2, 2024 18:48
3 participants