Improve least_duration algorithm by sorting durations #28

mbkroese · 2021-06-24T17:21:03Z

Updates the least_duration algorithm to create higher-quality splits by sorting tests by duration first.
This algorithm is less likely to be affected by #25.

mbkroese · 2021-06-24T17:26:24Z

@alexforencich cc

mbkroese · 2021-06-24T17:28:12Z

@jerry-git could you please review? This algorithm is useful anyway because it creates better splits, and it gives users an option that is not affected by #25.

alexforencich · 2021-06-24T21:21:18Z

Instead of adding yet another algorithm, how about rolling this functionality into least_duration and also restoring the original order within the groups? You can easily do this by assigning an index to each of the items before sorting, then, after splitting, sorting each group by this index.

mbkroese · 2021-06-24T21:25:01Z

> You can easily do this by assigning an index to each of the items before sorting,

Good suggestion, will change it.

By sorting the tests by duration, we can assign the tests with the largest durations first and balance out the groups with later, shorter assignments. At the end, we sort each group by the items' original order to maintain relative ordering.
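To make the approach described above concrete, here is a minimal sketch. It is not the code merged in this PR: the function name `least_duration_sketch`, its signature, and the fallback value used when no durations are known are all assumptions made for illustration. It sorts by duration (longest first), greedily assigns each test to the group with the smallest running total, and then restores the original order inside each group using the index recorded before sorting.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def least_duration_sketch(
    splits: int, items: Sequence[str], durations: Dict[str, float]
) -> List[List[str]]:
    """Hypothetical illustration: longest tests first, each into the lightest group."""
    # Tests without a recorded duration get the average of the known ones.
    # The 1.0 fallback for "nothing known at all" is an assumption of this sketch.
    known = [durations[item] for item in items if item in durations]
    average = sum(known) / len(known) if known else 1.0

    # Remember each item's original position before sorting.
    indexed = [
        (index, item, durations.get(item, average))
        for index, item in enumerate(items)
    ]
    # sorted() is stable (also with reverse=True), so ties keep their original order.
    by_duration = sorted(indexed, key=lambda entry: entry[2], reverse=True)

    # Min-heap of (total duration so far, group number).
    heap = [(0.0, group) for group in range(splits)]
    heapq.heapify(heap)
    groups: List[List[Tuple[int, str]]] = [[] for _ in range(splits)]

    for index, item, duration in by_duration:
        total, group = heapq.heappop(heap)
        groups[group].append((index, item))
        heapq.heappush(heap, (total + duration, group))

    # Sort each group by the saved index to restore the original relative order.
    return [[item for _, item in sorted(group)] for group in groups]
```

For example, with items `["a", "b", "c", "d"]`, durations `{"a": 1, "b": 4, "c": 2, "d": 3}` and two groups, this yields `[["a", "b"], ["c", "d"]]`, with a total duration of 5 in each group. Assigning the same items greedily in their original order, without sorting first, would give totals of 6 and 4.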

mbkroese · 2021-06-25T07:13:12Z

This is ready for review now.

alexforencich · 2021-06-25T07:18:36Z

src/pytest_split/algorithms.py

+    The algorithm sorts the items by their duration. Since the sorting algorithm is stable, ties will be broken by
+    maintaining the original order of items. It is therefore important that the order of items be identical on all nodes
+    that use this plugin. Due to issue #25 this might not always be the case.


Sorting by duration should make it less sensitive to the test order so long as the durations are unique, no? Seems to me like sorting by duration instead of not sorting at all would fix #25. Alternatively, what about sorting by the test name, then the duration to resolve any possible ties?

> Sorting by duration should make it less sensitive to the test order

Indeed, sorting makes it less sensitive to this problem. Only the collected items that have the same duration will now need to be collected in the right order. However, ties are likely to happen because missing durations get filled with the same average duration.

> Alternatively, what about sorting by the test name, then the duration to resolve any possible ties?

I think we should discuss this further in that issue, and we can adapt this algorithm again later once we agree on a solution. As I mention in that issue, sorting by name would resolve the issue flagged there, but I think collection of tests parametrised with objects would still be problematic.
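As a rough illustration of the tie-breaking idea discussed in this thread (not something this PR implements), the sketch below sorts by duration first and uses the test name only to break ties. The function name `deterministic_order` and the `default` parameter are hypothetical. It assumes test names are unique and identical on every node, which, as noted above, may not hold for tests parametrised with objects.

```python
from typing import Dict, List


def deterministic_order(
    items: List[str], durations: Dict[str, float], default: float
) -> List[str]:
    """Hypothetical helper: longest first, ties broken by test name."""
    # Negating the duration sorts longest first; the name decides between equal
    # durations, so the result no longer depends on the collection order.
    return sorted(items, key=lambda name: (-durations.get(name, default), name))


# Two nodes that collected the same tests in a different order.
node_a = ["test_b", "test_a", "test_c"]
node_b = ["test_c", "test_b", "test_a"]
known = {"test_c": 5.0}  # test_a and test_b share the default duration

print(deterministic_order(node_a, known, default=2.0))
print(deterministic_order(node_b, known, default=2.0))
```

Both calls print the same list, because the two tests that tie on the default duration are ordered by name rather than by the order in which each node collected them.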

mbkroese · 2021-07-21T10:59:26Z

@jerry-git just a reminder for this one now that you're back from holiday :)

jerry-git · 2021-07-21T11:00:08Z

yeah thanks, will have a look 🙂

jerry-git

LGTM 👍

jerry-git · 2021-07-21T11:29:00Z

available in 0.3.2

mbk added 3 commits June 25, 2021 08:27

Small refactor to reuse common logic

80e690b

Added test to show that algo maintains relative order of tests

3236751

alexforencich reviewed Jun 25, 2021

mbkroese changed the title ~~Add sorted_least_duration algorithm~~ Improve least_duration algorithm by sorting durations Jun 25, 2021

jerry-git approved these changes Jul 21, 2021

jerry-git merged commit 32b73ea into jerry-git:master Jul 21, 2021

jerry-git mentioned this pull request Nov 30, 2022

[Actions] Auto-Update cookiecutter template #64

Closed
