Volatility introduced in tests since approximately September 18th - potentially package sync #446
Comparing conda environments between pre- and post-merge runs, the most interesting diff is coiled itself, which jumped from 0.2.27 to 0.2.30, but I only compared two runs. I'd be interested to see a comparison across more commits, particularly the spikes.
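If it helps anyone reproduce that comparison, here's a minimal sketch of diffing two `conda list --export` dumps; the file names are made up for illustration:

```python
# Hypothetical sketch: diff two `conda list --export` dumps taken pre and
# post merge to spot version jumps (e.g. coiled 0.2.27 -> 0.2.30).
def parse_conda_export(path):
    """Parse `name=version=build` lines into a {name: version} dict."""
    versions = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, version, *_ = line.split("=")
            versions[name] = version
    return versions

before = parse_conda_export("env_pre_merge.txt")   # made-up file names
after = parse_conda_export("env_post_merge.txt")

for pkg in sorted(set(before) | set(after)):
    old, new = before.get(pkg, "absent"), after.get(pkg, "absent")
    if old != new:
        print(f"{pkg}: {old} -> {new}")
```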
Two thoughts.
It's very hard to find data for old tests, so here's what I can get without a ridiculous amount of work. Here's a cluster with py3.9 and dask/distributed 2022.6.0 (you can follow the link to see the other software versions): https://cloud.coiled.io/dask-engineering/clusters/83375/details Here's a cluster from before, also with py3.9 and dask/distributed 2022.6.0: https://cloud.coiled.io/dask-engineering/clusters/75547/details I see changes in pandas and pyarrow. Would that be relevant? Anyone else is welcome to dig in and compare those clusters further; I don't have much more insight.
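For clusters that are still reachable, `Client.get_versions()` from distributed reports the package versions the scheduler and workers see, which makes this kind of comparison easier than clicking through the UI. A minimal sketch (the scheduler address is a placeholder, and this obviously doesn't work for clusters that have already been torn down, which is part of why old data is hard to get):

```python
# Sketch: pull the package versions a still-running cluster reports.
from distributed import Client

client = Client("tls://<scheduler-address>")  # placeholder address

versions = client.get_versions(check=False)
scheduler_pkgs = versions["scheduler"]["packages"]
for pkg in ("pandas", "pyarrow", "dask", "distributed"):
    print(pkg, scheduler_pkgs.get(pkg))
```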
I've identified a couple of issues with package sync, in detecting packages on both the cluster and the client side. Staging has a ton of bugfixes I'm eager to get out, but it's currently held up by ensuring we've tested some of the infra work.
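(For readers unfamiliar with what package sync has to detect: it needs to enumerate what's installed on both the client and the cluster. The snippet below is purely an illustration of the client-side half, not Coiled's actual implementation.)

```python
# Illustration only: listing what is installed on the client side.
from importlib.metadata import distributions

installed = {dist.metadata["Name"]: dist.version for dist in distributions()}
for pkg in ("coiled", "dask", "distributed"):
    print(pkg, installed.get(pkg))
```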
Any reason to think package sync bugs are causing this?
I suspect it might be causing release versions to be installed instead of the main branch. If this is happening even on legacy tests, then it's unlikely.
Yeah, you can see the change on clusters running the legacy runtime. The examples I listed in #446 (comment) are runtime 0.1.0 w/ py39.
@shughes-uk Should we have an "environment" in the runtime where we test against staging?
If we can identify a subset of the tests for this it might be worthwhile; otherwise the cost might be prohibitive.
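For what it's worth, a small staging-only subset could be gated with a marker or an environment variable; a sketch, assuming a hypothetical `COILED_STAGING` variable set in CI (the variable and test name are illustrative, not part of the existing runtime):

```python
# Sketch: gate a small staging-only subset behind an env var.
# COILED_STAGING and the test name are hypothetical.
import os

import pytest

STAGING = os.environ.get("COILED_STAGING") == "1"


@pytest.mark.skipif(not STAGING, reason="only runs against the staging backend")
def test_staging_smoke():
    # Keep this subset tiny so the extra CI cost stays bounded.
    assert True
```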
vorticity also shows non-trivial variability.
Yes, but it has been that way since the beginning; maybe @gjoseph92 knows more about it.
I don't think that's too surprising; I think that one spills a bit? I'd expect it to get better after dask/distributed#7213.
Yep. Spill is also why
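For anyone poking at the spill hypothesis locally, the relevant knobs are the worker memory thresholds in dask's config; a sketch that just pins the documented defaults so they're explicit (not a tuning recommendation, and it needs to be set before the workers start):

```python
# Sketch: pin the worker memory thresholds explicitly while investigating
# spill-driven variance. Values shown are the documented defaults.
import dask

dask.config.set({
    "distributed.worker.memory.target": 0.60,     # managed data starts spilling to disk
    "distributed.worker.memory.spill": 0.70,      # spill based on measured process memory
    "distributed.worker.memory.pause": 0.80,      # worker stops starting new tasks
    "distributed.worker.memory.terminate": 0.95,  # nanny restarts the worker
})
```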
This could even be the same thing as
Some tests that were decently stable have become very volatile around mid-September. This makes it very hard to trust the regression detection system we have in place. But there is a pattern (a rough way to quantify the volatility is sketched after this list):
q3 [5GB parquet] - upstream
q5 [5GB parquet] - upstream - Hard to see but the red line became more volatile around the same time
q7 [5GB parquet] - upstream
q8 [5GB parquet] - upstream
test_dataframe_align - volatile and definitely a regression
test_shuffle - became more volatile
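As referenced above, one rough way to quantify "became more volatile" is a rolling coefficient of variation over the per-run durations; the CSV layout here is assumed, not the actual benchmark database schema:

```python
# Sketch: rolling coefficient of variation over benchmark durations.
# Assumes a CSV with "timestamp" and "duration" columns.
import pandas as pd

runs = pd.read_csv("benchmark_history.csv", parse_dates=["timestamp"])
runs = runs.sort_values("timestamp").set_index("timestamp")

window = 10  # runs per window
rolling = runs["duration"].rolling(window)
cv = rolling.std() / rolling.mean()
print(cv.loc["2022-09-01":])  # did the relative spread jump mid-September?
```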
The interesting thing is that this volatility also happens on latest (2022.6.1), which makes me think the issue was introduced by something in the package sync work (PR merged 09/15). A quick variance check around that date is sketched after this list:
q3 [5GB parquet] latest
q7 [5GB parquet] latest
test_dataframe_align
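As mentioned above, a quick sanity check that the spread really changed around the package sync merge could compare run durations before and after a cutoff with Levene's test; column names and the cutoff are assumptions:

```python
# Sketch: did the variance of run durations change around the merge date?
# Assumes the same "timestamp"/"duration" CSV layout as above.
import pandas as pd
from scipy.stats import levene

runs = pd.read_csv("benchmark_history.csv", parse_dates=["timestamp"])
cutoff = pd.Timestamp("2022-09-15")
before = runs.loc[runs["timestamp"] < cutoff, "duration"]
after = runs.loc[runs["timestamp"] >= cutoff, "duration"]

stat, pvalue = levene(before, after)
print(f"Levene statistic={stat:.2f}, p={pvalue:.4f}")  # small p suggests the spread changed
```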
cc: @ian-r-rose @jrbourbeau @shughes-uk