Track CPU and network occupancy separately #7020
Conversation
May end up wanting to move away from the dataclass if we don't like the breaking API change, or if the many places that will have to change become too onerous.
Circular imports from work-stealing
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0  15 suites ±0  6h 19m 55s ⏱️ +9m 12s

For more details on these failures, see this check. Results for commit f3de3b9. ± Comparison against base commit 1fd07f0.

♻️ This comment has been updated with latest results.
```python
ws.processing[ts] = total_duration
self.total_occupancy += total_duration - old
ws.occupancy += total_duration - old
old = ws.processing.get(ts, Occupancy(0, 0))
```
nit: We're calling this method a lot, and this will always construct the empty `Occupancy` instance, even when the key is already present.
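To illustrate the nit, a minimal self-contained sketch (the `cpu`/`network` field names are hypothetical): `dict.get(key, default)` evaluates its default argument eagerly, so the fallback `Occupancy(0, 0)` is built on every call, hit or miss, while an explicit membership check only constructs it on a miss.

```python
class Occupancy:
    instances = 0  # count constructions to demonstrate the eager default

    def __init__(self, cpu, network):
        Occupancy.instances += 1
        self.cpu = cpu
        self.network = network

processing = {"task": Occupancy(1.0, 2.0)}  # 1 construction

# The default is constructed even though "task" is present:
old = processing.get("task", Occupancy(0, 0))  # +1 construction

# A separate branch skips the construction on a hit:
old = processing["task"] if "task" in processing else Occupancy(0, 0)  # +0
```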
Yeah, maybe worth a separate branch for the case that the key isn't there? Or allowing `Occupancy` to be incremented and decremented by plain ints?
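A sketch of what the second option might look like, assuming `Occupancy` is a dataclass with hypothetical `cpu` and `network` fields: arithmetic accepts either another `Occupancy` or a plain number, so hot paths can skip constructing a throwaway zero instance.

```python
from dataclasses import dataclass

@dataclass
class Occupancy:
    # Field names are an assumption; the PR tracks CPU and network time separately.
    cpu: float
    network: float

    def __add__(self, other):
        if isinstance(other, (int, float)):
            # Plain numbers apply to both components, avoiding Occupancy(0, 0) churn.
            return Occupancy(self.cpu + other, self.network + other)
        return Occupancy(self.cpu + other.cpu, self.network + other.network)

    def __sub__(self, other):
        if isinstance(other, (int, float)):
            return Occupancy(self.cpu - other, self.network - other)
        return Occupancy(self.cpu - other.cpu, self.network - other.network)
```

`+=` and `-=` fall back to `__add__`/`__sub__` automatically, so callers can write `ws.occupancy += duration` without a wrapper object.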
```python
queued_occupancy = len(self.queued) * self.UNKNOWN_TASK_DURATION
queued_occupancy: float = len(self.queued) * self.UNKNOWN_TASK_DURATION
# TODO: threads per worker
# TODO don't include network occupancy?
```
Interesting question (way out of scope): Should high network occupancy maybe even act as a suppressing factor?
Love it. Conceptually I like the approach of using a dataclass, but I'm mildly concerned about performance since we're creating/destroying instances of this a lot.
If that's a problem, a named tuple might be better suited. The update mechanics would obviously be messier.
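To show why the update mechanics get messier, a sketch with hypothetical `cpu`/`network` field names: a `NamedTuple` is immutable, so every increment has to build a replacement instance via `_replace`, whereas a dataclass can mutate a field in place.

```python
from dataclasses import dataclass
from typing import NamedTuple

class OccupancyTuple(NamedTuple):
    cpu: float
    network: float

@dataclass
class OccupancyData:
    cpu: float
    network: float

# NamedTuple: immutable, so each update allocates a new instance.
ot = OccupancyTuple(0.0, 0.0)
ot = ot._replace(cpu=ot.cpu + 1.5)

# Dataclass: update in place, no new allocation.
od = OccupancyData(0.0, 0.0)
od.cpu += 1.5
```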
```python
victim.occupancy = 0
victim.occupancy.clear()
```
Coverage complains, but on `main` it is covered. This is very likely an untested path and we're just hitting it implicitly.
Also a little concerned about this. I started with a namedtuple, but then every update would have to construct a new instance. I haven't actually microbenchmarked the performance of any of this, though. Another option could be a 2-element NumPy record array; that would also save the pointer chasing to the integers.
```python
In [5]: %%timeit
   ...: o = 0
   ...: for _ in range(10_000):
   ...:     o += 1
   ...:
403 µs ± 4.11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [4]: %%timeit
   ...: o = Occupancy(0, 0)
   ...: for _ in range(10_000):
   ...:     o += Occupancy(1, 1)
   ...:
5.53 ms ± 251 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %%timeit
   ...: o = Occupancy(0, 0)
   ...: a = Occupancy(1, 1)
   ...: for _ in range(10_000):
   ...:     o += a
   ...:
3.07 ms ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Using the dataclass seems to be nearly a 10x slowdown compared to plain numbers. Same for a more realistic example (I think typically, for a given task, …):

```python
In [9]: %%timeit
   ...: p = {i: 0 for i in range(1_000)}
   ...: t = 0
   ...: for _ in range(3):
   ...:     for i in range(len(p)):
   ...:         old = p[i]
   ...:         p[i] = new = 1
   ...:         delta = new - old
   ...:         t += delta
   ...:
372 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [8]: %%timeit
   ...: p = {i: Occupancy(0, 0) for i in range(1_000)}
   ...: t = Occupancy(0, 0)
   ...: for _ in range(3):
   ...:     for i in range(len(p)):
   ...:         old = p[i]
   ...:         p[i] = new = Occupancy(1, 1)
   ...:         delta = new - old
   ...:         t += delta
   ...:
3.45 ms ± 27.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Added an `update` method which takes new values, and returns the delta.
# This saves having to construct a new object every time and throw out the old one,
# though it does still mean constructing the delta.
# This helps, but not enormously:

In [2]: %%timeit
   ...: p = {i: Occupancy(0, 0) for i in range(1_000)}
   ...: t = Occupancy(0, 0)
   ...: for _ in range(3):
   ...:     for i in range(len(p)):
   ...:         delta = p[i].update(1, 1)
   ...:         t += delta
   ...:
2.43 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Does this actually matter though? I'm not sure. The main issue is going to be the per-task overhead: that takes us from 372 nanoseconds per task using plain numbers to 2.43 µs per task.

@fjetter depending on where we want to go with #7030, we may or may not want to get this in.
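A sketch of what that `update` method might look like, assuming a dataclass with hypothetical `cpu` and `network` fields: it overwrites the stored values in place and returns the change as a fresh `Occupancy`, so the existing instance is reused instead of being discarded and re-created on every measurement.

```python
from dataclasses import dataclass

@dataclass
class Occupancy:
    # Field names are an assumption; the PR tracks CPU and network time separately.
    cpu: float
    network: float

    def update(self, cpu: float, network: float) -> "Occupancy":
        # Write the new values in place; return the delta so the caller
        # can add it to a running total (still constructs the delta object).
        delta = Occupancy(cpu - self.cpu, network - self.network)
        self.cpu = cpu
        self.network = network
        return delta

    def __iadd__(self, other: "Occupancy") -> "Occupancy":
        # Mutate in place rather than allocating a new instance per `+=`.
        self.cpu += other.cpu
        self.network += other.network
        return self
```

This mirrors the `In [2]` benchmark above: one delta allocation per update instead of two objects (the new value and the difference) plus a discarded total.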
Closes #7004.
Step towards #7003. This notes, but does not fix, the places where we're currently double-counting occupancy. First we should get this PR in and verify it doesn't change any behavior. Then, we can benchmark how a few lines of changes fixing the double-counting affects performance. At the same time, we can update the dashboard to display both measures separately.
`pre-commit run --all-files`