
TaskRunner tracking bug #5512

Closed
15 tasks done
swankjesse opened this issue Sep 29, 2019 · 7 comments
Labels: enhancement (Feature, not a bug)
Milestone: 4.3

Comments

@swankjesse
Collaborator

swankjesse commented Sep 29, 2019

This issue is intended to help me organize what's done and what still needs doing for TaskRunner.

See also

swankjesse added the enhancement label on Sep 29, 2019
swankjesse added this to the 4.3 milestone on Sep 29, 2019
swankjesse pushed a commit that referenced this issue Oct 5, 2019
Now we don't have to alternate between the coordinator thread and the task
thread between task runs if the task returns 0. Instead the task thread can
stay resident.

This implementation works by having task runnables that can switch between
the coordinator role (sleeping until the next task starts) and the executor
role.

#5512
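For readers coming to this later, here's a rough sketch of that resident-thread idea: one thread alternates between the coordinator role (sleeping until the earliest task is due) and the executor role (running it), so no handoff between threads is needed between task runs. This is illustrative only; MiniTaskQueue and every detail below are assumptions for the sketch, not OkHttp's actual TaskRunner code.

```kotlin
import java.util.PriorityQueue
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.thread
import kotlin.concurrent.withLock

// Hypothetical MiniTaskQueue, for illustration only (not OkHttp's TaskRunner).
// A task returns its next delay in nanos, or -1L to stop repeating; 0L means
// "run again immediately", the case the commit message calls out.
class MiniTaskQueue {
  private val lock = ReentrantLock()
  private val taskAdded = lock.newCondition()
  private val queue = PriorityQueue(compareBy<Pair<Long, () -> Long>> { it.first })
  private var workerRunning = false

  fun schedule(delayNanos: Long, task: () -> Long) {
    lock.withLock {
      queue.add((System.nanoTime() + delayNanos) to task)
      if (!workerRunning) {
        workerRunning = true
        thread(isDaemon = true) { runWorker() }
      } else {
        taskAdded.signalAll() // wake the coordinator; the new task may be due sooner
      }
    }
  }

  private fun runWorker() {
    while (true) {
      var task: (() -> Long)? = null
      lock.withLock {
        // Coordinator role: wait until the earliest task is due.
        while (task == null) {
          val head = queue.peek() ?: run {
            workerRunning = false
            return // nothing left to coordinate; this thread exits
          }
          val waitNanos = head.first - System.nanoTime()
          if (waitNanos <= 0L) {
            queue.poll()
            task = head.second
          } else {
            taskAdded.awaitNanos(waitNanos)
          }
        }
      }
      // Executor role: run the task outside the lock, then reschedule if asked.
      val nextDelayNanos = task!!.invoke()
      if (nextDelayNanos >= 0L) schedule(nextDelayNanos, task!!)
    }
  }
}
```

The key point is just that the thread that ran the task falls back into the waiting loop rather than handing control to a separate coordinator thread.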
@swankjesse
Collaborator Author

To improve visibility I think I'll borrow from what we do for HTTP/2 frames: use logger.fine() to print events and metrics. If you know it exists you can turn it on, but there's no public API.
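A minimal sketch of that pattern, assuming java.util.logging like the HTTP/2 frame logging; the logger name "okhttp3.TaskRunner", the helper name, and the message format are placeholders, not the final output:

```kotlin
import java.util.logging.Level
import java.util.logging.Logger

// Placeholder logger name; the real name would be whatever TaskRunner ends up using.
val taskLogger: Logger = Logger.getLogger("okhttp3.TaskRunner")

/** Runs [block], logging the start and the elapsed time only when FINE is enabled. */
fun <T> logElapsed(queueName: String, taskName: String, block: () -> T): T {
  if (!taskLogger.isLoggable(Level.FINE)) return block() // no cost when logging is off
  taskLogger.fine("$queueName starting: $taskName")
  val startNs = System.nanoTime()
  val result = block()
  val tookMs = (System.nanoTime() - startNs) / 1_000_000
  taskLogger.fine("$queueName executed in $tookMs ms: $taskName")
  return result
}
```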

swankjesse pushed a commit that referenced this issue Dec 7, 2019
This was a regression introduced with the TaskRunner changes.
I couldn't find other places where daemon threads were likely
to cause problems.

#5512
@swankjesse
Collaborator Author

I wanna build debug logging for tasks. Here’s a sketch of a sample log:

Q1 ▌ scheduled after 100 ms: OkHttp example.com ping
Q1 ▌▌ scheduled after 200 ms: OkHttp example.com[3] writeSynReset
Q1 ▌▌ starting: OkHttp example.com ping
Q1 ▌▌ executed in 130 ms (next after 100 ms): OkHttp example.com ping
Q1 ▌▌ starting (31 ms late): OkHttp example.com[3] writeSynReset
Q1 ▌▌ executed in 2 ms: OkHttp example.com[3] writeSynReset
Q2 ▌ posted: OkHttp ConnectionPool

Notes:

  • Q1, Q2, etc. are queue names; each is a process-wide ID.
  • ▌▌ is the queue size in tasks, including the task that was just added or just completed.
  • scheduled is logged for delayed tasks, posted for immediate tasks.
  • Units are ms, µs, or s. No nanos in the logs; anything smaller just truncates to 0 µs (see the formatting sketch after this list).
  • "starting" includes a start delay like (31 ms late) when the task starts more than 10 ms later than scheduled. That's typically a busy CPU or a preceding task running long.
  • need to measure execution time!
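Here's the duration-formatting sketch referenced in the units bullet. It truncates into the three allowed units; the exact thresholds and the lack of rounding are assumptions to be settled in review:

```kotlin
// Truncate a duration into s / ms / µs; nanos are never printed.
fun formatDuration(nanos: Long): String = when {
  nanos >= 1_000_000_000L -> "${nanos / 1_000_000_000L} s"
  nanos >= 1_000_000L -> "${nanos / 1_000_000L} ms"
  else -> "${nanos / 1_000L} µs" // anything under 1 µs truncates to "0 µs"
}
```

For example, formatDuration(130_000_000L) is "130 ms" and formatDuration(500L) is "0 µs".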

These logs will be off by default. Enable FINE logging on TaskRunner to see them.
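To turn them on from a test or an app, something like this should do, assuming java.util.logging and the same placeholder logger name as in the sketch above (the real name may differ):

```kotlin
import java.util.logging.ConsoleHandler
import java.util.logging.Level
import java.util.logging.Logger

fun enableTaskRunnerLogging() {
  val logger = Logger.getLogger("okhttp3.TaskRunner") // placeholder name
  logger.level = Level.FINE
  logger.addHandler(ConsoleHandler().apply { level = Level.FINE })
}
```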

@Egorand
Collaborator

Egorand commented Dec 9, 2019

A couple thoughts:

  • Probably worth keeping logs indented to the same level after the queue size, otherwise it feels like logs with the same indentation are grouped when they're not. Perhaps a uniform indentation that grows as the queue gets bigger could work (see the padding sketch after this list)?
Q1 ▌     scheduled after 100 ms: OkHttp example.com ping
Q1 ▌▌    scheduled after 200 ms: OkHttp example.com[3] writeSynReset
Q1 ▌▌▌   starting: OkHttp example.com ping
Q1 ▌▌▌▌  executed in 130 ms (next after 100 ms): OkHttp example.com ping
Q1 ▌▌    starting (31 ms late): OkHttp example.com[3] writeSynReset
Q1 ▌▌▌▌▌ executed in 2 ms: OkHttp example.com[3] writeSynReset
Q2 ▌▌▌▌▌▌▌▌▌▌ posted: OkHttp ConnectionPool
Q2 ▌▌▌▌▌▌▌    posted: OkHttp ConnectionPool
Q2 ▌▌▌▌▌▌▌▌▌  posted: OkHttp ConnectionPool
Q2 ▌     posted: OkHttp ConnectionPool
  • How about printing scheduled, starting, executed and posted in bold?
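A quick sketch of the uniform-indentation idea from the first bullet; the minimum width of 5 bars and the helper name are arbitrary choices for illustration:

```kotlin
// Pad the queue-size bars to a fixed minimum width so the messages line up;
// the column still grows when a queue gets deeper than the minimum.
fun formatLogLine(queueName: String, queueSize: Int, message: String): String {
  val bars = "▌".repeat(queueSize).padEnd(maxOf(queueSize, 5))
  return "$queueName $bars $message"
}
```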

@yschimke
Collaborator

@swankjesse Nice work on this, impressive to see the progression!

@swankjesse
Collaborator Author

I think I'm ready to call this complete. Anything else you wanna see for observability or troubleshooting?

@yschimke
Collaborator

@swankjesse let's follow up on that while working through flaky CI tests. That's my main goal: can we use our own tools to debug after the fact?

@swankjesse
Collaborator Author

No further action for this issue. Will work through flaky tests!
