-
Notifications
You must be signed in to change notification settings - Fork 9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17461. IOStatisticsContext + committer integration #4566
HADOOP-17461. IOStatisticsContext + committer integration #4566
Conversation
you are going to love this. Full end to end stats collection of IO during job commit of terasort.
|
🎊 +1 overall
This message was automatically generated. |
Here are the stats for a magic run where we also measure list calls. The fact that hsync is being called shows that the terasort output is also being picked up. not intentional. raises a big issue: should these committers reset the context before task/job commit.
this is going to overreport, but i don't think the committers should be interfering with the context which should really be managed by the app |
🎊 +1 overall
This message was automatically generated. |
* move reference map and lookup to a IOStatisticsContextIntegration class * static method in IOStatisticsContext to relay lookup * add method to switch a thread's context; needed to aggregate worker thread IO in threads doing work for committers without the need to explicitly collect and pass back the stats * production code moves to the new methods * tests move to this and away from looking up the fields in the streams * stats are reset in s3a test setup * s3a committers collect data read stats during job commit and include in summary statistics. This is only the stats when reading manifest files, not the actual work. * tests to print the aggregate of all loaded success files in the run. Change-Id: I604990f2132b76d38e85ca8b777630225c32158e
* Cost of scan/load of magic files in task commit are collected * S3A list iterators update the context stats of the thread they were created in in close() calls. * With close() passthrough working and TaskPool invoking it if the iterator is closeable. Change-Id: If0a0c2de08d52a74b7c1f9498716d423b97b4003
In spark we would be resetting before the execution begins I assume, so having it reset in the committers could mess things up? Was thinking about the case like a disctp job, how do we reset the stats in that case? |
exactly. distcp, good thought. let's not worry about collecting results there |
bf6ee6f
to
ef9643c
Compare
* Resetting and collection of context IO stats is optional; off by default * TaskPool wired up to switch to any builder-supplied context. This makes it easier to use and lines us up for manifest committer to also adopt. * Add an ID field to the interface, mainly for logging. This is separate from the thread ID Change-Id: I3b57dc915c34b1234b200ad00d79adc119ff6919
40ca262
to
ac6a3fa
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
+setting thread context to null resets it +move merging of fs stats into finally block, after streamStatistics close has been called to update final stats Change-Id: I913b6a473da12918025e7ec11d4168bd135f0fc5
This is important for testing IOStatisticsContext functionality in unit tests as well as a source of actual data. +fixed the checkstyle Change-Id: I4f429a6a81729027026dc46bd1519f90a145c205
ac6a3fa
to
9dd221d
Compare
🎊 +1 overall
This message was automatically generated. |
This is PR #4352 with a new patch to integrate with the MR committers, at least
as far as collecting read stats from job commits and including in _SUCCESS
the committer ITests collect these and print them.
IO in threads doing work for committers without the need to explicitly
collect and pass back the stats
in summary statistics. This is only the stats when reading manifest
files, not the actual work.
How was this patch tested?
new/modified tests
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?