TEZ-3821: Ability to fail fast tasks that write too much to local disk. #314

Merged
merged 4 commits into from
Oct 27, 2023

ayushtkn
Member

No description provided.

@@ -262,6 +263,13 @@ private synchronized ResponseWrapper heartbeat(Collection<TezEvent> eventsArg) t
sendCounters = true;
prevCounterSendHeartbeatNum = nonOobHeartbeatCounter.get();
}
try {
task.checkTaskLimits();
Contributor

Doing this on every heartbeat can significantly increase the heartbeat's runtime; even an increase of a few milliseconds can affect cluster usage. You may want to run the check only every 10 seconds, either here or by some other means. Ideally this should also cover sorter spills and merges, in which case it would address another ticket that was created along similar lines.

Member Author

Thanks Rajesh. There was already some logic using HEAP_MEMORY_USAGE_UPDATE_INTERVAL; I used similar logic here to make sure the check happens only every 10 seconds.
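
For readers following the thread, here is a minimal sketch of the kind of interval-based throttling discussed above. It is not the code from this patch: the class name `ThrottledCheck`, the method `maybeRun`, and the choice to pass the current time in as a parameter are all assumptions made for illustration.

```java
/**
 * Hypothetical helper (not part of Tez): runs a possibly expensive check
 * at most once per interval, so a frequently invoked caller such as a
 * heartbeat loop pays the cost only occasionally.
 */
final class ThrottledCheck {
    private final long intervalMillis;
    private final Runnable check;
    // Long.MIN_VALUE guarantees that the very first call runs the check.
    private long nextRunMillis = Long.MIN_VALUE;

    ThrottledCheck(long intervalMillis, Runnable check) {
        this.intervalMillis = intervalMillis;
        this.check = check;
    }

    /**
     * Runs the check if the interval has elapsed since the last run.
     * The current time is a parameter so the behaviour is easy to test.
     *
     * @return true if the check was executed on this call
     */
    boolean maybeRun(long nowMillis) {
        if (nowMillis < nextRunMillis) {
            return false; // still inside the interval, skip the check
        }
        nextRunMillis = nowMillis + intervalMillis;
        check.run();
        return true;
    }
}
```

A caller would construct this once with an interval of 10 000 ms and invoke `maybeRun(System.currentTimeMillis())` on each heartbeat. Because the check is a `Runnable`, a limit violation would have to surface as an unchecked exception in this sketch.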

@rbalamohan
Contributor

Minor: rename TaskLimitException to something that indicates the limit relates to file bytes written.

Rest LGTM. +1.
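
To illustrate the naming suggestion, here is a small sketch of a write-limit check whose exception name states what was exceeded. The names `LocalWriteLimitExceededException` and `LocalWriteLimitCheck` are hypothetical and not necessarily what the patch ended up using; treating a negative limit as "disabled" and using an unchecked exception are also assumptions.

```java
/** Hypothetical exception whose name says which limit was exceeded. */
final class LocalWriteLimitExceededException extends RuntimeException {
    LocalWriteLimitExceededException(long bytesWritten, long limitBytes) {
        super("Local file bytes written (" + bytesWritten
            + ") exceeded the configured limit (" + limitBytes + ")");
    }
}

/** Hypothetical check comparing bytes written so far against a limit. */
final class LocalWriteLimitCheck {
    // Assumption for this sketch: a negative limit disables the check.
    private final long limitBytes;

    LocalWriteLimitCheck(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Throws if the limit is enabled and bytesWritten is above it. */
    void check(long bytesWritten) {
        if (limitBytes >= 0 && bytesWritten > limitBytes) {
            throw new LocalWriteLimitExceededException(bytesWritten, limitBytes);
        }
    }
}
```

With a name like this, a task failure in the logs points directly at local disk writes rather than at an unspecified task limit.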

@tez-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 13s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 5m 58s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 10m 56s | master passed |
| +1 💚 | compile | 0m 55s | master passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | compile | 0m 52s | master passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +1 💚 | checkstyle | 1m 2s | master passed |
| +1 💚 | javadoc | 1m 2s | master passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 51s | master passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +0 🆗 | spotbugs | 0m 36s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 49s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 8s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 0m 29s | the patch passed |
| +1 💚 | compile | 0m 30s | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javac | 0m 30s | the patch passed |
| +1 💚 | compile | 0m 27s | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +1 💚 | javac | 0m 27s | the patch passed |
| +1 💚 | checkstyle | 0m 16s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 24s | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 25s | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +1 💚 | findbugs | 1m 8s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 1m 57s | tez-api in the patch passed. |
| +1 💚 | unit | 0m 30s | tez-runtime-internals in the patch passed. |
| +1 💚 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 31m 15s | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-314/5/artifact/out/Dockerfile |
| GITHUB PR | #314 |
| JIRA Issue | TEZ-3821 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 73182c616adc 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 4bc87e2 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-314/5/testReport/ |
| Max. process+thread count | 440 (vs. ulimit of 5500) |
| modules | C: tez-api tez-runtime-internals U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-314/5/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@abstractdog abstractdog merged commit 51d6f53 into apache:master Oct 27, 2023
4 checks passed