TEZ-3268: Provide a demuxer sample app that uses fair routing #320

Open
wants to merge 2 commits into
base: master

@okumin okumin commented Dec 11, 2023

This PR adds a sample application that uses FairShuffleVertexManager.

It can be run as follows:

$ hadoop jar /opt/tez/tez-examples-*.jar demuxerdatagen hdfs:///tmp/in 30 200
$ hadoop jar /opt/tez/tez-examples-*.jar demuxer hdfs:///tmp/in hdfs:///tmp/out 100
$ hadoop jar /opt/tez/tez-examples-*.jar demuxer hdfs:///tmp/in hdfs:///tmp/out_precise 100 true

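The following are the results on my machine. Skewed keys are written by multiple reducers: in /tmp/out, category-00028 and category-00029 are each split across two reducers, and in /tmp/out_precise, category-00027, category-00028, and category-00029 are split across two, three, and six reducers respectively.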

zookage@client-node-0:~$ hdfs dfs -ls /tmp/out
Found 33 items
-rw-r--r--   3 hdfs supergroup          0 2023-12-11 09:20 /tmp/out/_SUCCESS
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00000-r-00007
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00001-r-00007
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00002-r-00007
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00003-r-00007
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00004-r-00007
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00005-r-00007
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00006-r-00008
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00007-r-00008
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:18 /tmp/out/category-00008-r-00008
-rw-r--r--   3 hdfs supergroup       6000 2023-12-11 09:18 /tmp/out/category-00009-r-00008
-rw-r--r--   3 hdfs supergroup      15000 2023-12-11 09:17 /tmp/out/category-00010-r-00000
-rw-r--r--   3 hdfs supergroup      30000 2023-12-11 09:17 /tmp/out/category-00011-r-00000
-rw-r--r--   3 hdfs supergroup      60000 2023-12-11 09:17 /tmp/out/category-00012-r-00000
-rw-r--r--   3 hdfs supergroup     120000 2023-12-11 09:17 /tmp/out/category-00013-r-00000
-rw-r--r--   3 hdfs supergroup     243000 2023-12-11 09:17 /tmp/out/category-00014-r-00000
-rw-r--r--   3 hdfs supergroup     489000 2023-12-11 09:17 /tmp/out/category-00015-r-00000
-rw-r--r--   3 hdfs supergroup     981000 2023-12-11 09:17 /tmp/out/category-00016-r-00001
-rw-r--r--   3 hdfs supergroup    1965000 2023-12-11 09:17 /tmp/out/category-00017-r-00001
-rw-r--r--   3 hdfs supergroup    3930000 2023-12-11 09:17 /tmp/out/category-00018-r-00001
-rw-r--r--   3 hdfs supergroup    7863000 2023-12-11 09:17 /tmp/out/category-00019-r-00001
-rw-r--r--   3 hdfs supergroup   15726000 2023-12-11 09:17 /tmp/out/category-00020-r-00001
-rw-r--r--   3 hdfs supergroup   31455000 2023-12-11 09:17 /tmp/out/category-00021-r-00001
-rw-r--r--   3 hdfs supergroup   62913000 2023-12-11 09:20 /tmp/out/category-00022-r-00002
-rw-r--r--   3 hdfs supergroup  125829000 2023-12-11 09:20 /tmp/out/category-00023-r-00002
-rw-r--r--   3 hdfs supergroup  251658000 2023-12-11 09:20 /tmp/out/category-00024-r-00002
-rw-r--r--   3 hdfs supergroup  503316000 2023-12-11 09:20 /tmp/out/category-00025-r-00002
-rw-r--r--   3 hdfs supergroup 1006632000 2023-12-11 09:20 /tmp/out/category-00026-r-00002
-rw-r--r--   3 hdfs supergroup 2013264000 2023-12-11 09:20 /tmp/out/category-00027-r-00002
-rw-r--r--   3 hdfs supergroup 2093796120 2023-12-11 09:18 /tmp/out/category-00028-r-00003
-rw-r--r--   3 hdfs supergroup 1932734880 2023-12-11 09:18 /tmp/out/category-00028-r-00004
-rw-r--r--   3 hdfs supergroup 4187592240 2023-12-11 09:20 /tmp/out/category-00029-r-00005
-rw-r--r--   3 hdfs supergroup 3865469760 2023-12-11 09:20 /tmp/out/category-00029-r-00006
zookage@client-node-0:~$ hdfs dfs -ls /tmp/out_precise
Found 39 items
-rw-r--r--   3 hdfs supergroup          0 2023-12-11 09:26 /tmp/out_precise/_SUCCESS
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00000-r-00016
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00001-r-00016
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00002-r-00016
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00003-r-00016
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00004-r-00016
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00005-r-00016
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00006-r-00017
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00007-r-00017
-rw-r--r--   3 hdfs supergroup       3000 2023-12-11 09:24 /tmp/out_precise/category-00008-r-00017
-rw-r--r--   3 hdfs supergroup       6000 2023-12-11 09:24 /tmp/out_precise/category-00009-r-00017
-rw-r--r--   3 hdfs supergroup      15000 2023-12-11 09:23 /tmp/out_precise/category-00010-r-00000
-rw-r--r--   3 hdfs supergroup      30000 2023-12-11 09:23 /tmp/out_precise/category-00011-r-00000
-rw-r--r--   3 hdfs supergroup      60000 2023-12-11 09:23 /tmp/out_precise/category-00012-r-00000
-rw-r--r--   3 hdfs supergroup     120000 2023-12-11 09:23 /tmp/out_precise/category-00013-r-00000
-rw-r--r--   3 hdfs supergroup     243000 2023-12-11 09:23 /tmp/out_precise/category-00014-r-00000
-rw-r--r--   3 hdfs supergroup     489000 2023-12-11 09:23 /tmp/out_precise/category-00015-r-00000
-rw-r--r--   3 hdfs supergroup     981000 2023-12-11 09:24 /tmp/out_precise/category-00016-r-00001
-rw-r--r--   3 hdfs supergroup    1965000 2023-12-11 09:24 /tmp/out_precise/category-00017-r-00001
-rw-r--r--   3 hdfs supergroup    3930000 2023-12-11 09:24 /tmp/out_precise/category-00018-r-00001
-rw-r--r--   3 hdfs supergroup    7863000 2023-12-11 09:24 /tmp/out_precise/category-00019-r-00001
-rw-r--r--   3 hdfs supergroup   15726000 2023-12-11 09:24 /tmp/out_precise/category-00020-r-00001
-rw-r--r--   3 hdfs supergroup   31455000 2023-12-11 09:24 /tmp/out_precise/category-00021-r-00001
-rw-r--r--   3 hdfs supergroup   62913000 2023-12-11 09:24 /tmp/out_precise/category-00022-r-00002
-rw-r--r--   3 hdfs supergroup  125829000 2023-12-11 09:24 /tmp/out_precise/category-00023-r-00002
-rw-r--r--   3 hdfs supergroup  251658000 2023-12-11 09:24 /tmp/out_precise/category-00024-r-00002
-rw-r--r--   3 hdfs supergroup  503316000 2023-12-11 09:24 /tmp/out_precise/category-00025-r-00003
-rw-r--r--   3 hdfs supergroup 1006632000 2023-12-11 09:24 /tmp/out_precise/category-00026-r-00004
-rw-r--r--   3 hdfs supergroup 1046897280 2023-12-11 09:24 /tmp/out_precise/category-00027-r-00005
-rw-r--r--   3 hdfs supergroup  966366720 2023-12-11 09:24 /tmp/out_precise/category-00027-r-00006
-rw-r--r--   3 hdfs supergroup 1308622575 2023-12-11 09:24 /tmp/out_precise/category-00028-r-00007
-rw-r--r--   3 hdfs supergroup 1308622575 2023-12-11 09:24 /tmp/out_precise/category-00028-r-00008
-rw-r--r--   3 hdfs supergroup 1409285850 2023-12-11 09:24 /tmp/out_precise/category-00028-r-00009
-rw-r--r--   3 hdfs supergroup 1046898060 2023-12-11 09:24 /tmp/out_precise/category-00029-r-00010
-rw-r--r--   3 hdfs supergroup 1046898060 2023-12-11 09:24 /tmp/out_precise/category-00029-r-00011
-rw-r--r--   3 hdfs supergroup 1570347090 2023-12-11 09:26 /tmp/out_precise/category-00029-r-00012
-rw-r--r--   3 hdfs supergroup 1570347090 2023-12-11 09:26 /tmp/out_precise/category-00029-r-00013
-rw-r--r--   3 hdfs supergroup 1570347090 2023-12-11 09:26 /tmp/out_precise/category-00029-r-00014
-rw-r--r--   3 hdfs supergroup 1248224610 2023-12-11 09:26 /tmp/out_precise/category-00029-r-00015
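
To check the listings above without reading them by eye, the output files can be grouped by category. The following is a minimal sketch, not part of this PR: the class name `ReducerSpread`, its `Summary` type, and the regular expression are hypothetical, and it assumes file names of the form `category-NNNNN-r-NNNNN` as shown in the `hdfs dfs -ls` output. The sample lines are copied from the `/tmp/out_precise` listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting `hdfs dfs -ls` output; not part of Tez.
public class ReducerSpread {
  /** Per-category summary: the reducer suffixes that wrote it and the total bytes. */
  public static final class Summary {
    public final List<String> reducers = new ArrayList<>();
    public long bytes;
  }

  // Captures "<size> <date> <time> <dir>/category-NNNNN-r-NNNNN" at the end of a line.
  private static final Pattern LINE = Pattern.compile(
      "\\s(\\d+)\\s+\\S+\\s+\\S+\\s+\\S*/(category-\\d+)-r-(\\d+)$");

  // A few lines copied from the /tmp/out_precise listing.
  public static final List<String> SAMPLE = Arrays.asList(
      "Found 39 items",
      "-rw-r--r--   3 hdfs supergroup          0 2023-12-11 09:26 /tmp/out_precise/_SUCCESS",
      "-rw-r--r--   3 hdfs supergroup 1006632000 2023-12-11 09:24 /tmp/out_precise/category-00026-r-00004",
      "-rw-r--r--   3 hdfs supergroup 1046897280 2023-12-11 09:24 /tmp/out_precise/category-00027-r-00005",
      "-rw-r--r--   3 hdfs supergroup  966366720 2023-12-11 09:24 /tmp/out_precise/category-00027-r-00006",
      "-rw-r--r--   3 hdfs supergroup 1308622575 2023-12-11 09:24 /tmp/out_precise/category-00028-r-00007",
      "-rw-r--r--   3 hdfs supergroup 1308622575 2023-12-11 09:24 /tmp/out_precise/category-00028-r-00008",
      "-rw-r--r--   3 hdfs supergroup 1409285850 2023-12-11 09:24 /tmp/out_precise/category-00028-r-00009",
      "-rw-r--r--   3 hdfs supergroup 1046898060 2023-12-11 09:24 /tmp/out_precise/category-00029-r-00010",
      "-rw-r--r--   3 hdfs supergroup 1046898060 2023-12-11 09:24 /tmp/out_precise/category-00029-r-00011",
      "-rw-r--r--   3 hdfs supergroup 1570347090 2023-12-11 09:26 /tmp/out_precise/category-00029-r-00012",
      "-rw-r--r--   3 hdfs supergroup 1570347090 2023-12-11 09:26 /tmp/out_precise/category-00029-r-00013",
      "-rw-r--r--   3 hdfs supergroup 1570347090 2023-12-11 09:26 /tmp/out_precise/category-00029-r-00014",
      "-rw-r--r--   3 hdfs supergroup 1248224610 2023-12-11 09:26 /tmp/out_precise/category-00029-r-00015");

  /** Groups listing lines by category, collecting reducer suffixes and summing sizes. */
  public static Map<String, Summary> summarize(List<String> lsLines) {
    Map<String, Summary> byCategory = new LinkedHashMap<>();
    for (String line : lsLines) {
      Matcher m = LINE.matcher(line);
      if (!m.find()) {
        continue; // skips "Found N items", _SUCCESS, and shell prompt lines
      }
      Summary s = byCategory.computeIfAbsent(m.group(2), k -> new Summary());
      s.reducers.add(m.group(3));
      s.bytes += Long.parseLong(m.group(1));
    }
    return byCategory;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, Summary> e : summarize(SAMPLE).entrySet()) {
      Summary s = e.getValue();
      System.out.println(e.getKey() + " reducers=" + s.reducers.size() + " bytes=" + s.bytes);
    }
  }
}
```

Running the same grouping over both listings shows that the per-category byte totals match between the two runs (for example, category-00029 sums to 8053062000 bytes in both); only the number of reducers each skewed category is spread across differs.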

.addEdge(Edge.create(inputVertex, demuxVertex, edgeConf.createDefaultEdgeProperty()));
}

private DAG createDagWithUnion(TezConfiguration tezConf, List<Path> inputPaths, Path outputPath,
Contributor Author

Note: This case won't work until #306 is merged.

@abstractdog abstractdog Aug 4, 2024

#306 was closed for some reason.
Can this PR be adapted to work without that patch? If so, I'm still interested in seeing this in the codebase.

Contributor Author

@abstractdog Thanks for asking. I closed #306 and still need to implement an alternative solution. For now, I have removed the UNION case from the sample.
9e09665

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 🆗 | reexec | 13m 36s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 16m 0s | master passed |
| +1 💚 | compile | 0m 18s | master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | compile | 0m 15s | master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 💚 | checkstyle | 1m 26s | master passed |
| +1 💚 | javadoc | 0m 28s | master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 14s | master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +0 🆗 | spotbugs | 1m 4s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 2s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 11s | the patch passed |
| +1 💚 | compile | 0m 10s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javac | 0m 10s | the patch passed |
| +1 💚 | compile | 0m 11s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 💚 | javac | 0m 11s | the patch passed |
| +1 💚 | checkstyle | 0m 5s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 6s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 6s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 💚 | findbugs | 0m 23s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 0m 11s | tez-examples in the patch passed. |
| +1 💚 | asflicense | 0m 15s | The patch does not generate ASF License warnings. |
| | | 35m 37s | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-320/1/artifact/out/Dockerfile |
| GITHUB PR | #320 |
| JIRA Issue | TEZ-3268 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 26636412b540 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 0c5cf68 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-320/1/testReport/ |
| Max. process+thread count | 95 (vs. ulimit of 5500) |
| modules | C: tez-examples U: tez-examples |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-320/1/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 🆗 | reexec | 12m 31s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 16m 21s | master passed |
| +1 💚 | compile | 0m 19s | master passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 💚 | compile | 0m 18s | master passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 💚 | checkstyle | 1m 9s | master passed |
| +1 💚 | javadoc | 0m 29s | master passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 💚 | javadoc | 0m 15s | master passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +0 🆗 | spotbugs | 1m 0s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 0m 58s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 11s | the patch passed |
| +1 💚 | compile | 0m 10s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 💚 | javac | 0m 11s | the patch passed |
| +1 💚 | compile | 0m 9s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 💚 | javac | 0m 9s | the patch passed |
| +1 💚 | checkstyle | 0m 6s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 7s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 💚 | javadoc | 0m 7s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 💚 | findbugs | 0m 26s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 0m 11s | tez-examples in the patch passed. |
| +1 💚 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 34m 43s | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-320/2/artifact/out/Dockerfile |
| GITHUB PR | #320 |
| JIRA Issue | TEZ-3268 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 3b16b1bdd8b2 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 70d79b2 |
| Default Java | Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-320/2/testReport/ |
| Max. process+thread count | 95 (vs. ulimit of 5500) |
| modules | C: tez-examples U: tez-examples |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-320/2/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.
