[Bug] MR client may lose data or throw an exception when rss.storage.type does not include MEMORY. #886

@zhengchenyu

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

1 Bug description

When rss.storage.type does not include MEMORY, client-mr may raise an exception like the one below:

2023-05-16 18:58:52,191 INFO mapreduce.Job: Task Id : attempt_1683514063269_3300_r_000025_0, Status : FAILED
Error: org.apache.uniffle.common.exception.RssException: Blocks read inconsistent: expected 13 blocks, actual 8 blocks
	at org.apache.uniffle.common.util.RssUtils.checkProcessedBlockIds(RssUtils.java:287)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.checkProcessedBlockIds(ShuffleReadClientImpl.java:253)
	at org.apache.hadoop.mapreduce.task.reduce.RssFetcher.copyFromRssServer(RssFetcher.java:193)
	at org.apache.hadoop.mapreduce.task.reduce.RssFetcher.fetchAllRssBlocks(RssFetcher.java:133)
	at org.apache.hadoop.mapreduce.task.reduce.RssShuffle.run(RssShuffle.java:202)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:377)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

In fact, the problem first appeared in the client-tez module of our internal version. Below is the Tez error stack:

2023-05-16 18:35:18,591 INFO impl.ComposedClientReadHandler: Failed to read shuffle data caused by
org.apache.uniffle.common.exception.RssException: Can't get FileSystem for hdfs://devtest-ns-fed/uniffle-rss/appattempt_1684233307050_0001_000000/1/0-0
	at org.apache.uniffle.storage.handler.impl.HdfsClientReadHandler.init(HdfsClientReadHandler.java:113)
	at org.apache.uniffle.storage.handler.impl.HdfsClientReadHandler.readShuffleData(HdfsClientReadHandler.java:162)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:101)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:129)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.read(ShuffleReadClientImpl.java:238)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:162)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.copyFromRssServer(RssFetcherOrderedGrouped.java:166)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.fetchAllRssBlocks(RssFetcherOrderedGrouped.java:151)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.callInternal(RssFetcherOrderedGrouped.java:296)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.callInternal(RssFetcherOrderedGrouped.java:29)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at org.apache.uniffle.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at org.apache.uniffle.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at org.apache.uniffle.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2023-05-16 18:35:18,595 ERROR rss.RssShuffleScheduler: Summation: Fetcher failed with error
org.apache.uniffle.common.exception.RssFetchFailedException: Failed to read shuffle data from WARM handler
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:109)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:129)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.read(ShuffleReadClientImpl.java:238)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:162)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.copyFromRssServer(RssFetcherOrderedGrouped.java:166)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.fetchAllRssBlocks(RssFetcherOrderedGrouped.java:151)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.callInternal(RssFetcherOrderedGrouped.java:296)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.callInternal(RssFetcherOrderedGrouped.java:29)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at org.apache.uniffle.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at org.apache.uniffle.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at org.apache.uniffle.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.uniffle.common.exception.RssException: Can't get FileSystem for hdfs://devtest-ns-fed/uniffle-rss/appattempt_1684233307050_0001_000000/1/0-0
	at org.apache.uniffle.storage.handler.impl.HdfsClientReadHandler.init(HdfsClientReadHandler.java:113)
	at org.apache.uniffle.storage.handler.impl.HdfsClientReadHandler.readShuffleData(HdfsClientReadHandler.java:162)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:101)
	... 14 more

The bug reproduces with very high probability in tez-local mode. In MR on YARN mode the probability is low, but after I added a 1-second sleep before shuffleWriteClient.sendShuffleData in SortWriteBufferManager, it also reproduces with very high probability.

2 Reason

When the bug happens, the value of expect committed in the log below is effectively random:

[INFO] 2023-05-16 19:07:20,436 Grpc-272 ShuffleTaskManager commitShuffle - Checking commit result for appId[appattempt_1683514060868_9741_000001], shuffleId[0], expect committed[390], remain[390]

The cause is that shuffleWriteClient.sendShuffleData runs in an asynchronous thread. When finishShuffle is called, sendShuffleData may not have run yet, so some data is never flushed to the shuffle server.
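
The race can be illustrated with a small standalone sketch. It is not Uniffle code: the class AsyncSendRaceSketch, the method blocksSeenAtFinish, and the counter standing in for the shuffle server are all hypothetical names made up for illustration. It only assumes what is described above, namely that sends are submitted to a background thread and that the "finish" step may look at the server state before those sends have completed. Waiting on the pending sends before finishing is one possible way to close the gap, not necessarily the fix that will be submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncSendRaceSketch {

  /**
   * Submits `batches` asynchronous sends and returns how many blocks the
   * (simulated) server has received at the moment "finish" is called.
   * All names here are hypothetical; this is not the Uniffle API.
   */
  static int blocksSeenAtFinish(
      boolean waitForSends, int batches, int blocksPerBatch, long sendDelayMs) {
    AtomicInteger receivedByServer = new AtomicInteger();
    ExecutorService sender = Executors.newSingleThreadExecutor();
    List<CompletableFuture<Void>> pending = new ArrayList<>();
    try {
      for (int i = 0; i < batches; i++) {
        // Stand-in for the asynchronous send; the delay mimics the
        // 1-second sleep used above to make the bug easy to reproduce.
        pending.add(CompletableFuture.runAsync(() -> {
          sleepQuietly(sendDelayMs);
          receivedByServer.addAndGet(blocksPerBatch);
        }, sender));
      }
      if (waitForSends) {
        // Possible mitigation: block until every pending send has completed.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
      }
      // Stand-in for "finish": snapshot what the server has received so far.
      return receivedByServer.get();
    } finally {
      sender.shutdown();
      try {
        sender.awaitTermination(30, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    int racy = blocksSeenAtFinish(false, 4, 3, 100L);
    int safe = blocksSeenAtFinish(true, 4, 3, 100L);
    System.out.println("finish without waiting saw " + racy + " of 12 blocks");
    System.out.println("finish after waiting saw " + safe + " of 12 blocks");
  }
}
```

Without the wait, the count seen at finish depends on thread timing and is usually lower than the number of blocks written, which matches the unstable expect committed value and the "Blocks read inconsistent" error on the read side.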

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
