[SPARK-16757] Set up Spark caller context to HDFS and YARN #14659
Conversation
|
Jenkins test this please |
|
Test build #63910 has finished for PR 14659 at commit
|
|
Hi, @srowen. Thank you so much for the review, and sorry for the test failure and late update. The failures happened because `jobID` was None or there was no `spark.app.name` in the SparkConf. I have updated the PR to set default values for `jobID` and `spark.app.name`; when a real application runs on Spark, it will always have a `jobID` and `spark.app.name`.

What's the use case for this? |
|
Hadoop's |
|
Hi, @steveloughran Thank you very much for the comments. I have created a Hadoop JIRA, HADOOP-13527, and attached the patch; could you please review it? I am unable to assign the JIRA to myself; could you please grant me the "contributor" role in Hadoop? Thanks again. |
|
Having some problems adding you as a contributor; JIRA scale issues, browser problems, etc. I've asked others to try and do it. Start with the coding; I'll sort out the contributor entry. |
|
Hello @Sherry302 . Thank you for the patch. From the HDFS perspective, we recommend against using spaces in the value of the caller context. The HDFS audit log frequently gets parsed by administrators using ad-hoc scripting. Spaces in the fields can make this more challenging for them. For example, if they used an awk script that parsed `$NF` expecting to find the callerContext, then the "on Spark" at the end would cause it to return "Spark" instead of the full caller context. May I suggest prepending "Spark" instead? Perhaps something like this: `callerContext=Spark_JobId_0_StageID_0_stageAttemptId_0_taskID_0_attemptNumber_0` |
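To make the awk point concrete, here is a tiny Scala sketch (both log lines are invented for illustration) of taking the last whitespace-separated field, which is what `$NF` does:

```scala
object LastFieldDemo extends App {
  // Invented audit-log endings: one with a space inside the caller context,
  // one using the suggested "Spark_" prefix instead.
  val withSpace = "cmd=open src=/tmp callerContext=JobId_0_TaskId_0 on Spark"
  val prefixed  = "cmd=open src=/tmp callerContext=Spark_JobId_0_TaskId_0"

  println(withSpace.split("\\s+").last) // "Spark" -- the context is lost
  println(prefixed.split("\\s+").last)  // "callerContext=Spark_JobId_0_TaskId_0"
}
```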
|
Chris: maybe the CallerContext class could check for bad characters, including spaces, newlines, "," and quotation marks... the usual things that break parsers. There's also the application ID (or in YARN, appID + attemptID); does that need to be included, or does the audit log include enough about the application that it's not needed?
Yes, good conversation to have over in HDFS JIRA. |
|
Actually a HADOOP JIRA... I filed HADOOP-13528 for follow-up. |
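In that spirit, a minimal sketch of the kind of validation being floated; the object name and the exact character set are assumptions, not the HADOOP-13528 design:

```scala
object CallerContextSanitizer {
  // Characters that commonly break whitespace- or comma-delimited log parsers.
  private val BadChars = "[\\s,\"']".r

  /** Replace parser-breaking characters with underscores. */
  def sanitize(context: String): String = BadChars.replaceAllIn(context, "_")
  // e.g. sanitize("JobId 0, stage \"x\"") == "JobId_0__stage__x_"
}
```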
…when Yarn client and ApplicationMaster invoke Hadoop caller context API
…when 'Task' invokes Hadoop caller context API
|
Hi, @steveloughran Thanks a lot for the comments. In the audit log, if users set some configuration in spark-defaults.conf, we can see the application ID. In commit 5ab2a41, the application ID and attempt ID (attempt ID only in Yarn cluster mode) are included in the value of the caller context when the Yarn client and ApplicationMaster invoke the Hadoop caller context API. In commit 1512775, the application ID, name, and attempt ID (again, only in Yarn cluster mode) are included in the value of the caller context when `Task` invokes the API. For commit 1512775, the application ID and attempt ID are passed to `Task`; is it good for `Task` to see that application information? What do you think about this, @steveloughran? Thanks. |
|
Thanks a lot for adding me as a "contributor" in Hadoop :) @steveloughran @cnauroth |
|
+1 (non-binding) for the latest revision. I think this is an improvement, speaking from the perspective of the HDFS audit log. Thank you, @Sherry302 . :-) |
|
Test build #64372 has finished for PR 14659 at commit
|
|
The only failure is 'basic functionality', but it passed locally. I'll re-trigger the tests. |
|
Retest this please. |
|
Test build #64380 has finished for PR 14659 at commit
|
|
Hi, @srowen Could you please review this PR? Thanks. |
|
Hi, @srowen Could you please review this PR again? |
|
I'm not sure I know enough about the APIs and the need for this to say I could review and merge it. |
```scala
  val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext")
  callerContext.getMethod("setCurrent", callerContext).invoke(null, ret)
} catch {
  case NonFatal(e) => logDebug(s"${e.getMessage}")
```
Better to use `logDebug(s"$e", e)`. Some exceptions (e.g. `NullPointerException`) return null from `getMessage()`; you may also want the stack trace.
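Assembled from the snippet above into a self-contained sketch: this uses plain `Class.forName` so it runs outside Spark, folds in the logging suggestion, and assumes Hadoop 2.8+'s `CallerContext`/`CallerContext$Builder` are on the classpath.

```scala
import scala.util.control.NonFatal

object CallerContextSketch {
  /** Set Hadoop's caller context via reflection, so the calling code still
    * loads against Hadoop versions that lack o.a.h.ipc.CallerContext. */
  def setCurrent(context: String): Boolean =
    try {
      val callerContext = Class.forName("org.apache.hadoop.ipc.CallerContext")
      val builder = Class.forName("org.apache.hadoop.ipc.CallerContext$Builder")
      // Equivalent to: ret = new CallerContext.Builder(context).build()
      val builderInst = builder.getConstructor(classOf[String])
        .newInstance(context).asInstanceOf[AnyRef]
      val ret = builder.getMethod("build").invoke(builderInst)
      // Equivalent to: CallerContext.setCurrent(ret)
      callerContext.getMethod("setCurrent", callerContext).invoke(null, ret)
      true
    } catch {
      case NonFatal(e) =>
        Console.err.println(s"Failed to set caller context: $e") // logDebug(s"$e", e) in Spark
        false
    }
}
```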
|
This context is just something passed over IPC to provide a general string for the audit logs; the main actual access of it is in the HDFS audit log. Otherwise it gets passed along RPC calls (along with the htrace context) and set by the various entry points (CLI, MR, etc.). It lets you find out which piece of code is bringing your NN to its knees; see HDFS-9184. I'd nominate @cnauroth as someone capable of reviewing the link-up with HDFS; he's happy now that there aren't spaces in the context string. Spark-wise: I've added a comment on how a test for this could be written. The HBase keytab reflection code showed me how brittle some of that stuff can be to changes in the reflected-on classes. |
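For reference, the caller context surfaces as the trailing field of an `hdfs-audit.log` record. The line below is an illustration with made-up values, not output from this PR:

```
2016-09-20 11:54:24,116 INFO FSNamesystem.audit: allowed=true ugi=user1 (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/tmp/data dst=null perm=null proto=rpc callerContext=Spark_JobId_0_StageId_0
```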
|
Hi, @tgravescs Thank you so much for the review. I have updated the PR based on your every comment. The only question left is this one: to make the caller context more readable, at commit 10dbc6f I added the static strings. Yes, this PR will set up the caller context for both HDFS and YARN. At the very beginning, to make the review easier, I created two different JIRAs to set up caller contexts for HDFS (SPARK-16757) and YARN (SPARK-16758), although the code is the same. I have updated the JIRAs, the title of this PR, and the description of this PR. In the "How was this patch tested" section of the PR's description, you can see what shows up in the HDFS hdfs-audit.log and the Yarn RM audit log. When invoking the Hadoop CallerContext API in the Yarn Client, the caller context shows up in the Yarn RM log. Also, I have tested this with multiple tasks running in the same executor; my command line to run the tests is as below. In the Spark History application page, you can see there are two executors (one is the driver), and in the executor there are 46 tasks: |
|
Test build #65677 has finished for PR 14659 at commit
|
```scala
val JobID = if (jobID.isDefined) s"_JobID_${jobID.get}" else ""
val StageID = if (stageID.isDefined) s"_StageID_${stageID.get}" else ""
val StageAttemptId = if (stageAttemptId.isDefined) s"_${stageAttemptId.get}" else ""
val AppId = if (appId.isDefined) s"_AppId_${appId.get}" else ""
```
As mentioned, please remove `AppId_`... the application ID is pretty obvious in the logs since it starts with `application_`, so there's no need to print extra characters.
|
|
```scala
new CallerContext(Option(System.getProperty("spark.app.name")),
  Option(appAttemptId.getApplicationId.toString), attemptID).set()
new CallerContext("APPLICATION_MASTER",
```
Truncate to APPMASTER.
```scala
val StageID = if (stageID.isDefined) s"_StageID_${stageID.get}" else ""
val StageAttemptId = if (stageAttemptId.isDefined) s"_${stageAttemptId.get}" else ""
val AppId = if (appId.isDefined) s"_AppId_${appId.get}" else ""
val AppAttemptId = if (appAttemptId.isDefined) s"_AttemptId_${appAttemptId.get}" else ""
```
I don't agree with adding the `AttemptId_` string back. I understand it helps readability, but it's a lot of characters compared to the actual attempt ID. I want this thing to be as small as possible so as not to add extra overhead to the RPC calls. The strings from your example run are already 112 characters long (`SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_14_AttemptNum_0`); if you start getting into many tasks and stages, that could exceed the default 128 characters printed to the audit log and get truncated.
How about `SPARK_TASK_application_1474394339641_0006_1_JId_0_SId_0_0_TId_14_0`? That is only 66 characters. Yes, the user may have to look up the format, but I think that is OK. If parsing this really turns out to be that big of an issue we can change it later, but I would rather have it smaller, get better performance, and keep all the information (rather than having it possibly truncated).
Really, right now I think all you need is the task ID and attempt, because the numbers just increase across jobs and stages; but having the job ID and stage ID helps find the task ID quickly, and handles the case where that ever changes and we have task IDs unique per job or stage.
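A hedged sketch of assembling the compact form proposed here; the parameter names follow the diffs above, but the helper itself is hypothetical:

```scala
// SPARK_<from>_<appId>_<appAttemptId>_JId_<jobId>_SId_<stageId>_<stageAttemptId>_TId_<taskId>_<attemptNumber>
def compactContext(
    from: String, // e.g. "TASK", "CLIENT", "APPMASTER"
    appId: Option[String] = None,
    appAttemptId: Option[String] = None,
    jobId: Option[Int] = None,
    stageId: Option[Int] = None,
    stageAttemptId: Option[Int] = None,
    taskId: Option[Long] = None,
    taskAttemptNumber: Option[Int] = None): String = {
  val appIdStr = appId.map("_" + _).getOrElse("")
  val appAttemptIdStr = appAttemptId.map("_" + _).getOrElse("")
  val jobIdStr = jobId.map("_JId_" + _).getOrElse("")
  val stageIdStr = stageId.map("_SId_" + _).getOrElse("")
  val stageAttemptIdStr = stageAttemptId.map("_" + _).getOrElse("")
  val taskIdStr = taskId.map("_TId_" + _).getOrElse("")
  val taskAttemptNumberStr = taskAttemptNumber.map("_" + _).getOrElse("")
  s"SPARK_$from$appIdStr$appAttemptIdStr$jobIdStr$stageIdStr" +
    s"$stageAttemptIdStr$taskIdStr$taskAttemptNumberStr"
}

// compactContext("TASK", Some("application_1474394339641_0006"), Some("1"),
//   Some(0), Some(0), Some(0), Some(14L), Some(0))
// == "SPARK_TASK_application_1474394339641_0006_1_JId_0_SId_0_0_TId_14_0"
```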
|
Hi, @tgravescs Thank you very much. Yes. I have updated the PR to make the string of the caller context shorter. |
|
Test build #65799 has finished for PR 14659 at commit
|
```scala
* @param stageId id of the stage this task belongs to
* @param stageAttemptId attempt id of the stage this task belongs to
* @param taskId task id
* @param taskAttemptNumber task attempt number
```
Sorry I missed this before: this is a new feature and will only go into 2.1. Let's just remove the `@since` since this isn't a public API.
Done.
```scala
stageAttemptId: Option[Int] = None,
taskId: Option[Long] = None,
taskAttemptNumber: Option[Int] = None) extends Logging {
```
|
Make all these local vals start with lower case; add `Str` to them if you need to differentiate, e.g. `appIdStr`.
Done.
|
A couple of minor things; otherwise looks good. |
|
Hi, @tgravescs Thanks a lot for the comments. I have updated the PR to rename the local vals and remove the `@since` tag. |
|
Test build #65934 has finished for PR 14659 at commit
|
|
+1 |
|
thanks @Sherry302 |
|
Thanks a lot for the review. @tgravescs @cnauroth @steveloughran @srowen |
|
Hi, @tgravescs Should we also commit this PR to Branch-2? Thanks. |
|
@Sherry302 @tgravescs Oops, it looks like this causes master Maven builds to fail: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ But not the SBT ones, which is weird. It also fails REPL tests with an odd error. I don't know what's actually going on there. |
|
This is a new feature, so generally we don't put these into point releases, which are for bug fixes. @srowen thanks for pointing it out; I will take a look. |
|
@tgravescs @srowen Sorry for the failure. I am looking into it. |
|
Hi, @tgravescs @srowen I am doing some tests and trying to figure out the root cause. |
|
Yeah, it's definitely caused by the call in `Task` to `Utils.CallerContext.setCurrent()`. It seems to come from it using the `ExecutorClassLoader` to check the class for the executor, I guess because `ExecutorClassLoader` overrides `findClass`, which is called from `loadClass`, which is called from `forName`. There must be something slightly different in the `MessageMatcher`, which I guess is generated on the fly by netty; loading the remote one must mismatch the local one. @srowen do you think we should revert this while we figure it out? |
|
@tgravescs @srowen Thanks. Using |
|
It's there to make sure everyone is using the same classloader and to handle the case where they are chained. I'm not really familiar with all the scenarios of the REPL; I see the suite itself is getting the classloader and loading some things based on a URI. I think for now you can put up a patch changing it to use `Class.forName`, and we can file a follow-up JIRA to investigate more. |
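The suggested fix amounts to a one-line swap in the reflection path (sketched here; surrounding code omitted):

```scala
// Before: Utils.classForName may resolve through the REPL's ExecutorClassLoader,
// which is where the mismatch showed up.
// val callerContext = Utils.classForName("org.apache.hadoop.ipc.CallerContext")

// After: load directly with the current class's loader.
val callerContext = Class.forName("org.apache.hadoop.ipc.CallerContext")
```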
|
Thanks @tgravescs, yes. I have created PR 15286. |
|
Hi, @tgravescs SPARK-17714 has been created for further investigation. |

What changes were proposed in this pull request?
- Pass `jobId` to `Task`.
- A new function `setCallerContext` is added in `Utils`. The `setCallerContext` function invokes APIs of `org.apache.hadoop.ipc.CallerContext` to set up Spark caller contexts, which will be written into `hdfs-audit.log` and the Yarn RM audit log.
- For HDFS: invoke `org.apache.hadoop.ipc.CallerContext` in `Task`, the Yarn `Client`, and `ApplicationMaster`.
- For Yarn: invoke `org.apache.hadoop.ipc.CallerContext` in the Yarn `Client`.
How was this patch tested?
Manual tests against some Spark applications in Yarn client mode and Yarn cluster mode, checking whether Spark caller contexts are written into the HDFS `hdfs-audit.log` and the Yarn RM audit log successfully.
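Note that HDFS and YARN only record the context when the feature is enabled on the Hadoop side; to my understanding these `core-site.xml` properties (Hadoop 2.8+) control it:

```xml
<!-- Enable propagation of the caller context over IPC (off by default). -->
<property>
  <name>hadoop.caller.context.enabled</name>
  <value>true</value>
</property>
<!-- Maximum length of the recorded context; 128 is the truncation limit
     mentioned in the review above. -->
<property>
  <name>hadoop.caller.context.max.size</name>
  <value>128</value>
</property>
```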
For example, run SparkKMeans in Yarn client mode:

Before: there will be no Spark caller context in records of `hdfs-audit.log` and the Yarn RM audit log.

After: Spark caller contexts will be written in records of `hdfs-audit.log` and the Yarn RM audit log.

These are records in `hdfs-audit.log`:

This is a record in the Yarn RM log: