[SPARK-24194][SQL] HadoopFsRelation cannot overwrite a path that is also being read from. #21257
Conversation
cc @cloud-fan @jiangxb1987
this deletes leaf files one by one, have you evaluated the performance difference?
First of all, if it is the root directory of the table, I have to record all the files in the directory and wait until the job is committed to delete them. Because the job's _temporary directory also lives under that directory, I cannot simply delete the whole directory.
Second, when we record the files that need to be deleted, we only list the files in the root directory non-recursively. Under normal circumstances, the number of entries at the first level of a partitioned table's directory will not be very large.
In the end, this will certainly be slower than deleting the entire directory directly, but with the current implementation we cannot delete the entire table directory.
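A minimal sketch of this defer-then-delete idea, assuming hypothetical names (DeferredDeleteExample, stagePathsForDeletion, deleteStagedPaths) rather than the actual PR code:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable

// Instead of deleting the table root eagerly (which would also remove the
// job's _temporary directory once it exists), record the current top-level
// entries and delete them only after the job has committed.
class DeferredDeleteExample(fs: FileSystem, tableRoot: Path) {
  private val toDelete = mutable.ArrayBuffer[Path]()

  // Called before the write job starts: a non-recursive listing of the root.
  def stagePathsForDeletion(): Unit = {
    fs.listStatus(tableRoot).foreach(status => toDelete += status.getPath)
  }

  // Called from job commit: the new output is in place, so the old entries
  // can now be removed one by one.
  def deleteStagedPaths(): Unit = {
    toDelete.foreach(path => fs.delete(path, true))
  }
}
```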
Is it possible to only do this for overwrite? We should not introduce a perf regression when it's not necessary.
Have you considered the approach taken by dynamicPartitionOverwrite, i.e. using a staging directory?
- From the code's point of view, the current implementation only calls `deleteMatchingPartitions` if `overwrite` is specified.
- Using `dynamicPartitionOverwrite` will not solve this problem, because it also generates a `.stage` directory under the table root directory. We would still need to record all the files we want to delete, and we still could not delete the root directory directly.

Dynamic partition overwrite actually records all the partitions that need to be deleted and then deletes them one by one. A whole-table overwrite deletes all the data in the table directory, so it also needs to record all the partition directories and files to delete; in practice the implementation ends up similar to `dynamicPartitionOverwrite`.
If I did that, then when the job is committed it would delete the entire output directory, and there would be no data left.
We delete the files just before committing the job, don't we?
The key point is that the data is already in the output directory before the job is committed, so we can no longer delete the output directory.
We override `FileCommitProtocol`'s `deleteWithJob` method in `HadoopMapReduceCommitProtocol`. It no longer deletes the files immediately; it waits until the entire job is committed.
We do delete the files when the job is committed, but the temporary output files were generated when the tasks started. Those temporary files live inside the output directory, and their data is later moved out into the final output locations.
After the job starts, there is no safe point at which we can delete the entire output directory.
ok, then how about adding a new parameter canDeleteNow: Boolean to FileCommitProtocol.deleteWithJob?
That's a good idea. I'll change my code.
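A rough illustration of what that could look like, with a hypothetical class and field names (the real change would live in FileCommitProtocol and its implementations):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable

// When canDeleteNow is true, delete immediately as before; when false, just
// record the path and let the committer drain the queue from job commit.
class DeferredDeleteCommitProtocolSketch {
  private val deferredDeletes = mutable.ArrayBuffer[Path]()

  def deleteWithJob(fs: FileSystem, path: Path, canDeleteNow: Boolean): Boolean = {
    if (canDeleteNow) {
      fs.delete(path, true)
    } else {
      deferredDeletes += path // deleted later, e.g. from commitJob()
      true
    }
  }
}
```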
I have changed this code.
cc @ericl
19e6692 to a51620b
ok to test
Test build #90400 has finished for PR 21257 at commit
Jenkins, retest this please.
cc @cloud-fan, Jenkins hit some errors; please help me retest. Thanks!
retest this please
Test build #90413 has finished for PR 21257 at commit
It seems recursive is always passed as true? Can we remove it?
In the current situation we could remove it, but I feel it is better to keep it with a default value of true.
Are there any (potential) cases where we need a recursive parameter?
I will remove the recursive parameter.
Will this be different from `stagingDir.getFileSystem(jobContext.getConfiguration)`?
`stagingDir` is not always a valid Hadoop path, but the JobContext work dir always is.
can we change other places in this method to use the fs created here?
I'm not sure you can guarantee that the working dir is always on the dest FS. At least with @rdblue's committers, task attempt work dirs are in file:// and task commit (somehow) gets them to the dest FS in a form where job commit will make them visible.
I changed my code.
I now record the paths each FileSystem needs to delete in a map structure, and no longer assume that they all use the same FileSystem.
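A sketch of that bookkeeping, purely illustrative (names like DeletionRegistrySketch and recordDeletion are made up for this example):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable

object DeletionRegistrySketch {
  // Group the paths to delete by the FileSystem that owns them, so job commit
  // does not have to assume a single destination filesystem.
  val pathsToDelete = mutable.Map[FileSystem, mutable.Set[Path]]()

  def recordDeletion(fs: FileSystem, path: Path): Unit = {
    pathsToDelete.getOrElseUpdate(fs, mutable.Set[Path]()) += path
  }
}
```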
isReadPath?
Is isInReadPath, inReadPath, or isReadPath better?
I'd personally ignore a failure on delete(), as the conditions for the API call are "if this doesn't raise an exception then the dest is gone". You can skip the exists check as it will be superfluous
Test build #90593 has finished for PR 21257 at commit
Test build #90619 has finished for PR 21257 at commit
retest this please
Test build #90632 has finished for PR 21257 at commit
- You don't need to do the exists check, it's just overhead. delete() will return false if there was nothing to delete.
- But... what if that delete throws an exception? Should the commit fail (as it does now), or be downgraded? As an example, the Hadoop `FileOutputCommitter` uses the option `"mapreduce.fileoutputcommitter.cleanup-failures.ignored"` to choose what to do there.
- ...and: what about cleanup in a job abort?

I think you'd be best off isolating this cleanup into its own method and calling it from both job commit and job abort: in job commit, discuss with others what to do; in job abort, just log and continue.
I think we should not delete the data when the job is aborted. The semantics of `deleteWithJob` should be to delete the data when the job is committed.
I changed the code to handle exceptions.
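A hedged sketch of that commit-time exception handling, consistent with the snippets under review but not the exact PR code (deleteStagedPaths and the map parameter are illustrative):

```scala
import java.io.IOException
import org.apache.hadoop.fs.{FileSystem, Path}

// Deletions are performed only at job commit; a failing delete is rethrown
// with the offending path and the original error message attached so the
// cause is not lost.
def deleteStagedPaths(pathsToDelete: Map[FileSystem, Set[Path]]): Unit = {
  for ((fs, paths) <- pathsToDelete; path <- paths) {
    try {
      fs.delete(path, true)
    } catch {
      case ex: IOException =>
        throw new IOException(s"Unable to clear output file $path at job commit time: $ex", ex)
    }
  }
}
```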
Test build #90636 has finished for PR 21257 at commit
Test build #90826 has finished for PR 21257 at commit
Update the PR title?
Nit: style issue. Can we follow the indents in the method declaration? https://github.com/databricks/scala-style-guide#spacing-and-indentation
Test build #90957 has finished for PR 21257 at commit
Test build #92165 has finished for PR 21257 at commit
```scala
for (path <- pathsToDelete(fs)) {
  try {
    if (!fs.delete(path, true)) {
      logWarning(s"Delete path ${path} fail at job commit time")
```
delete -> false just means there was nothing there; I wouldn't warn at that point. Unless delete() throws an exception, you assume that when the call returns, fs.exists(path) does not hold, regardless of the return value. (Special exception: the dest is "/".)
```scala
    } catch {
      case ex: IOException =>
        throw new IOException(s"Unable to clear output " +
          s"file ${path} at job commit time", ex)
```
recommend including ex.toString() in the new exception raised, as child exception text can often get lost
```scala
while (files.hasNext) {
  val file = files.next()
  if (!committer.deleteWithJob(fs, file.getPath, false)) {
    throw new IOException(s"Unable to clear output " +
```
as committer.deleteWithJob() returns true in base class, that check won't do much, at least not with the default impl. Probably better just to have deleteWithJob() return Unit, require callers to raise an exception on a delete failure. Given that delete() is required to say "dest doesn't exist if you return", I don't think they need to do any checks at all
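A small sketch of that alternative, assuming a hypothetical trait rather than the real FileCommitProtocol API:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Let deleteWithJob return Unit and rely on exceptions for failure: a
// successful return from FileSystem.delete() already means the path is gone,
// so there is nothing useful for callers to check.
trait UnitReturningDeleteSketch {
  def deleteWithJob(fs: FileSystem, path: Path): Unit = {
    fs.delete(path, true) // boolean result intentionally ignored
  }
}
```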
```scala
if (fs.exists(staticPrefixPath)) {
  if (staticPartitionPrefix.isEmpty && outputCheck) {
    // input contain output, only delete output sub files when job commit
    val files = fs.listFiles(staticPrefixPath, false)
```
If there are a lot of files here, you've gone from a directory delete that was O(1) on a filesystem and probably O(descendants) on an object store, to an O(children) operation on a filesystem and O(children * descendants(child)) on an object store. Not significant for a small number of files, but potentially expensive. Why do the iteration at all?
```scala
  }
} else {
  if (!committer.deleteWithJob(fs, staticPrefixPath, true)) {
    throw new IOException(s"Unable to clear output " +
```
again, hard to see how this exception path would be reached.
```scala
/**
 * now just record the file to be delete
 */
override def deleteWithJob(fs: FileSystem, path: Path,
```
No need to worry about concurrent access here, correct?
Some overall thoughts:
I'll have to look a bit closer at what happens in committer cleanups right now, though as they are focused on rm -f $dest/__temporary/$jobAttempt, they are less worried about failures here as it shouldn't be changing any public datasets.
Can one of the admins verify this patch?
Ping @zheh12 to address the comments. I'm going to suggest closing this one for now while I'm identifying PRs to close.
Closes apache#21766
Closes apache#21679
Closes apache#21161
Closes apache#20846
Closes apache#19434
Closes apache#18080
Closes apache#17648
Closes apache#17169

Add:
Closes apache#22813
Closes apache#21994
Closes apache#22005
Closes apache#22463

Add:
Closes apache#15899

Add:
Closes apache#22539
Closes apache#21868
Closes apache#21514
Closes apache#21402
Closes apache#21322
Closes apache#21257
Closes apache#20163
Closes apache#19691
Closes apache#18697
Closes apache#18636
Closes apache#17176

Closes apache#23001 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
What changes were proposed in this pull request?
When we insert overwrite a parquet table, there is a check that throws an exception if the output path would overwrite a path that is also being read as input. This check (limitation) only exists for datasource tables, not Hive tables. Shall we remove this check?
We cannot read and overwrite a `HadoopFsRelation` with the same path -- input and output should be different. The reason is that Spark deletes the output partition path before reading. This PR proposes to mark/cache the paths to delete before reading, and to postpone the deletion until job commit.
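A minimal spark-shell style reproduction of the restriction, assuming a default SparkSession named spark; the table name is illustrative:

```scala
import spark.implicits._

// Create a datasource (parquet) table, then try to overwrite it from itself.
Seq(1, 2, 3).toDF("i").write.format("parquet").saveAsTable("t1")

// Before this change, the self-overwrite below fails with
// "Cannot overwrite a path that is also being read from."
spark.sql("INSERT OVERWRITE TABLE t1 SELECT * FROM t1")
```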
How was this patch tested?
I just updated `InsertSuite` and `MetastoreDataSourceSuite`.