@@ -243,6 +243,10 @@ def commitTask(fs, jobAttemptPath, taskAttemptPath, dest):
 
 On a genuine filesystem this is an `O(1)` directory rename.
 
+On an object store with a mimicked rename, it is `O(data)` for the copy,
+along with the overhead of listing and deleting all files (for S3, that's
+`(1 + files/500)` LIST calls, and the same number of delete calls).
+
 
 #### Task Abort
 
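The cost added above for a mimicked rename can be turned into concrete numbers. This is a back-of-envelope model, not S3A code: `mimicked_rename_cost` is a hypothetical helper, the 500-keys-per-request figure is the S3 number quoted in the text, and one COPY request per object is an assumption (very large objects may be copied in several parts).

```python
def mimicked_rename_cost(file_sizes, page_size=500):
    """Rough cost of a rename mimicked as copy + delete on an object store.

    file_sizes: size in bytes of every object under the source path.
    page_size: keys handled per LIST or bulk DELETE request (the S3 figure
    quoted in the text; a hypothetical parameter, real limits vary).
    """
    files = len(file_sizes)
    # Listing and deleting follow the text's (1 + files/500) estimate: O(files).
    list_requests = 1 + files // page_size
    return {
        # Assumes one COPY request per object; the bytes moved make this O(data).
        "copy_requests": files,
        "bytes_copied": sum(file_sizes),
        "list_requests": list_requests,
        "delete_requests": list_requests,
    }


# 1200 files of 128 MiB each.
cost = mimicked_rename_cost([128 * 1024 * 1024] * 1200)
print(cost)
```

For 1200 files of 128 MiB each, the model gives 1200 copies moving 150 GiB, against only 3 LIST and 3 DELETE requests. The copy, not the listing, dominates.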
@@ -253,7 +257,8 @@ def abortTask(fs, jobAttemptPath, taskAttemptPath, dest):
   fs.delete(taskAttemptPath, recursive=True)
 ```
 
-On a genuine fileystem this is an `O(1)` operation.
+On a genuine filesystem this is an `O(1)` operation. On an object store,
+it is `O(files)`: every file must be listed and then deleted, usually in batches.
 
 
 #### Job Commit
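Task abort against an object store can be sketched with a toy in-memory store. Everything here is hypothetical (`ToyObjectStore`, `abort_task`, the key layout and the 500-key page size). There are no real directories, so deleting the task attempt path means paging through every key under a prefix and issuing bulk deletes. That is where the `O(files)` cost comes from.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory object store: flat keys, no real directories."""
    objects: dict = field(default_factory=dict)
    page_size: int = 500  # assumed max keys per LIST page and per bulk DELETE
    list_calls: int = 0
    delete_calls: int = 0

    def list_page(self, prefix, start):
        """One LIST request: up to page_size keys under prefix, plus a truncated flag."""
        self.list_calls += 1
        keys = sorted(k for k in self.objects if k.startswith(prefix))
        page = keys[start:start + self.page_size]
        return page, start + self.page_size < len(keys)

    def delete_batch(self, keys):
        """One bulk DELETE request for at most page_size keys."""
        assert len(keys) <= self.page_size
        self.delete_calls += 1
        for key in keys:
            self.objects.pop(key, None)


def abort_task(store, task_attempt_path):
    """Delete everything under the task attempt 'directory' (a key prefix)."""
    prefix = task_attempt_path.rstrip("/") + "/"
    # Page through the whole listing first: one LIST per page of keys.
    keys, start, truncated = [], 0, True
    while truncated:
        page, truncated = store.list_page(prefix, start)
        keys.extend(page)
        start += len(page)
    # Then delete in batches: one bulk DELETE per page_size keys.
    for i in range(0, len(keys), store.page_size):
        store.delete_batch(keys[i:i + store.page_size])
    return len(keys)


store = ToyObjectStore()
# Illustrative key layout: 1200 task attempt files plus one committed file.
for i in range(1200):
    store.objects[f"dest/_temporary/0/attempt_01/part-{i:05d}"] = b"data"
store.objects["dest/part-00000"] = b"committed"
deleted = abort_task(store, "dest/_temporary/0/attempt_01")
print(deleted, store.list_calls, store.delete_calls)  # 1200 3 3
```

With 1200 files the abort takes 3 LIST and 3 DELETE requests. This stays within the `(1 + files/500)` estimate given for S3 earlier, and it grows linearly with the number of files.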
@@ -447,8 +452,6 @@ must be considered significantly misleading.
 
 Rename task attempt path to task committed path.
 
-(On a genuine fileystem this is `O(1)`)
-
 ```python
 
 def needsTaskCommit(fs, jobAttemptPath, taskAttemptPath, dest):
@@ -460,10 +463,6 @@ def commitTask(fs, jobAttemptPath, taskAttemptPath, dest):
 
 ```
 
-Cost in a conventional filesystem: `O(files)`.
-
-Cost against an object store with mimiced rename, `O(data) + O(files)`.
-
 #### v2 Task Abort
 
 Delete task attempt path.
@@ -473,6 +472,8 @@ def abortTask(fs, jobAttemptPath, taskAttemptPath, dest):
   fs.delete(taskAttemptPath, recursive=True)
 ```
 
+Cost: `O(1)` for normal filesystems, `O(files)` for object stores.
+
 
 #### v2 Job Commit
 
@@ -501,6 +502,8 @@ def abortJob(fs, jobAttemptDir, dest):
   fs.delete(jobAttemptDir, recursive=True)
 ```
 
+Cost: `O(1)` for normal filesystems, `O(files)` for object stores.
+
 #### v2 Task Recovery
 
 As no data is written to the destination directory, a task can be cleaned up