Commit c31ed5e

HADOOP-13786 cost of v2 commit
1 parent cf6f115 commit c31ed5e

1 file changed: +10 -7 lines changed

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3a_committer_architecture.md

Lines changed: 10 additions & 7 deletions
@@ -243,6 +243,10 @@ def commitTask(fs, jobAttemptPath, taskAttemptPath, dest):
 
 On a genuine filesystem this is an `O(1)` directory rename.
 
+On an object store with a mimicked rename, it is `O(data)` for the copy,
+along with overhead for listing and deleting all files. (For S3, that's
+`(1 + files/500)` lists, and the same number of delete calls.)
+
 
 #### Task Abort
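
Review note: the S3 figure added in this hunk can be turned into a rough request estimate. The following is a minimal sketch, not code from the committer. It assumes that `files/500` in the formula means integer division and that each file needs one copy request (the diff itself only says the copy is `O(data)`). The names `mimicked_rename_requests` and `page_size` are hypothetical.

```python
def mimicked_rename_requests(file_count: int, page_size: int = 500) -> dict:
    """Estimate request counts for a directory rename mimicked on S3.

    Follows the `(1 + files/500)` figure from the text for list calls,
    with the same number of delete calls. One copy per file is an
    assumption made for illustration.
    """
    if file_count < 0 or page_size <= 0:
        raise ValueError("file_count must be >= 0 and page_size must be > 0")
    batches = 1 + file_count // page_size  # assumed integer division
    return {"list": batches, "copy": file_count, "delete": batches}


if __name__ == "__main__":
    print(mimicked_rename_requests(1200))
```

With 1200 files and the default page size this gives 3 list calls, 1200 copies and 3 delete calls. The copy term dominates, which is why the text describes the rename as `O(data)`.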

@@ -253,7 +257,8 @@ def abortTask(fs, jobAttemptPath, taskAttemptPath, dest):
 fs.delete(taskAttemptPath, recursive=True)
 ```
 
-On a genuine filesystem this is an `O(1)` operation.
+On a genuine filesystem this is an `O(1)` operation. On an object store,
+it is proportional to the time to list and delete files, usually in batches.
 
 
 #### Job Commit
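
Review note: to illustrate why a recursive delete becomes `O(files)` on an object store, here is a minimal sketch using an in-memory `dict` as a stand-in for a bucket. It is not the S3A implementation. The function name `delete_prefix` and the batch size of 500 are assumptions for illustration.

```python
def delete_prefix(store: dict, prefix: str, batch_size: int = 500) -> int:
    """Delete every key under `prefix` and return the number of batch calls.

    `store` is a plain dict standing in for a bucket (hypothetical model):
    there is no directory to unlink, so every matching key must be
    listed and then removed, batch by batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be > 0")
    keys = sorted(k for k in store if k.startswith(prefix))  # the "list" step
    calls = 0
    for start in range(0, len(keys), batch_size):
        for key in keys[start:start + batch_size]:  # one simulated bulk delete
            del store[key]
        calls += 1
    return calls


if __name__ == "__main__":
    bucket = {f"job/task_1/part-{i:04d}": b"" for i in range(1200)}
    bucket["job/task_2/part-0000"] = b""
    print(delete_prefix(bucket, "job/task_1/"), len(bucket))
```

The work grows with the number of keys under the prefix. A filesystem, by contrast, can drop the directory entry in a single operation.
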
@@ -447,8 +452,6 @@ must be considered significantly misleading.
 
 Rename task attempt path to task committed path.
 
-(On a genuine filesystem this is `O(1)`)
-
 ```python
 
 def needsTaskCommit(fs, jobAttemptPath, taskAttemptPath, dest):
@@ -460,10 +463,6 @@ def commitTask(fs, jobAttemptPath, taskAttemptPath, dest):
 
 ```
 
-Cost in a conventional filesystem: `O(files)`.
-
-Cost against an object store with mimicked rename, `O(data) + O(files)`.
-
 #### v2 Task Abort
 
 Delete task attempt path.
@@ -473,6 +472,8 @@ def abortTask(fs, jobAttemptPath, taskAttemptPath, dest):
 fs.delete(taskAttemptPath, recursive=True)
 ```
 
+Cost: O(1) for normal filesystems, O(files) for object stores.
+
 
 #### v2 Job Commit
 
@@ -501,6 +502,8 @@ def abortJob(fs, jobAttemptDir, dest):
 fs.delete(jobAttemptDir, recursive=True)
 ```
 
+Cost: O(1) for normal filesystems, O(files) for object stores.
+
 #### v2 Task Recovery
 
 As no data is written to the destination directory, a task can be cleaned up
