Commit 37f8e28

HADOOP-17244: update compatibility docs
Make clear that dir markers and S3Guard are not compatible with older releases, which means older releases shouldn't be used with them.
Change-Id: I3411b8354d30cfb76080b8cc904fdfbd2172b4d8
1 parent 12d8d03 commit 37f8e28

1 file changed: +59 -7 lines changed

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md

Lines changed: 59 additions & 7 deletions
@@ -12,7 +12,11 @@
 limitations under the License. See accompanying LICENSE file.
 -->
 
-# Controlling the S3A Directory Marker Behavior
+# Experimental: Controlling the S3A Directory Marker Behavior
+
+This document discusses an experimental feature of the S3A
+connector since Hadoop 3.3.1: the ability to retain directory
+marker objects above paths containing files or subdirectories.
 
 ## <a name="compatibility"></a> Critical: this is not backwards compatible!
 
@@ -26,15 +30,40 @@ Versions of Hadoop which are incompatible with other marker retention policies,
 as of August 2020.
 
 -------------------------------------------------------
-| Branch | Compatible Since | Future Fix Planned? |
+| Branch | Compatible Since | Supported |
 |------------|------------------|---------------------|
-| Hadoop 2.x | | NO |
-| Hadoop 3.0 | | NO |
-| Hadoop 3.1 | check | Yes |
-| Hadoop 3.2 | check | Yes |
+| Hadoop 2.x | n/a | WONTFIX |
+| Hadoop 3.0 | check | Read-only |
+| Hadoop 3.1 | check | Read-only |
+| Hadoop 3.2 | check | Read-only |
 | Hadoop 3.3 | 3.3.1 | Done |
 -------------------------------------------------------
 
+*WONTFIX*
+
+The Hadoop branch-2 line will *not* be patched.
+
+*Read-only*
+
+These branches have read-only compatibility.
+
+* They may list directories with directory markers, and correctly identify when
+such directories have child entries.
+* They will open files under directories with such markers.
+
+However, they have limitations when writing/deleting directories.
+
+Specifically: S3Guard tables may not be correctly updated in
+all conditions, especially on the partial failure of delete
+operations. In particular, they may mistakenly add a tombstone in
+the DynamoDB table and so future directory/directory tree listings
+will consider the directory to be nonexistent.
+
+_It is not safe for Hadoop releases before Hadoop 3.3.1 to write
+to S3 buckets which have directory markers when S3Guard is enabled._
+
+## Verifying read compatibility
+
 The `s3guard bucket-info` tool [can be used to verify support](#bucket-info).
 This allows for a command line check of compatibility, including
 in scripts.
@@ -49,6 +78,7 @@ It is only safe to change the directory marker policy if the following
 (including backing up) an S3 bucket.
 2. You know all applications which read data from the bucket are compatible.
 
+
 ### <a name="backups"></a> Applications backing up data.
 
 It is not enough to have a version of Apache Hadoop which is compatible, any
@@ -240,7 +270,7 @@ can switch to the higher-performance mode for those specific directories.
 Only the default setting, `fs.s3a.directory.marker.retention = delete` is compatible with
 every shipping Hadoop release.
 
-## <a name="authoritative"></a> Directory Markers and S3Guard
+## <a name="s3guard"></a> Directory Markers and S3Guard
 
 Applications which interact with S3A in S3A clients with S3Guard enabled still
 create and delete markers. There's no attempt to skip operations, such as by having
@@ -256,6 +286,28 @@ then an S3A connector with a retention policy of `fs.s3a.directory.marker.retent
 only use in managed applications where all clients are using the same version of
 hadoop, and configured consistently.
 
+After the directory marker feature [HADOOP-13230](https://issues.apache.org/jira/browse/HADOOP-13230)
+was added, issues related to S3Guard integration surfaced:
+
+1. The incremental update of the S3Guard table was inserting tombstones
+over directories as the markers were deleted, hiding files underneath.
+This happened during directory `rename()` and `delete()`.
+1. The update of the S3Guard table after a partial failure of a bulk delete
+operation would insert tombstones in S3Guard records of successfully
+deleted markers, irrespective of the directory status.
+
+Issue #1 is unique to Hadoop branch 3.3; however issue #2 is a critical
+part of the S3Guard consistency handling.
+
+Both issues have been fixed in Hadoop 3.3.x,
+in [HADOOP-17244](https://issues.apache.org/jira/browse/HADOOP-17244).
+
+Issue #2, delete failure handling, is not easily backported and is
+not likely to be backported.
+
+Accordingly: Hadoop releases with read-only compatibility must not be used
+to rename or delete directories where markers are retained *when S3Guard is enabled.*
+
 ## <a name="bucket-info"></a> Verifying marker policy with `s3guard bucket-info`
 
 The `bucket-info` command has been enhanced to support verification from the command
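
The compatibility table and the S3Guard warning added by this change can be summarised as a small lookup. The following is only an illustrative sketch in Python, not part of Hadoop: the function names `parse_version`, `marker_support` and `safe_to_write_with_s3guard` are hypothetical, the "Unsupported" label for 3.3.0 is an assumption drawn from the "Compatible Since 3.3.1" column, and releases after 3.3.1 are assumed to stay compatible. The table marks the 3.0 to 3.2 branches as "check", so the exact patch releases that are read-only compatible still need to be confirmed separately.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a purely numeric 'major.minor.patch' string; missing parts default to 0."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def marker_support(version: str) -> str:
    """Return the 'Supported' column of the table for a Hadoop version (hypothetical helper)."""
    major, minor, patch = parse_version(version)
    if major < 3:
        return "WONTFIX"      # branch-2 will not be patched
    if (major, minor) in {(3, 0), (3, 1), (3, 2)}:
        return "Read-only"    # patch level is "check" in the table: verify per release
    if (major, minor, patch) >= (3, 3, 1):
        return "Done"         # assumption: later releases remain compatible
    return "Unsupported"      # assumption: 3.3.0 predates "Compatible Since 3.3.1"


def safe_to_write_with_s3guard(version: str) -> bool:
    """Releases before 3.3.1 must not write to marker-retaining buckets with S3Guard enabled."""
    return parse_version(version) >= (3, 3, 1)
```

For example, `marker_support("3.2.1")` yields `"Read-only"` while `safe_to_write_with_s3guard("3.2.1")` is `False`, which mirrors the rule that read-only branches must not rename or delete directories where markers are retained when S3Guard is enabled.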
