1212 limitations under the License. See accompanying LICENSE file.
1313-->
1414
15- # Controlling the S3A Directory Marker Behavior
15+ # Experimental: Controlling the S3A Directory Marker Behavior
16+
17+ This document discusses an experimental feature of the S3A
18+ connector since Hadoop 3.3.1: the ability to retain directory
19+ marker objects above paths containing files or subdirectories.
1620
1721## <a name="compatibility"></a> Critical: this is not backwards compatible!
1822
@@ -26,15 +30,40 @@ Versions of Hadoop which are incompatible with other marker retention policies,
2630as of August 2020.
2731
2832-------------------------------------------------------
29- | Branch | Compatible Since | Future Fix Planned? |
33+ | Branch | Compatible Since | Supported |
3034| ------------| ------------------| ---------------------|
31- | Hadoop 2.x | | NO |
32- | Hadoop 3.0 | | NO |
33- | Hadoop 3.1 | check | Yes |
34- | Hadoop 3.2 | check | Yes |
35+ | Hadoop 2.x | n/a | WONTFIX |
36+ | Hadoop 3.0 | check | Read-only |
37+ | Hadoop 3.1 | check | Read-only |
38+ | Hadoop 3.2 | check | Read-only |
3539| Hadoop 3.3 | 3.3.1 | Done |
3640-------------------------------------------------------
3741
42+ *WONTFIX*
43+
44+ The Hadoop branch-2 line will *not* be patched.
45+
46+ *Read-only*
47+
48+ These branches have read-only compatibility.
49+
50+ * They may list directories with directory markers, and correctly identify when
51+ such directories have child entries.
52+ * They will open files under directories with such markers.
53+
54+ However, they have limitations when writing/deleting directories.
55+
56+ Specifically: S3Guard tables may not be correctly updated in
57+ all conditions, especially on the partial failure of delete
58+ operations. They may mistakenly add a tombstone in
59+ the DynamoDB table, so future directory/directory tree listings
60+ will consider the directory to be nonexistent.
61+
62+ _It is not safe for Hadoop releases before Hadoop 3.3.1 to write
63+ to S3 buckets which have directory markers when S3Guard is enabled._
64+
65+ ## Verifying read compatibility.
66+
3867The `s3guard bucket-info` tool [can be used to verify support](#bucket-info).
3968This allows for a command line check of compatibility, including
4069in scripts.
@@ -49,6 +78,7 @@ It is only safe change the directory marker policy if the following
4978 (including backing up) an S3 bucket.
50792. You know all applications which read data from the bucket are compatible.
5180
81+
5282### <a name="backups"></a> Applications backing up data.
5383
5484It is not enough to have a version of Apache Hadoop which is compatible, any
@@ -240,7 +270,7 @@ can switch to the higher-performance mode for those specific directories.
240270Only the default setting, `fs.s3a.directory.marker.retention = delete`, is compatible with
241271every shipping Hadoop release.
242272
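As an illustration, the default policy can be stated explicitly in the S3A configuration. This is a minimal sketch: the property name and value are those given above, while placing the entry in `core-site.xml` is an assumption about how the cluster is configured.

```xml
<!-- Assumed location: core-site.xml. -->
<!-- Keeps the default policy, the only one described above as compatible with every release. -->
<property>
  <name>fs.s3a.directory.marker.retention</name>
  <value>delete</value>
</property>
```
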
243- ## <a name="authoritative"></a> Directory Markers and S3Guard
273+ ## <a name="s3guard"></a> Directory Markers and S3Guard
244274
245275Applications which interact with S3A in S3A clients with S3Guard enabled still
246276create and delete markers. There's no attempt to skip operations, such as by having
@@ -256,6 +286,28 @@ then an S3A connector with a retention policy of `fs.s3a.directory.marker.retent
256286only use in managed applications where all clients are using the same version of
257287hadoop, and configured consistently.
258288
289+ After the directory marker feature [HADOOP-13230](https://issues.apache.org/jira/browse/HADOOP-13230)
290+ was added, issues related to S3Guard integration surfaced:
291+
292+ 1. The incremental update of the S3Guard table was inserting tombstones
293+ over directories as the markers were deleted, hiding files underneath.
294+ This happened during directory `rename()` and `delete()`.
295+ 1. The update of the S3Guard table after a partial failure of a bulk delete
296+ operation would insert tombstones in S3Guard records of successfully
297+ deleted markers, irrespective of the directory status.
298+
299+ Issue #1 is unique to Hadoop branch 3.3; however, issue #2 is a critical
300+ part of the S3Guard consistency handling.
301+
302+ Both issues have been fixed in Hadoop 3.3.x,
303+ in [HADOOP-17244](https://issues.apache.org/jira/browse/HADOOP-17244).
304+
305+ Issue #2, delete failure handling, is not easily backported and is
306+ not likely to be backported.
307+
308+ Accordingly: Hadoop releases with read-only compatibility must not be used
309+ to rename or delete directories where markers are retained *when S3Guard is enabled*.
310+
259311## <a name="bucket-info"></a> Verifying marker policy with `s3guard bucket-info`
260312
261313The `bucket-info` command has been enhanced to support verification from the command