
HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log) #1208

Merged: 16 commits, Sep 12, 2019

Conversation

bgaborg

@bgaborg bgaborg commented Aug 1, 2019

No description provided.

@bgaborg bgaborg changed the title HADOOP-16423. S3Guarld fsck: Check metadata consistency between S3 and metadatastore (log) HADOOP-16423. S3Guarld fsck: Check metadata consistency between S3 and metadatastore (log) (WIP) Aug 1, 2019
@steveloughran steveloughran changed the title HADOOP-16423. S3Guarld fsck: Check metadata consistency between S3 and metadatastore (log) (WIP) HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log) (WIP) Aug 1, 2019
@steveloughran steveloughran self-requested a review August 1, 2019 17:13
@steveloughran steveloughran added enhancement fs/s3 changes related to hadoop-aws; submitter must declare test endpoint labels Aug 1, 2019
@bgaborg bgaborg added the work in progress PRs still Work in Progress; reviews not expected but still welcome label Aug 2, 2019
@bgaborg
Author

bgaborg commented Aug 2, 2019

Missing things:

  • Add an AUTHORITATIVE_DIRECTORY_CONTENT_MISMATCH check and test. It should detect if an authoritative directory listing's content is not equal to what is in S3 (S3 is the source of truth).
  • Wire the VersionID check flag in from the CLI, or turn versionID checking off altogether and remove its check and test.
  • Remove the blockSize check since we don't store it in Dynamo, and it makes no sense in S3.
  • Remove the owner check since we don't store it in Dynamo.
  • Wire the root path parameter in from the CLI, so the root path of the scan can be defined within the bucket.
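For the authoritative check, here's a minimal sketch of the comparison I have in mind (all names hypothetical, not the actual S3GuardFsck code): when the metadata store listing claims to be authoritative, its set of child names must exactly equal what S3 lists for the same directory.

```java
import java.util.Set;

public class AuthoritativeDirCheck {
    // Hypothetical check: an authoritative MS listing must contain exactly
    // the child names that S3 reports for the same directory. S3 is the
    // source of truth, so any difference in either direction is a violation.
    static boolean contentMismatch(Set<String> s3Children,
                                   Set<String> msChildren,
                                   boolean msListingIsAuthoritative) {
        if (!msListingIsAuthoritative) {
            return false; // non-authoritative listings may legitimately be partial
        }
        return !s3Children.equals(msChildren);
    }

    public static void main(String[] args) {
        Set<String> s3 = Set.of("a.txt", "b.txt");
        Set<String> ms = Set.of("a.txt");
        System.out.println(contentMismatch(s3, ms, true));  // MS is missing b.txt
        System.out.println(contentMismatch(s3, ms, false)); // partial is OK here
    }
}
```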

@bgaborg bgaborg changed the title HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log) (WIP) HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log) Aug 13, 2019
@bgaborg
Author

bgaborg commented Aug 13, 2019

  • Fixed most of the checkstyle issues; I left the long lines as they are because I think they're more readable this way.

  • Fixed findbugs issues.

  • Added more comments and fixed existing ones.

  • Created several subtasks in jira to add more features to this improvement.

  • Please review!

@steveloughran
Contributor

Overall

There are checks, but as it doesn't recurse for me it's hard to validate them.

The UX can be improved. I propose:

  • for successful entries, print their details as they are processed, such as length and etag.
  • on failure to initialize the fs, include the error.
  • print the total duration of the check and the number of entries scanned.

operations

failed on root entry.

bin/hadoop s3guard fsck -check s3a://guarded-table/
2019-08-22 14:17:37,973 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
== Path: s3a://guarded-table/
2019-08-22 14:17:38,160 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(220)) - Entry is in the root, so there's no parent
== Path: s3a://guarded-table/example
2019-08-22 14:17:38,189 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) - 
On path: s3a://guarded-table/
No etag.

with a path, I got the same message twice

bin/hadoop s3guard fsck -check s3a://guarded-table/example
2019-08-22 14:19:26,674 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
== Path: s3a://guarded-table/example
== Path: s3a://guarded-table/example

missing file.

 bin/hadoop s3guard fsck -check s3a://guarded-table/example/missing
2019-08-22 14:21:23,252 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
java.io.FileNotFoundException: No such file or directory: s3a://guarded-table/example/missing
2019-08-22 14:21:23,404 [main] INFO  util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 44: java.io.FileNotFoundException: No such file or directory: s3a://guarded-table/example/missing

This is good. Add a test for it.

s3a://bucket/.. is bad: add a test and then fix.

bin/hadoop s3guard fsck -check s3a://guarded-table/..
2019-08-22 14:23:14,640 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://guarded-table/..: com.amazonaws.services.s3.model.AmazonS3Exception: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null:400 Invalid URI: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:237)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:164)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2732)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2694)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2587)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardFsck.compareS3RootToMs(S3GuardFsck.java:94)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$Fsck.run(S3GuardTool.java:1560)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:402)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1759)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.main(S3GuardTool.java:1768)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4920)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4866)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1320)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$5(S3AFileSystem.java:1623)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:406)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:369)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1617)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1593)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2715)
	... 8 more
2019-08-22 14:23:14,677 [main] INFO  util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status -1: org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://guarded-table/..: com.amazonaws.services.s3.model.AmazonS3Exception: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null:400 Invalid URI: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null)

a check of s3a://guarded-table/example/.. shows qualification is taking place; it is the root dir where things break.

Unknown bucket. I'd like the failure message here, and no usage string, as it's unrelated:

 bin/hadoop s3guard fsck -check s3a://hwdev-ireland-new/
Failed to initialize S3AFileSystem from path: s3a://hwdev-ireland-new/
fsck [OPTIONS] [s3a://BUCKET]
	Compares S3 with MetadataStore, and returns a failure status if any rules or invariants are violated. Only works with DynamoDbMetadataStore.

Common options:
  check Check the metadata store for errors, but do not fix any issues.

  • same for bin/hadoop s3guard fsck -check file://

For a missing path in a valid fs, a failure.

bin/hadoop s3guard fsck -check s3a:///hwdev-steve-ireland-new/etc/something
Failed to initialize S3AFileSystem from path: s3a:///hwdev-steve-ireland-new/etc/something
fsck [OPTIONS] [s3a://BUCKET]
	Compares S3 with MetadataStore, and returns a failure status if any rules or invariants are violated. Only works with DynamoDbMetadataStore.

Common options:
  check Check the metadata store for errors, but do not fix any issues.

2019-08-22 14:42:14,007 [main] INFO  util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status -1: 

I'd expect a check to flag the file is missing in s3guard and s3, and if there's a tombstone in s3 to catch that and consider it a check failure.

now, prepared a dir with bin/hadoop fs -copyFromLocal -t 8 etc s3a://hwdev-steve-ireland-new/
to create data

bin/hadoop fs -ls -R s3a://hwdev-steve-ireland-new/
drwxrwxrwx   - stevel stevel          0 2019-08-22 14:34 s3a://hwdev-steve-ireland-new/etc
drwxrwxrwx   - stevel stevel          0 2019-08-22 14:34 s3a://hwdev-steve-ireland-new/etc/hadoop
-rw-rw-rw-   1 stevel stevel       1351 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
-rw-rw-rw-   1 stevel stevel       3999 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-env.cmd
-rw-rw-rw-   1 stevel stevel        118 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/tokens-exclude-aws-secrets.xml
-rw-rw-rw-   1 stevel stevel        620 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-site.xml
-rw-rw-rw-   1 stevel stevel       3823 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/log4j.properties~
-rw-rw-rw-   1 stevel stevel       2316 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/ssl-client.xml.example
-rw-rw-rw-   1 stevel stevel       6113 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/yarn-env.sh
-rw-rw-rw-   1 stevel stevel      11765 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-policy.xml
-rw-rw-rw-   1 stevel stevel       3321 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-metrics2.properties
-rw-rw-rw-   1 stevel stevel       3414 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-user-functions.sh.example
-rw-rw-rw-   1 stevel stevel         10 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/workers
drwxrwxrwx   - stevel stevel          0 2019-08-22 14:34 s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d
-rw-rw-rw-   1 stevel stevel       3880 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d/example.sh
-rw-rw-rw-   1 stevel stevel        951 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/mapred-env.cmd
-rw-rw-rw-   1 stevel stevel        682 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-site.xml
-rw-rw-rw-   1 stevel stevel        683 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hdfs-rbf-site.xml
-rw-rw-rw-   1 stevel stevel        775 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hdfs-site.xml
-rw-rw-rw-   1 stevel stevel       2393 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/container-executor.cfg
-rw-rw-rw-   1 stevel stevel       1860 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-log4j.properties
-rw-rw-rw-   1 stevel stevel       1867 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/core-site.xml
-rw-rw-rw-   1 stevel stevel       1335 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/configuration.xsl
-rw-rw-rw-   1 stevel stevel       2697 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/ssl-server.xml.example
-rw-rw-rw-   1 stevel stevel        758 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/mapred-site.xml
-rw-rw-rw-   1 stevel stevel       1484 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-env.sh
-rw-rw-rw-   1 stevel stevel       8260 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
-rw-rw-rw-   1 stevel stevel       6858 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/log4j.properties
-rw-rw-rw-   1 stevel stevel       2681 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/user_ec_policies.xml.template
-rw-rw-rw-   1 stevel stevel       2250 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/yarn-env.cmd
-rw-rw-rw-   1 stevel stevel       2591 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/yarnservice-log4j.properties
-rw-rw-rw-   1 stevel stevel       1657 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-log4j.properties
-rw-rw-rw-   1 stevel stevel       1764 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/mapred-env.sh
-rw-rw-rw-   1 stevel stevel       3518 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-acls.xml
-rw-rw-rw-   1 stevel stevel        690 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/yarn-site.xml
-rw-rw-rw-   1 stevel stevel       4113 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/mapred-queues.xml.template
-rw-rw-rw-   1 stevel stevel      16948 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-env.sh

On both root and etc/ it failed with a message about etags. These are directories; etags should not be needed.

bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/
2019-08-22 14:34:25,915 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/
2019-08-22 14:34:26,615 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(220)) - Entry is in the root, so there's no parent
2019-08-22 14:34:26,623 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) - 
On path: s3a://hwdev-steve-ireland-new/
No etag.

~/P/R/fsck bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc
2019-08-22 14:34:39,682 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/etc
2019-08-22 14:34:40,140 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) - 
On path: s3a://hwdev-steve-ireland-new/etc
No etag.

A valid file is good. I think we can/should print size, timestamp, etag and version for more info:

 bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 14:45:25,752 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh

now purge the ddb table and repeat

Prune didn't work. This looks like a prune problem, as it does the same for tombstones and DDB shows a lot of them. And I'd have expected to see some debug-level logging.

bin/hadoop s3guard prune -seconds 0 s3a://hwdev-steve-ireland-new/
2019-08-22 14:47:56,314 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
2019-08-22 14:47:56,341 [main] INFO  s3guard.DynamoDBMetadataStore (DurationInfo.java:<init>(72)) - Starting: Pruning DynamoDB Store
2019-08-22 14:47:56,395 [main] INFO  s3guard.DynamoDBMetadataStore (DurationInfo.java:close(87)) - Pruning DynamoDB Store: duration 0:00.054s
2019-08-22 14:47:56,395 [main] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1576)) - Finished pruning 0 items in batches of 25

Manually delete from the AWS console and fsck the file

 bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:04:11,635 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:04:11,951 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
No PathMetadata for this path in the MS.

2019-08-22 15:04:11,951 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
No PathMetadata for this path in the MS.
  1. yes, it found the problem
  2. but it reported/found it twice.
  3. and didn't cover what was in S3 (e.g. S3 contains a file of size...)

Recover into DDB then retry; now it fails on a version ID mismatch.

bin/hadoop fs -ls -R s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:08:28,868 [main] DEBUG s3guard.Operations (DynamoDBMetadataStore.java:logPut(2403)) - #(Put-0001) TOMBSTONE s3a:///hwdev-steve-ireland-new/etc
2019-08-22 15:08:28,870 [main] DEBUG s3guard.Operations (DynamoDBMetadataStore.java:logPut(2403)) - #(Put-0001) TOMBSTONE s3a:///hwdev-steve-ireland-new/etc/hadoop
2019-08-22 15:08:28,870 [main] DEBUG s3guard.Operations (DynamoDBMetadataStore.java:logPut(2403)) - #(Put-0001) TOMBSTONE s3a:///hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
-rw-rw-rw-   1 stevel stevel       1351 2019-08-22 14:39 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
~/P/R/fsck bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:08:49,920 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:08:50,222 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
getVersionId mismatch - s3: Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde, ms: Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde

2019-08-22 15:08:50,223 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
getVersionId mismatch - s3: Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde, ms: Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde

The two IDs match; I don't know what the problem is. And if it were a real failure, I'd like the rest of the details about the entry.
Looking at DDB, this is the first table entry with a versionId. Maybe the check is wrong.
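For what it's worth, one classic way Java reports a "mismatch" between identical strings is reference comparison. A sketch of that bug shape (pure speculation about the cause, not taken from the patch):

```java
import java.util.Objects;

public class VersionIdCompare {
    // Buggy shape: != compares references, so two equal but distinct
    // String instances (one deserialized from DDB, one from the S3 SDK)
    // are reported as a mismatch even though their values are identical.
    static boolean mismatchBuggy(String s3VersionId, String msVersionId) {
        return s3VersionId != msVersionId;
    }

    // Fixed shape: null-safe value comparison.
    static boolean mismatchFixed(String s3VersionId, String msVersionId) {
        return !Objects.equals(s3VersionId, msVersionId);
    }

    public static void main(String[] args) {
        // new String(...) forces distinct instances, as deserialization would.
        String s3 = new String("Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde");
        String ms = new String("Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde");
        System.out.println(mismatchBuggy(s3, ms)); // false positive
        System.out.println(mismatchFixed(s3, ms)); // no violation
    }
}
```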

@apache apache deleted a comment from hadoop-yetus Sep 4, 2019
@apache apache deleted a comment from hadoop-yetus Sep 4, 2019
@bgaborg bgaborg removed the work in progress PRs still Work in Progress; reviews not expected but still welcome label Sep 5, 2019
@steveloughran
Contributor

steveloughran commented Sep 6, 2019

Working on the CLI, but it's over-reporting errors, especially on versioning.

Scan of a dir did work, but it overreacts to

  • no etag on a directory entry
  • version Id mismatch

Reports no etag on directory entries where we don't expect one:

2019-09-06 13:46:33,571 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop
No etag.

On a scan of a tree it reports version ID mismatches where s3 == null:


2019-09-06 13:46:33,572 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
getModificationTime mismatch - s3: 1567773093000, ms: 1567773092204
getVersionId mismatch - s3: null, ms: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t

The ddb table has the version ID, but I'm assuming that the scan doesn't get them from S3 because we'd need to use HEAD over LIST.
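If the LIST-based scan genuinely can't supply version IDs, the check could treat a null S3-side ID as inconclusive rather than a violation. A sketch of that rule (hypothetical names, not the actual check):

```java
import java.util.Objects;

public class VersionIdCheck {
    // Only flag a mismatch when S3 actually returned a version ID.
    // A LIST response carries no version ID, so a null S3-side value
    // means "unknown", not "different"; verifying it would need a HEAD
    // per object.
    static boolean versionIdViolation(String s3VersionId, String msVersionId) {
        if (s3VersionId == null) {
            return false; // inconclusive: scan used LIST, not HEAD
        }
        return !Objects.equals(s3VersionId, msVersionId);
    }

    public static void main(String[] args) {
        System.out.println(versionIdViolation(null, "AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t"));
        System.out.println(versionIdViolation("v1", "v2"));
    }
}
```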

When I give the full path s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml it says there's a mismatch but now prints the same value on both sides. This is not a mismatch and should not appear.

~/P/R/fsck bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
2019-09-06 13:59:52,857 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
2019-09-06 13:59:53,057 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml - Length S3: 8260, MS: 8260 - Etag S3: 2887e7740b821abd405e6a5c70d2081e, MS: 2887e7740b821abd405e6a5c70d2081e
2019-09-06 13:59:53,115 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml - Length S3: 8260, MS: 8260 - Etag S3: 2887e7740b821abd405e6a5c70d2081e, MS: 2887e7740b821abd405e6a5c70d2081e
2019-09-06 13:59:53,142 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
getModificationTime mismatch - s3: 1567773093000, ms: 1567773092204
getVersionId mismatch - s3: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t, ms: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t

2019-09-06 13:59:53,142 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
getModificationTime mismatch - s3: 1567773093000, ms: 1567773092204
getVersionId mismatch - s3: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t, ms: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t

2019-09-06 13:59:53,142 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(144)) - Total scan time: 0s
2019-09-06 13:59:53,142 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(145)) - Scanned entries: 2

Note also that the file gets scanned twice. This hints at the scanning playing up when the supplied path is a file, not a dir.

Now I open the file with hadoop fs -cat s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml; there's a PUT to the DDB table as the modtime is updated; the next scan doesn't report modtime issues, but it does still mistakenly report the version IDs are different.

bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
2019-09-06 14:33:59,582 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
2019-09-06 14:33:59,773 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml - Length S3: 8260, MS: 8260 - Etag S3: 2887e7740b821abd405e6a5c70d2081e, MS: 2887e7740b821abd405e6a5c70d2081e
2019-09-06 14:33:59,828 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml - Length S3: 8260, MS: 8260 - Etag S3: 2887e7740b821abd405e6a5c70d2081e, MS: 2887e7740b821abd405e6a5c70d2081e
2019-09-06 14:33:59,856 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
getVersionId mismatch - s3: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t, ms: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t

2019-09-06 14:33:59,856 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
getVersionId mismatch - s3: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t, ms: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t

2019-09-06 14:33:59,856 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(144)) - Total scan time: 0s
2019-09-06 14:33:59,856 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(145)) - Scanned entries: 2

Side issue: what to do when the supplied path is a file which has a tombstone in DDB and no file in S3? Currently it's an FNFE:

bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-site.xml._COPYING_
2019-09-06 13:33:14,788 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
java.io.FileNotFoundException: No such file or directory: s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-site.xml._COPYING_
2019-09-06 13:33:14,890 [main] INFO  util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 44: java.io.FileNotFoundException: No such file or directory: s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-site.xml._COPYING_

Are we confident that this command will do a check if there is a file in S3 but tombstoned in MS?

@steveloughran
Contributor

OK, latest patch is better.

  • still warns of no etag on a dir
  • when you pass a path to a file, it is scanned twice
  • I think we need to be able to disable the modtime checks, because you tend to get mismatches whenever you create an entry after writing a file (the system clock is used); they get updated on the first read. Or: do we allow a range of accuracy?
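The "range of accuracy" option could be a simple tolerance window, something like this (sketch with hypothetical names and an assumed threshold; S3 rounds mtimes to whole seconds while the MS records the client clock, so sub-second skew is expected):

```java
public class ModTimeCheck {
    // Hypothetical tolerance-based check: only flag a modification-time
    // difference that exceeds the allowed window, so routine S3-vs-client
    // clock skew doesn't register as a violation.
    static boolean modTimeViolation(long s3ModTime, long msModTime,
                                    long toleranceMs) {
        return Math.abs(s3ModTime - msModTime) > toleranceMs;
    }

    public static void main(String[] args) {
        // Values from the log above: 796 ms apart.
        System.out.println(modTimeViolation(1567773093000L, 1567773092204L, 2000L));
        System.out.println(modTimeViolation(1567773093000L, 1567773092204L, 500L));
    }
}
```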

@steveloughran
Contributor

e.g

bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d
2019-09-06 14:51:46,377 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
2019-09-06 14:51:46,705 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d - Length S3: 0, MS: 0 - Etag S3: null, MS: null
2019-09-06 14:51:46,764 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d/example.sh - Length S3: 3880, MS: 3880 - Etag S3: c7dbe1b877a287175df9dfc32c226765, MS: c7dbe1b877a287175df9dfc32c226765
2019-09-06 14:51:46,792 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d
No etag.

2019-09-06 14:51:46,792 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d/example.sh
getModificationTime mismatch - s3: 1567773091000, ms: 1567773090841

2019-09-06 14:51:46,792 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(144)) - Total scan time: 0s
2019-09-06 14:51:46,792 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(145)) - Scanned entries: 2
~/P/R/fsck echo $status
0

The good news: the return code is 0; it passed the scan. So these are just info rather than warnings.

@steveloughran
Contributor

Did another test run on an unversioned bucket where the DDB table was built up with an ls -R, so filled straight from S3. All checks happy (e.g. modtime), but it still warns of null etags on all directories, including the root one.

2019-09-06 14:59:23,893 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-london/
No etag.

@steveloughran
Contributor

Got a test run failure in a new test:

testCLIFsckWithParam(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  Time elapsed: 8.028 s  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: s3a://hwdev-steve-ireland-new/fork-0004/test
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2788)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2677)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2571)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:2360)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$10(S3AFileSystem.java:2339)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2339)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardFsck.compareS3ToMs(S3GuardFsck.java:115)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$Fsck.run(S3GuardTool.java:1560)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:402)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1763)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.run(AbstractS3GuardToolTestBase.java:137)
	at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB.testCLIFsckWithParam(ITestS3GuardToolDynamoDB.java:301)

@steveloughran
Contributor

and

[ERROR] Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.342 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
[ERROR] testIVersionIdMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time elapsed: 1.591 s  <<< FAILURE!
java.lang.AssertionError: 
[Violations in the childPair] 
Expecting:
 <[ETAG_MISMATCH, LENGTH_MISMATCH, MOD_TIME_MISMATCH]>
to contain:
 <[VERSIONID_MISMATCH]>
but could not find:
 <[VERSIONID_MISMATCH]>

	at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIVersionIdMismatch(ITestS3GuardFsck.java:589)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

@steveloughran
Contributor

The testCLIFsckWithParam test works standalone. Looks to me like a race condition: the fsck is being performed on an active bucket, and files have been deleted between being listed and queued for scanning and the actual scan.

  1. Test should only scan the test directory, or we handle FNFEs as something to ignore
  2. Could the fsck code itself change here? On a stable bucket the FNFE could be a sign of a mismatch between DDB and the store; you don't want those ignored.

But: the failure could be delayed and reported as a missing file, rather than triggering a fast failure.
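
The "delayed failure" idea above could be sketched as follows: rather than letting a `FileNotFoundException` abort the whole scan, record the missing path as a violation and keep going, so the caller can decide at the end whether missing files are real inconsistencies or just races. This is a hypothetical sketch with made-up names (`StatusLookup`, `scan`, the `MISSING_IN_S3` tag), not the actual S3GuardFsck code:

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScanSketch {

    // Minimal stand-in for a filesystem status lookup that may throw FNFE.
    interface StatusLookup {
        String status(String path) throws FileNotFoundException;
    }

    // Tolerate FNFE per entry: record the missing path as a violation
    // instead of failing fast, then continue scanning the rest.
    static List<String> scan(List<String> paths, StatusLookup fs) {
        List<String> violations = new ArrayList<>();
        for (String p : paths) {
            try {
                fs.status(p);  // the real code would compare S3 vs metastore here
            } catch (FileNotFoundException e) {
                violations.add("MISSING_IN_S3: " + p);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        StatusLookup fs = p -> {
            if (p.endsWith("gone")) {
                throw new FileNotFoundException(p);
            }
            return "ok";
        };
        // prints [MISSING_IN_S3: /b/gone]
        System.out.println(scan(Arrays.asList("/a", "/b/gone", "/c"), fs));
    }
}
```

On an active bucket the accumulated list would mostly be races; on a stable one it would be genuine DDB-to-store mismatches, which is why the decision belongs at the end of the scan rather than per file.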

@steveloughran
Contributor

steveloughran commented Sep 6, 2019

OK, latest review is good with etags; modtime is something we can worry about as an extra iteration. Tested on a store which is set up for auth listings and is clearly considered inconsistent. It warns me of this, maybe in too much detail.

...
2019-09-06 15:51:25,549 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - 
On path: s3a://hwdev-steve-london/fork-0006/test/ITestS3AContractDistCp/testTrackDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir
The content of an authoritative directory listing does not match the content of the S3 listing. S3: [[S3AFileStatus{path=s3a://hwdev-steve-london/fork-0006/test/ITestS3AContractDistCp/testTrackDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/subDir1; isDirectory=true; modification_time=0; access_time=0; owner=stevel; group=stevel; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=null versionId=null, S3AFileStatus{path=s3a://hwdev-steve-london/fork-0006/test/ITestS3AContractDistCp/testTrackDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/subDir2; isDirectory=true; modification_time=0; access_time=0; owner=stevel; group=stevel; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=null versionId=null]], MS: []

2019-09-06 15:51:25,549 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(144)) - Total scan time: 3s
2019-09-06 15:51:25,549 [main] INFO  s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(145)) - Scanned entries: 51
~/P/R/fsck echo $status
0

It's noisy, but that could be tuned a bit by cutting back on the number of S3AFileStatus fields printed. What is clear is: it found real problems where the DDB was incomplete.

But: the exit code was still 0. I think we should return a non-zero code when there is a mismatch of this scale.
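
A sketch of that exit-code mapping: fold the violation count from the scan into a shell exit status so scripts and tests can detect an inconsistent store. The constant names and the code values here are hypothetical, not the real S3GuardTool exit codes:

```java
public class FsckExitSketch {
    // Hypothetical exit codes; the real tool's constants may differ.
    static final int EXIT_OK = 0;
    static final int EXIT_VIOLATIONS_FOUND = 1;

    // Map the number of violations found during the scan to the
    // process exit status.
    static int exitCode(int violationCount) {
        return violationCount == 0 ? EXIT_OK : EXIT_VIOLATIONS_FOUND;
    }

    public static void main(String[] args) {
        System.out.println(exitCode(0));   // prints 0
        System.out.println(exitCode(51));  // prints 1
    }
}
```

With this in place, `echo $status` after a run like the one above would report a failure instead of 0.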

BTW, a `hadoop fs -ls -R s3a://hwdev-steve-london/` call did actually trigger a list and import of the data; that is, it went to the FS anyway. I'm not sure I understand auth mode properly. That detail aside, after the listing the store was consistent again and all was good.

@steveloughran
Contributor

Overall then, last iteration has a working CLI

  • One of the tests is brittle in parallel runs
  • fsck must return an error code on a failure for scripts and tests
  • modtime handling needs to be tuned (followup?)
  • no docs that I can see

@apache apache deleted a comment from hadoop-yetus Sep 6, 2019
@apache apache deleted a comment from hadoop-yetus Sep 10, 2019
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 53 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+1 mvninstall 1040 trunk passed
+1 compile 35 trunk passed
+1 checkstyle 26 trunk passed
+1 mvnsite 40 trunk passed
+1 shadedclient 753 branch has no errors when building and testing our client artifacts.
+1 javadoc 28 trunk passed
0 spotbugs 57 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 56 trunk passed
_ Patch Compile Tests _
+1 mvninstall 33 the patch passed
+1 compile 28 the patch passed
+1 javac 28 the patch passed
-0 checkstyle 20 hadoop-tools/hadoop-aws: The patch generated 12 new + 25 unchanged - 0 fixed = 37 total (was 25)
+1 mvnsite 32 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 762 patch has no errors when building and testing our client artifacts.
+1 javadoc 26 the patch passed
+1 findbugs 61 the patch passed
_ Other Tests _
+1 unit 84 hadoop-aws in the patch passed.
+1 asflicense 33 The patch does not generate ASF License warnings.
3214
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/22/artifact/out/Dockerfile
GITHUB PR #1208
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux bd120e7e39df 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / dc9abd2
Default Java 1.8.0_222
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/22/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/22/testReport/
Max. process+thread count 438 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/22/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

Gabor Bota added 15 commits September 11, 2019 13:47
…sionID mismatch, added severity (not used anywhere yet);

Change-Id: I4e327bb172663b5da247789d19053e6d54e88a1e
…sing the class defined in the enum instead.

Change-Id: I2debc18e70af54ed08d0382bf42e0e11e3100603
Change-Id: Ifcaaf2ca2027e81f3be0dc1337b34aa315b8d5c1
…oolDynamoDB#testCLIFsckWithParam

Change-Id: I6bbb331b6c0a41c61043e482b95504fda8a50596
Change-Id: I611e7421ba061d1048bd6bb182f5238f810a400a
…ed because of readability.

Change-Id: I0660c181ec07e2c0addd906fdac41b92169283f5
Change-Id: I751b1520070836894b0667eff7861d0eb760a4a3
… and working. May need some more fine-tuning.

Change-Id: I11df42693da9911738dbc031e74f418d487b2460
Change-Id: I8330f9c562ab3e3335a2aea7a85446643ce4fa8c
Change-Id: Iaff8875a7ca238639c105537c3268bfb212189e2
Change-Id: Ife89007fdc028aa49abe0ed6441f95e08078688f
Change-Id: I2ce69d66e348c4c0aded9bc8cf273e0c3a44f580
Change-Id: Ib933b0cfee6fd5dd9da0a062b5f81c26e94d383c
Change-Id: I98da340813d826acdf21e13698942c9cde09f192
@steveloughran
Contributor

last test run:

[ERROR] Failures: 
[ERROR]   ITestS3GuardFsck.testIDetectParentTombstoned:194->assertComparePairsSize:452 [Number of compare pairs] expected:<[1]> but was:<[2]>
[ERROR] Errors: 
[ERROR]   ITestS3GuardFsck.testIAuthoritativeDirectoryContentMismatch:292->checkForViolationInPairs:474 » NoSuchElement
[INFO] 
[INFO] Running org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat
[ERROR] Tests run: 12, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 30.746 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
[ERROR] testIDetectParentTombstoned(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time elapsed: 8.026 s  <<< FAILURE!
org.junit.ComparisonFailure: [Number of compare pairs] expected:<[1]> but was:<[2]>
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.assertComparePairsSize(ITestS3GuardFsck.java:452)
	at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectParentTombstoned(ITestS3GuardFsck.java:194)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

[ERROR] testIAuthoritativeDirectoryContentMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck)  Time elapsed: 4.626 s  <<< ERROR!
java.util.NoSuchElementException: No value present
	at java.util.Optional.get(Optional.java:135)
	at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.checkForViolationInPairs(ITestS3GuardFsck.java:474)
	at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIAuthoritativeDirectoryContentMismatch(ITestS3GuardFsck.java:292)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)


@steveloughran
Contributor

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 48 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+1 mvninstall 1372 trunk passed
+1 compile 37 trunk passed
+1 checkstyle 27 trunk passed
+1 mvnsite 40 trunk passed
+1 shadedclient 874 branch has no errors when building and testing our client artifacts.
+1 javadoc 27 trunk passed
0 spotbugs 70 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 67 trunk passed
_ Patch Compile Tests _
+1 mvninstall 38 the patch passed
+1 compile 31 the patch passed
+1 javac 31 the patch passed
-0 checkstyle 21 hadoop-tools/hadoop-aws: The patch generated 10 new + 25 unchanged - 0 fixed = 35 total (was 25)
+1 mvnsite 35 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 928 patch has no errors when building and testing our client artifacts.
+1 javadoc 27 the patch passed
+1 findbugs 72 the patch passed
_ Other Tests _
+1 unit 88 hadoop-aws in the patch passed.
+1 asflicense 33 The patch does not generate ASF License warnings.
3879
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/23/artifact/out/Dockerfile
GITHUB PR #1208
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 001b4dfb6411 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / c255333
Default Java 1.8.0_222
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/23/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/23/testReport/
Max. process+thread count 412 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/23/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

Change-Id: Ic31cbd5925b92df6c421012a3b91497d16aa6bef
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 42 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+1 mvninstall 1061 trunk passed
+1 compile 32 trunk passed
+1 checkstyle 24 trunk passed
+1 mvnsite 36 trunk passed
+1 shadedclient 719 branch has no errors when building and testing our client artifacts.
+1 javadoc 28 trunk passed
0 spotbugs 60 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 58 trunk passed
_ Patch Compile Tests _
+1 mvninstall 34 the patch passed
+1 compile 27 the patch passed
+1 javac 27 the patch passed
-0 checkstyle 20 hadoop-tools/hadoop-aws: The patch generated 10 new + 25 unchanged - 0 fixed = 35 total (was 25)
+1 mvnsite 33 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 746 patch has no errors when building and testing our client artifacts.
+1 javadoc 26 the patch passed
+1 findbugs 60 the patch passed
_ Other Tests _
+1 unit 81 hadoop-aws in the patch passed.
+1 asflicense 31 The patch does not generate ASF License warnings.
3165
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/24/artifact/out/Dockerfile
GITHUB PR #1208
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux c303bc1a8e0e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 5a381f7
Default Java 1.8.0_222
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/24/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/24/testReport/
Max. process+thread count 413 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1208/24/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

ok, full test with ddb non-auth happy; repeating with auth

Tomorrow I'll build the CLI and run the various manual operations which were failing

@steveloughran
Copy link
Contributor

+1

As well as all the automated tests, did some manual command line operations.

  • empty args
  • command without -check
  • -check without path
  • against store marked as auth but with incomplete MS
  • after doing an import, same store
  • empty store
  • unguarded store

All outcomes were as expected

I'm happy with this

Followup

One of the changes in the HADOOP-16430 PR is that we now have an S3A FS method boolean allowAuthoritative(final Path path) that takes a path and returns true iff it's authoritative, either because the MS is auth or because the given path is marked as one of the authoritative dirs. I think the check that an authoritative directory is consistent between the metastore and S3 should use this when it wants to highlight an inconsistent authoritative path.

This can be a follow-on patch, because as usual it will need more tests, in the code, and someone to try out the command line.
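
The gating logic described above can be sketched roughly like this. The real `allowAuthoritative(Path)` lives on S3AFileSystem; everything here (`String` paths instead of `Path`, the flag and list parameters) is a simplified hypothetical stand-in:

```java
import java.util.Arrays;
import java.util.List;

public class AuthPathSketch {

    // Simplified stand-in for S3AFileSystem.allowAuthoritative(Path):
    // a path is treated as authoritative if the whole metastore is
    // authoritative, or if it sits under one of the configured
    // authoritative directories.
    static boolean allowAuthoritative(String path,
                                      boolean metastoreAuth,
                                      List<String> authDirs) {
        if (metastoreAuth) {
            return true;
        }
        for (String dir : authDirs) {
            if (path.equals(dir) || path.startsWith(dir + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> authDirs = Arrays.asList("s3a://bucket/auth");
        // Only run the authoritative-directory-content check where this is true:
        System.out.println(
            allowAuthoritative("s3a://bucket/auth/sub", false, authDirs));  // prints true
        System.out.println(
            allowAuthoritative("s3a://bucket/other", false, authDirs));     // prints false
    }
}
```

The fsck would then only raise AUTHORITATIVE_DIRECTORY_CONTENT_MISMATCH for paths where this predicate holds, instead of for every directory with an authoritative listing flag.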

@bgaborg
Author

bgaborg commented Sep 12, 2019

Created followup: https://issues.apache.org/jira/browse/HADOOP-16563
Committing.

@bgaborg bgaborg merged commit 4e273a3 into apache:trunk Sep 12, 2019
smengcl pushed a commit to smengcl/hadoop that referenced this pull request Oct 8, 2019
… metadatastore (log) (apache#1208). Contributed by Gabor Bota.

Change-Id: I6bbb331b6c0a41c61043e482b95504fda8a50596
(cherry picked from commit 4e273a3)
amahussein pushed a commit to amahussein/hadoop that referenced this pull request Oct 29, 2019
… metadatastore (log) (apache#1208). Contributed by Gabor Bota.

Change-Id: I6bbb331b6c0a41c61043e482b95504fda8a50596
RogPodge pushed a commit to RogPodge/hadoop that referenced this pull request Mar 25, 2020
… metadatastore (log) (apache#1208). Contributed by Gabor Bota.

Change-Id: I6bbb331b6c0a41c61043e482b95504fda8a50596