
HDDS-4448. Duplicate refreshPipeline in listStatus #1569

Merged
merged 3 commits into from
Nov 19, 2020

Conversation

adoroszlai
Contributor

@adoroszlai adoroszlai commented Nov 10, 2020

What changes were proposed in this pull request?

Currently KeyManagerImpl#listStatus issues duplicate refreshPipeline calls for each file. HDDS-3824 moved refreshPipeline outside the bucket lock, but HDDS-3658 added it back while keeping the one outside the lock, probably as a result of merge conflict resolution.

This PR removes the refreshPipeline call made while holding the bucket lock. It also converts the remaining one to a batched call (a single call with the whole list, instead of N calls with a single OmKeyInfo each).

https://issues.apache.org/jira/browse/HDDS-4448
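For illustration, here is a minimal, self-contained sketch of the batching idea (class and method names are simplified stand-ins, not the actual Ozone code): container IDs from all listed keys are collected and sent to SCM in one RPC, instead of one refresh RPC per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch; not the real KeyManagerImpl/ScmClient classes.
public class BatchedRefreshSketch {

  static class OmKeyInfo {
    final String name;
    final List<Long> containerIds;
    OmKeyInfo(String name, Long... ids) {
      this.name = name;
      this.containerIds = Arrays.asList(ids);
    }
  }

  // Stand-in for the SCM client; counts how many RPCs are issued.
  static int scmCalls = 0;

  static void getContainerWithPipelineBatch(Set<Long> containerIds) {
    scmCalls++;  // one RPC, regardless of how many container IDs it carries
  }

  // Batched refresh: gather container IDs from all keys, then one SCM call.
  static void refreshPipeline(List<OmKeyInfo> keys) {
    Set<Long> ids = new HashSet<>();
    for (OmKeyInfo key : keys) {
      ids.addAll(key.containerIds);
    }
    if (!ids.isEmpty()) {
      getContainerWithPipelineBatch(ids);
    }
  }

  public static void main(String[] args) {
    List<OmKeyInfo> statuses = new ArrayList<>();
    statuses.add(new OmKeyInfo("file1", 1L, 2L));
    statuses.add(new OmKeyInfo("file2", 2L, 3L));
    statuses.add(new OmKeyInfo("file3", 4L));
    refreshPipeline(statuses);  // single batched call for the whole listing
    System.out.println("SCM calls: " + scmCalls);
  }
}
```

The per-key variant would instead call SCM once per OmKeyInfo, so a listing of N files costs N round trips; the batch keeps it at one.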

How was this patch tested?

Added a unit test to verify that listStatus makes a single batched refreshPipeline call.

https://github.com/adoroszlai/hadoop-ozone/runs/1381172579

Contributor

@linyiqun linyiqun left a comment

Good catch, +1.
@adoroszlai, one thing I noticed: can we completely remove the field OmKeyArgs#refreshPipeline now that HDDS-3658 has been merged?

In some places this option no longer makes sense,
e.g. KeyManagerImpl#getOzoneFileStatus:

  private OzoneFileStatus getOzoneFileStatus(String volumeName,
                                             String bucketName,
                                             String keyName,
                                             boolean refreshPipeline,
                                             boolean sortDatanodes,
                                             String clientAddress)
      throws IOException {
    OmKeyInfo fileKeyInfo = null;
    metadataManager.getLock().acquireReadLock(BUCKET_LOCK, volumeName,
        bucketName);
    ...

      // if the key is a file then do refresh pipeline info in OM by asking SCM
      if (fileKeyInfo != null) {
        // refreshPipeline flag check has been removed as part of
        // https://issues.apache.org/jira/browse/HDDS-3658.
        // Please refer this jira for more details.
        refreshPipeline(fileKeyInfo);
        if (sortDatanodes) {
          sortDatanodeInPipeline(fileKeyInfo, clientAddress);
        }
        return new OzoneFileStatus(fileKeyInfo, scmBlockSize, false);
      }
    }

@adoroszlai adoroszlai added the om label Nov 13, 2020
Contributor

@bharatviswa504 bharatviswa504 left a comment

Overall LGTM, I have few comments/questions.

      if (args.getSortDatanodes()) {
        keyInfoList.add(fileStatus.getKeyInfo());
      }
      refreshPipeline(keyInfoList);
Contributor

Just a question: listStatus is limited by batch size, but we are making a single RPC call to SCM for all the containers we got from this list. Is this fine here? (My question is: if there is a large number of container IDs due to larger key sizes, will the RPC response size become an issue?)

Contributor Author

listKeys() is also limited by batch size and already does the same single refresh call:

    List<OmKeyInfo> keyList = metadataManager.listKeys(volumeName, bucketName,
        startKey, keyPrefix, maxKeys);
    refreshPipeline(keyList);

Contributor

@bharatviswa504 bharatviswa504 Nov 17, 2020

My question is: keyList/listStatus is limited by batch size, but if the keys are huge in size, they might have a long list of container IDs. Calling SCM with all those container IDs to fetch ContainerWithPipeline might then exceed the RPC response size.

listKeys() is also limited by batch size and already does the same single refresh call:

Then this might be a problem for listKeys also, but do you see this as a problem?

Contributor Author

@adoroszlai adoroszlai Nov 18, 2020

Then this might be a problem for listKeys also, but do you see this as a problem?

I don't see this as an immediate problem.

As far as I understand, the batch size for listKeys and listStatus is a convenience for the client, not a safety guarantee for the server. If RPC response size were a problem when performing the getContainerWithPipelineBatch call for multiple keys, the client could trigger the same problem simply by increasing the batch size (given enough keys in the bucket).

Contributor

@bharatviswa504 bharatviswa504 Nov 19, 2020

Yes, agreed. Not only that: thinking about it more, if the normal batch size caused an issue with container IDs, the OMResponse for listKeys would be affected the same way, since it includes both KeyInfo and Pipeline.

Contributor

@bharatviswa504 bharatviswa504 left a comment

+1 LGTM

@bharatviswa504 bharatviswa504 merged commit bbeaf65 into apache:master Nov 19, 2020
@adoroszlai adoroszlai deleted the HDDS-4448 branch November 19, 2020 08:28
@adoroszlai
Contributor Author

Thanks @linyiqun for the review, and @bharatviswa504 for review and merge.

errose28 added a commit to errose28/ozone that referenced this pull request Nov 24, 2020
* HDDS-3698-upgrade: (46 commits)
  HDDS-4468. Fix Goofys listBucket large than 1000 objects will stuck forever (apache#1595)
  HDDS-4417. Simplify Ozone client code with configuration object -- addendum (apache#1581)
  HDDS-4476. Improve the ZH translation of the HA.md in doc. (apache#1597)
  HDDS-4432. Update Ratis version to latest snapshot. (apache#1586)
  HDDS-4488. Open RocksDB read only when loading containers at Datanode startup (apache#1605)
  HDDS-4478. Large deletedKeyset slows down OM via listStatus. (apache#1598)
  HDDS-4452. findbugs.sh couldn't be executed after a full build (apache#1576)
  HDDS-4427. Avoid ContainerCache in ContainerReader at Datanode startup (apache#1549)
  HDDS-4448. Duplicate refreshPipeline in listStatus (apache#1569)
  HDDS-4450. Cannot run ozone if HADOOP_HOME points to Hadoop install (apache#1572)
  HDDS-4346.Ozone specific Trash Policy (apache#1535)
  HDDS-4426. SCM should create transactions using all blocks received from OM (apache#1561)
  HDDS-4399. Safe mode rule for piplelines should only consider open pipelines. (apache#1526)
  HDDS-4367. Configuration for deletion service intervals should be different for OM, SCM and datanodes (apache#1573)
  HDDS-4462. Add --frozen-lockfile to pnpm install to prevent ozone-recon-web/pnpm-lock.yaml from being updated automatically (apache#1589)
  HDDS-4082. Create ZH translation of HA.md in doc. (apache#1591)
  HDDS-4464. Upgrade httpclient version due to CVE-2020-13956. (apache#1590)
  HDDS-4467. Acceptance test fails due to new Hadoop 3 image (apache#1594)
  HDDS-4466. Update url in .asf.yaml to use TLP project (apache#1592)
  HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Nov 25, 2020
* HDDS-3698-upgrade: (47 commits, same list as above plus one)