
HDDS-4448. Duplicate refreshPipeline in listStatus #1569

Merged
merged 3 commits into from
Nov 19, 2020

Conversation

adoroszlai
Contributor

@adoroszlai adoroszlai commented Nov 10, 2020

What changes were proposed in this pull request?

Currently KeyManagerImpl#listStatus issues duplicate refreshPipeline calls for each file. HDDS-3824 moved refreshPipeline outside the bucket lock, but HDDS-3658 added it back while keeping the one outside the lock, probably as a result of merge conflict resolution.

This PR removes the refreshPipeline call made while holding the bucket lock. It also converts the remaining one to a batched call (a single call with the whole list, instead of N calls with a single OmKeyInfo each).

https://issues.apache.org/jira/browse/HDDS-4448
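For illustration, here is a minimal, self-contained sketch of the batching idea (class and method names are simplified stand-ins, not the actual Ozone code): container IDs from all listed keys are collected and sent to SCM in one RPC, instead of one refresh RPC per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch; not the real KeyManagerImpl/ScmClient classes.
public class BatchedRefreshSketch {

  static class OmKeyInfo {
    final String name;
    final List<Long> containerIds;
    OmKeyInfo(String name, Long... ids) {
      this.name = name;
      this.containerIds = Arrays.asList(ids);
    }
  }

  // Stand-in for the SCM client; counts how many RPCs are issued.
  static int scmCalls = 0;

  static void getContainerWithPipelineBatch(Set<Long> containerIds) {
    scmCalls++;  // one RPC, regardless of how many container IDs it carries
  }

  // Batched refresh: gather container IDs from all keys, then one SCM call.
  static void refreshPipeline(List<OmKeyInfo> keys) {
    Set<Long> ids = new HashSet<>();
    for (OmKeyInfo key : keys) {
      ids.addAll(key.containerIds);
    }
    if (!ids.isEmpty()) {
      getContainerWithPipelineBatch(ids);
    }
  }

  public static void main(String[] args) {
    List<OmKeyInfo> statuses = new ArrayList<>();
    statuses.add(new OmKeyInfo("file1", 1L, 2L));
    statuses.add(new OmKeyInfo("file2", 2L, 3L));
    statuses.add(new OmKeyInfo("file3", 4L));
    refreshPipeline(statuses);  // single batched call for the whole listing
    System.out.println("SCM calls: " + scmCalls);
  }
}
```

The per-key variant would instead call SCM once per OmKeyInfo, so a listing of N files costs N round trips; the batch keeps it at one.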

How was this patch tested?

Added a unit test to verify that listStatus makes a single batched refreshPipeline call.

https://github.com/adoroszlai/hadoop-ozone/runs/1381172579

Contributor

@linyiqun linyiqun left a comment

Good catch, +1.
@adoroszlai, one thing I noticed: can we completely remove the field OmKeyArgs#refreshPipeline now that HDDS-3658 has been merged?

In some places this option no longer makes sense,
e.g. KeyManagerImpl#getOzoneFileStatus:

  private OzoneFileStatus getOzoneFileStatus(String volumeName,
                                             String bucketName,
                                             String keyName,
                                             boolean refreshPipeline,
                                             boolean sortDatanodes,
                                             String clientAddress)
      throws IOException {
    OmKeyInfo fileKeyInfo = null;
    metadataManager.getLock().acquireReadLock(BUCKET_LOCK, volumeName,
        bucketName);
    ...

      // if the key is a file then do refresh pipeline info in OM by asking SCM
      if (fileKeyInfo != null) {
        // refreshPipeline flag check has been removed as part of
        // https://issues.apache.org/jira/browse/HDDS-3658.
        // Please refer this jira for more details.
        refreshPipeline(fileKeyInfo);
        if (sortDatanodes) {
          sortDatanodeInPipeline(fileKeyInfo, clientAddress);
        }
        return new OzoneFileStatus(fileKeyInfo, scmBlockSize, false);
      }
    }

@adoroszlai adoroszlai added the om label Nov 13, 2020
Contributor

@bharatviswa504 bharatviswa504 left a comment

Overall LGTM, I have few comments/questions.

      if (args.getSortDatanodes()) {
        keyInfoList.add(fileStatus.getKeyInfo());
      }
      refreshPipeline(keyInfoList);
Contributor

Just a question: listStatus is limited by batch size, but we are making a single RPC call to SCM for all the containers we got from this list. Is this fine here? (My question is: if there is a large number of container IDs due to larger key sizes, will the RPC response size become an issue?)

Contributor Author

listKeys() is also limited by batch size and already does the same single refresh call:

    List<OmKeyInfo> keyList = metadataManager.listKeys(volumeName, bucketName,
        startKey, keyPrefix, maxKeys);
    refreshPipeline(keyList);

Contributor

@bharatviswa504 bharatviswa504 Nov 17, 2020

My question is: keyList/listStatus is limited by batch size, but if the keys are huge in size, they might have a long list of container IDs. Calling SCM with all those container IDs to fetch ContainerWithPipeline might then exceed the RPC response size.

listKeys() is also limited by batch size and already does the same single refresh call:

Then this might be a problem for listKeys also, but do you see this as a problem?

Contributor Author

@adoroszlai adoroszlai Nov 18, 2020

Then this might be a problem for listKeys also, but do you see this as a problem?

I don't see this as an immediate problem.

As far as I understand, the batch size for listKeys and listStatus is a convenience for the client, not a safety guarantee for the server. If RPC response size were a problem when performing the getContainerWithPipelineBatch call for multiple keys, the client could trigger the same problem simply by increasing the batch size (given enough keys in the bucket).

Contributor

@bharatviswa504 bharatviswa504 Nov 19, 2020

Yes, agreed. Not only that: thinking about it more, if the normal batch size caused an issue with container IDs, the OMResponse for listKeys would be affected the same way, since it includes both KeyInfo and Pipeline.

Contributor

@bharatviswa504 bharatviswa504 left a comment

+1 LGTM

@bharatviswa504 bharatviswa504 merged commit bbeaf65 into apache:master Nov 19, 2020
@adoroszlai adoroszlai deleted the HDDS-4448 branch November 19, 2020 08:28
@adoroszlai
Contributor Author

Thanks @linyiqun for the review, and @bharatviswa504 for review and merge.

errose28 added a commit to errose28/ozone that referenced this pull request Nov 24, 2020
* HDDS-3698-upgrade: (46 commits)
  HDDS-4468. Fix Goofys listBucket large than 1000 objects will stuck forever (apache#1595)
  HDDS-4417. Simplify Ozone client code with configuration object -- addendum (apache#1581)
  HDDS-4476. Improve the ZH translation of the HA.md in doc. (apache#1597)
  HDDS-4432. Update Ratis version to latest snapshot. (apache#1586)
  HDDS-4488. Open RocksDB read only when loading containers at Datanode startup (apache#1605)
  HDDS-4478. Large deletedKeyset slows down OM via listStatus. (apache#1598)
  HDDS-4452. findbugs.sh couldn't be executed after a full build (apache#1576)
  HDDS-4427. Avoid ContainerCache in ContainerReader at Datanode startup (apache#1549)
  HDDS-4448. Duplicate refreshPipeline in listStatus (apache#1569)
  HDDS-4450. Cannot run ozone if HADOOP_HOME points to Hadoop install (apache#1572)
  HDDS-4346.Ozone specific Trash Policy (apache#1535)
  HDDS-4426. SCM should create transactions using all blocks received from OM (apache#1561)
  HDDS-4399. Safe mode rule for piplelines should only consider open pipelines. (apache#1526)
  HDDS-4367. Configuration for deletion service intervals should be different for OM, SCM and datanodes (apache#1573)
  HDDS-4462. Add --frozen-lockfile to pnpm install to prevent ozone-recon-web/pnpm-lock.yaml from being updated automatically (apache#1589)
  HDDS-4082. Create ZH translation of HA.md in doc. (apache#1591)
  HDDS-4464. Upgrade httpclient version due to CVE-2020-13956. (apache#1590)
  HDDS-4467. Acceptance test fails due to new Hadoop 3 image (apache#1594)
  HDDS-4466. Update url in .asf.yaml to use TLP project (apache#1592)
  HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Nov 25, 2020
* HDDS-3698-upgrade: (47 commits, same list as above plus one)