HDDS-4448. Duplicate refreshPipeline in listStatus #1569
Conversation
Good catch, +1.
@adoroszlai, I noticed one thing: we can completely remove the field OmKeyArgs#refreshPipeline now that HDDS-3658 has been merged. In some places this option no longer makes sense, e.g. KeyManagerImpl#getOzoneFileStatus:
```java
private OzoneFileStatus getOzoneFileStatus(String volumeName,
    String bucketName, String keyName, boolean refreshPipeline,
    boolean sortDatanodes, String clientAddress) throws IOException {
  OmKeyInfo fileKeyInfo = null;
  metadataManager.getLock().acquireReadLock(BUCKET_LOCK, volumeName,
      bucketName);
  ...
  // if the key is a file then do refresh pipeline info in OM by asking SCM
  if (fileKeyInfo != null) {
    // refreshPipeline flag check has been removed as part of
    // https://issues.apache.org/jira/browse/HDDS-3658.
    // Please refer to this jira for more details.
    refreshPipeline(fileKeyInfo);
    if (sortDatanodes) {
      sortDatanodeInPipeline(fileKeyInfo, clientAddress);
    }
    return new OzoneFileStatus(fileKeyInfo, scmBlockSize, false);
  }
}
```
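The cleanup suggested above can be sketched as follows. This is a hedged, illustrative sketch, not the real Ozone code: the stub types and the simplified body are assumptions, and only the idea (dropping the now-ignored `refreshPipeline` boolean from the signature) comes from the comment.

```java
// Sketch of the simplification suggested in the review: once the flag is
// always ignored, the boolean parameter (and OmKeyArgs#refreshPipeline)
// can be dropped from the signature entirely. Types are illustrative stubs.
class SignatureCleanupSketch {

  static class OzoneFileStatus {
    final String key;
    OzoneFileStatus(String key) { this.key = key; }
  }

  // Before (flag present but unused):
  //   getOzoneFileStatus(String volumeName, String bucketName, String keyName,
  //       boolean refreshPipeline, boolean sortDatanodes, String clientAddress)
  //
  // After (dead parameter removed):
  static OzoneFileStatus getOzoneFileStatus(String volumeName,
      String bucketName, String keyName, boolean sortDatanodes,
      String clientAddress) {
    // ... key lookup and an unconditional refreshPipeline(fileKeyInfo)
    //     would go here in the real implementation ...
    return new OzoneFileStatus(volumeName + "/" + bucketName + "/" + keyName);
  }

  public static void main(String[] args) {
    System.out.println(getOzoneFileStatus("vol", "buck", "key", false, null).key);
  }
}
```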
Overall LGTM, I have a few comments/questions.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
```java
if (args.getSortDatanodes()) {
  keyInfoList.add(fileStatus.getKeyInfo());
}
refreshPipeline(keyInfoList);
```
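The batched pattern this PR adopts can be sketched as below. This is a minimal, self-contained illustration, not the real `KeyManagerImpl`: the `KeyInfo` and `PipelineRefresher` names are hypothetical stand-ins, and the point is only that the key infos are collected first and refreshed in a single call rather than once per key.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the batched refresh: collect key infos while
// iterating the listing, then make ONE refresh call for the whole batch
// instead of one call per file.
class BatchedRefreshSketch {

  static class KeyInfo {
    final String name;
    KeyInfo(String name) { this.name = name; }
  }

  static class PipelineRefresher {
    int callCount = 0;
    // Stands in for a batched pipeline refresh (one SCM round trip).
    void refreshPipeline(List<KeyInfo> keys) {
      callCount++;
    }
  }

  // Mirrors the listStatus flow: gather all file keys first, refresh once.
  static int listStatus(List<String> fileNames, PipelineRefresher refresher) {
    List<KeyInfo> keyInfoList = new ArrayList<>();
    for (String name : fileNames) {
      keyInfoList.add(new KeyInfo(name));
    }
    refresher.refreshPipeline(keyInfoList);  // single batched call
    return keyInfoList.size();
  }

  public static void main(String[] args) {
    PipelineRefresher refresher = new PipelineRefresher();
    int n = listStatus(List.of("a", "b", "c"), refresher);
    System.out.println(n + " keys, " + refresher.callCount + " refresh call(s)");
  }
}
```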
Just a question: listStatus is limited by batch size, but we are making a single RPC call to SCM for all the containers we got from this list. Is this fine here? My question is: if there is a large number of container IDs due to larger key sizes, will it cause an issue with the RPC response size?
listKeys() is also limited by batch size and already does the same single refresh call:
ozone/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
Lines 931 to 933 in d0aa34c
```java
List<OmKeyInfo> keyList = metadataManager.listKeys(volumeName, bucketName,
    startKey, keyPrefix, maxKeys);
refreshPipeline(keyList);
```
My question is: listKeys/listStatus is limited by batch size, but if keys are huge in size, they might reference a long list of container IDs, and calling SCM with all those container IDs to fetch ContainerWithPipeline could exceed the RPC response size.

> listKeys() is also limited by batch size and already does the same single refresh call:

Then this might be a problem for listKeys also, but do you see this as a problem?
> Then this might be a problem for listKeys also, but do you see this as a problem?
I don't see this as an immediate problem.
As far as I understand, batch size for listKeys and listStatus is a convenience for the client, not a safety guarantee for the server. If RPC response size is a problem when performing the getContainerWithPipelineBatch call for multiple keys, then the same can be triggered by the client simply by increasing the batch size (if there are enough keys in the bucket).
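One mitigating factor worth noting: a batched SCM lookup only needs the distinct container IDs, and many blocks typically live in far fewer containers. The sketch below illustrates the arithmetic; it is an assumption-laden illustration (the helper name and the 1000-blocks/8-containers numbers are invented for the example), not a claim about the actual Ozone request encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: deduplicating container IDs before a batched
// container-with-pipeline lookup bounds the request by the number of
// distinct containers, not the number of blocks.
class ContainerDedupSketch {

  // Collect the distinct container IDs referenced by a list of block
  // locations (each long stands in for one block's container ID).
  static Set<Long> distinctContainerIds(List<Long> blockContainerIds) {
    return new TreeSet<>(blockContainerIds);
  }

  public static void main(String[] args) {
    // Hypothetical listing: 1000 blocks spread over 8 containers.
    List<Long> blocks = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      blocks.add((long) (i % 8));
    }
    Set<Long> ids = distinctContainerIds(blocks);
    System.out.println(ids.size() + " container IDs for " + blocks.size() + " blocks");
  }
}
```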
Yes, agreed. Not only that: thinking about it more, if a normal batch size causes an issue for the container IDs, then the same would apply to the OMResponse for listKeys as well, since it includes both KeyInfo and Pipeline.
+1 LGTM
Thanks @linyiqun for the review, and @bharatviswa504 for the review and merge.
* HDDS-3698-upgrade: (46 commits)
  * HDDS-4468. Fix Goofys listBucket large than 1000 objects will stuck forever (apache#1595)
  * HDDS-4417. Simplify Ozone client code with configuration object -- addendum (apache#1581)
  * HDDS-4476. Improve the ZH translation of the HA.md in doc. (apache#1597)
  * HDDS-4432. Update Ratis version to latest snapshot. (apache#1586)
  * HDDS-4488. Open RocksDB read only when loading containers at Datanode startup (apache#1605)
  * HDDS-4478. Large deletedKeyset slows down OM via listStatus. (apache#1598)
  * HDDS-4452. findbugs.sh couldn't be executed after a full build (apache#1576)
  * HDDS-4427. Avoid ContainerCache in ContainerReader at Datanode startup (apache#1549)
  * HDDS-4448. Duplicate refreshPipeline in listStatus (apache#1569)
  * HDDS-4450. Cannot run ozone if HADOOP_HOME points to Hadoop install (apache#1572)
  * HDDS-4346. Ozone specific Trash Policy (apache#1535)
  * HDDS-4426. SCM should create transactions using all blocks received from OM (apache#1561)
  * HDDS-4399. Safe mode rule for pipelines should only consider open pipelines. (apache#1526)
  * HDDS-4367. Configuration for deletion service intervals should be different for OM, SCM and datanodes (apache#1573)
  * HDDS-4462. Add --frozen-lockfile to pnpm install to prevent ozone-recon-web/pnpm-lock.yaml from being updated automatically (apache#1589)
  * HDDS-4082. Create ZH translation of HA.md in doc. (apache#1591)
  * HDDS-4464. Upgrade httpclient version due to CVE-2020-13956. (apache#1590)
  * HDDS-4467. Acceptance test fails due to new Hadoop 3 image (apache#1594)
  * HDDS-4466. Update url in .asf.yaml to use TLP project (apache#1592)
  * HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585)
  * ...
What changes were proposed in this pull request?

Currently KeyManagerImpl#listStatus issues a duplicate refreshPipeline call for each file. HDDS-3824 moved refreshPipeline outside the bucket lock, but HDDS-3658 added it back while keeping the one outside the lock, probably as a result of merge conflict resolution.

This PR removes the refreshPipeline call which is made while holding the bucket lock. It also converts the remaining one to a batched call (a single call with a list, instead of N calls each with a single OM key info).

https://issues.apache.org/jira/browse/HDDS-4448
How was this patch tested?

Added a unit test to verify that a single batched refreshPipeline call is made from listStatus.

https://github.com/adoroszlai/hadoop-ozone/runs/1381172579