DAOS-6999 object: add shard id to daos_shard_tgt #4959
Signed-off-by: Fan Yong <fan.yong@intel.com>
LGTM. No errors found by checkpatch.
In dc_tx_classify_common(), "dcri->dcri_shard_idx = shard->do_shard;" needs to be fixed as well.
Add shard id to daos_shard_tgt, since st_shard is the shard index, which is not the id_shard for reintegrating/extending layout. Signed-off-by: Di Wang <di.wang@intel.com>
Because shard index and identifier may be different for the object with reintegrating/extending layout, we need to properly use them for compounded RPC. Signed-off-by: Fan Yong <fan.yong@intel.com>
Oh, do_shard comes from po_shard, which is actually correct.
Ah, I will fix the RPC version.
"po_shard" is the shard index. According to your patch logic, the shard index is not equal to the id_shard for a reintegrating/extending layout.
No, po_shard is the shard id, not the offset inside the layout. See this layout dump of an extending object. (shard_id) 03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:255 obj_layout_dump() dump layout for 1152922711492657175.248, ver 4
Then daos_cpd_req_idx() needs to be fixed: both the object index and the object ID are required for the CPD RPC. Can you fix that in this patch, or do you want me to fix it in another patch based on yours? Such a fix will change the CPD RPC protocol. Personally, I prefer to fix it in this patch, which avoids changing the RPC version again.
Please fix it in another patch, since I am not sure I understand that part of the code well enough. BTW: we probably do not need to change the RPC version if your fix gets into 1.2 as well.
Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-4959/5/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules
src/object/obj_rpc.h
@@ -467,13 +467,17 @@ struct daos_cpd_sub_req {
 */
struct daos_cpd_req_idx {
	/* Shard index of the object for the sub request on this DAOS target. */
	uint32_t dcri_shard_idx;
	uint32_t dcri_shard_off;
	/* Shard identifier of the object for the sub request on this DAOS target. */
(style) line over 80 characters
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4959/5/execution/node/72/log
@@ -942,6 +942,7 @@ obj_shard_tgts_query(struct dc_object *obj, uint32_t map_ver, uint32_t shard,

	shard_tgt->st_rank = obj_shard->do_target_rank;
	shard_tgt->st_shard = shard;
Sorry, I did not check carefully last time; it seems the original st_shard is already the shard_id?
I assume the difference is that they may differ when grp_nr > 1. For example, for an obj_class with grp_nr = 2, grp_size = 3:
the "shard index" can be in [0, 2], but the "shard id" can be in [0, 5], right?
So it seems the original st_shard is already the "shard id"?
And in obj_shard_open(), "oid.id_shard = obj_shard->do_shard;" seems incorrect, because obj_shard->do_shard is the "shard index" but needs to be the "shard id".
It is really confusing to have both of them; it would be great to have only one (the "shard id"). Do we really need the "shard index"?
Oh, this is for the shard extending case (during reintegration/extending). Usually, shard_id is the offset of the shard in the layout (not within the group). But during reintegration, some extra shards (old stale shards) might be added to the layout, so this no longer holds, and we have to separate them as st_shard (offset) and st_shard_id (real id). Otherwise the server will forward to the wrong shard, causing corruption.
Hmm, do_shard is the shard_id, which comes from po_shard. Or am I missing something?
"do_shard is the shard_id, which comes from po_shard" you are right, thanks.
* Add shard id to daos_shard_tgt, since st_shard is the shard index, which is not the id_shard for reintegrating/extending layout.
* Because shard index and identifier may be different for the object with reintegrating/extending layout, we need to properly use them for compounded RPC.
Signed-off-by: Di Wang <di.wang@intel.com>
Co-authored-by: Fan Yong <fan.yong@intel.com>
Add shard id to daos_shard_tgt, since st_shard is
the shard index, which is not the id_shard for
reintegrating/extending layout.
Signed-off-by: Di Wang <di.wang@intel.com>