Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-6999 object: add shard id to daos_shard_tgt #4959

Merged
merged 3 commits into from
Mar 17, 2021
Merged

Conversation

wangdi1
Copy link
Contributor

@wangdi1 wangdi1 commented Mar 11, 2021

Add shard id to daos_shard_tgt, since st_shard is
the shard index, which is not the id_shard for
reintegrating/extending layout.

Signed-off-by: Di Wang di.wang@intel.com

Signed-off-by: Fan Yong <fan.yong@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

liuxuezhao
liuxuezhao previously approved these changes Mar 12, 2021
Copy link
Contributor

@Nasf-Fan Nasf-Fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dc_tx_classify_common(), "dcri->dcri_shard_idx = shard->do_shard;" need to be fixed also.

src/include/daos/object.h Show resolved Hide resolved
Di Wang and others added 2 commits March 12, 2021 06:38
Add shard id to daos_shard_tgt, since st_shard is
the shard index, which is not the id_shard for
reintegrating/extending layout.

Signed-off-by: Di Wang <di.wang@intel.com>
Because shard index and identifier may be different for the object
with reintegrating/extending layout, we need to properly use them
for compounded RPC.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@wangdi1
Copy link
Contributor Author

wangdi1 commented Mar 12, 2021

In dc_tx_classify_common(), "dcri->dcri_shard_idx = shard->do_shard;" need to be fixed also.

oh, do_shard coming from po_shard, which is correct actually.

@wangdi1
Copy link
Contributor Author

wangdi1 commented Mar 12, 2021

ah, I will fix the rpc version.

@wangdi1 wangdi1 requested review from Nasf-Fan and liuxuezhao March 12, 2021 06:43
liuxuezhao
liuxuezhao previously approved these changes Mar 12, 2021
@wangdi1 wangdi1 requested a review from liuxuezhao March 12, 2021 17:18
@Nasf-Fan
Copy link
Contributor

In dc_tx_classify_common(), "dcri->dcri_shard_idx = shard->do_shard;" need to be fixed also.

oh, do_shard coming from po_shard, which is correct actually.

"po_shard" is the shard index, according to your patch logic, the shard index is not equal to the id_shard for reintegrating/extending layout.

@wangdi1
Copy link
Contributor Author

wangdi1 commented Mar 14, 2021

In dc_tx_classify_common(), "dcri->dcri_shard_idx = shard->do_shard;" need to be fixed also.

oh, do_shard coming from po_shard, which is correct actually.

"po_shard" is the shard index, according to your patch logic, the shard index is not equal to the id_shard for reintegrating/extending layout.

No, po_shard is the shard id. not offset inside the layout.

See this layout dump of extending object. (shard_id)

03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:255 obj_layout_dump() dump layout for 1152922711492657175.248, ver 4
03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 0: shard_id 0, tgt_id 7, f_seq 2, healthy
03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 1: shard_id 1, tgt_id 4, f_seq 1, healthy
03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 2: shard_id 2, tgt_id 6, f_seq 1, healthy
03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 3: shard_id 0, tgt_id 2, f_seq 2, rebuilding
03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 4: shard_id 3, tgt_id 0, f_seq 1, healthy
03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 5: shard_id 4, tgt_id 5, f_seq 1, healthy
03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 6: shard_id 5, tgt_id 3, f_seq 1, healthy
03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 7: shard_id -1, tgt_id -1, f_seq 0, healthy
03/10-03:38:42.77 boro-38 DAOS[8229/8266] object DBUG src/object/obj_class.c:824 daos_oclass_attr_find() Find class RP_3G2 for oid 1152922711492657175.248

@Nasf-Fan
Copy link
Contributor

Nasf-Fan commented Mar 15, 2021

03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 3: shard_id 0, tgt_id 2, f_seq 2, rebuilding

Then daos_cpd_req_idx() needs to be fixed, both the object index and the object ID are required for CPD RPC. Can you fix that in this patch or you want me to fix it in another patch based on your patch. Such fix will change CPD RPC protocol.

Personally, I prefer to fix it in this patch, that will avoid changing the RPC version again.

@wangdi1
Copy link
Contributor Author

wangdi1 commented Mar 16, 2021

03/10-03:38:42.77 boro-38 DAOS[8229/8266] placement DBUG src/placement/pl_map.c:263 obj_layout_dump() 3: shard_id 0, tgt_id 2, f_seq 2, rebuilding

Then daos_cpd_req_idx() needs to be fixed, both the object index and the object ID are required for CPD RPC. Can you fix that in this patch or you want me to fix it in another patch based on your patch. Such fix will change CPD RPC protocol.

Personally, I prefer to fix it in this patch, that will avoid changing the RPC version again.

Please fix it in another patch, since I am not sure I understand that part of code good enough. BTW: probably do not need change the RPC version, if your fix will get into 1.2 as well.

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -467,13 +467,17 @@ struct daos_cpd_sub_req {
*/
struct daos_cpd_req_idx {
/* Shard index of the object for the sub request on this DAOS target. */
uint32_t dcri_shard_idx;
uint32_t dcri_shard_off;
/* Shard identifier of the object for the sub request on this DAOS target. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) line over 80 characters

@daosbuild1
Copy link
Collaborator

@daosbuild1 daosbuild1 dismissed their stale review March 16, 2021 04:07

Updated patch

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@@ -942,6 +942,7 @@ obj_shard_tgts_query(struct dc_object *obj, uint32_t map_ver, uint32_t shard,

shard_tgt->st_rank = obj_shard->do_target_rank;
shard_tgt->st_shard = shard;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, did not check carefully last time, seems original st_shard is already the shard_id?
assume the different is that they possibly different when grp_nr > 1. For example for an obj_class with grp_nr = 2, grp_size = 3.
Then "shard index" can be [0, 2], but "shard id" can be [0, 5] right?
So seems the original st_shard is "shard id" already?

and in obj_shard_open() "oid.id_shard = obj_shard->do_shard;" seems incorrect, because obj_shard->do_shard is "shard index" but need to be "shard id".

It really confuse to with both the two things, it would be great to only have one (the "shard id"), do we really need the "shard index"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume the different is that they possibly different when grp_nr > 1. For example for an obj_class with grp_nr = 2, grp_size = 3.
Then "shard index" can be [0, 2], but "shard id" can be [0, 5] right?
So seems the original st_shard is "shard id" already?

oh, this is for shard extending case (during reintegration/extending). Usually, shard_id is the offset of the shard in layout. (not within the group). But during reintegration, some extra shards (old stale shards) might be added to the layout, then this is not true anymore, so we have to separate them as st_shard(offset) and st_shard_id(real id). Otherwise server forward will use wrong shard to forward, thus causing corruption.

hmm, do_shard is the shard_id, which comes from po_shard. or I miss sth?.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"do_shard is the shard_id, which comes from po_shard" you are right, thanks.

@jolivier23 jolivier23 merged commit d5c1bc0 into master Mar 17, 2021
@jolivier23 jolivier23 deleted the daos_6999 branch March 17, 2021 18:02
wangdi1 pushed a commit that referenced this pull request Mar 17, 2021
*Add shard id to daos_shard_tgt, since st_shard is
the shard index, which is not the id_shard for
reintegrating/extending layout.
*Because shard index and identifier may be different for the object
with reintegrating/extending layout, we need to properly use them
for compounded RPC.

Signed-off-by: Di Wang <di.wang@intel.com>
Co-authored-by: Fan Yong <fan.yong@intel.com>
@ashleypittman ashleypittman mentioned this pull request Apr 28, 2021
@ashleypittman ashleypittman mentioned this pull request May 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants