-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-16930 pool: Share map bulk resources #15763
Conversation
Ticket title is 'Pool query fail on some pool with error "DER_NOMEM(-1009): Out of memory"' |
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15763/1/execution/node/370/log |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15763/1/execution/node/301/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15763/1/execution/node/305/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15763/1/execution/node/261/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15763/1/execution/node/345/log |
8b8e8fb
to
41f89fd
Compare
src/pool/srv_target.c
Outdated
{ | ||
D_ASSERT(dss_get_module_info()->dmi_xs_id == 0); | ||
|
||
/* We could cache this longer, actually. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I don't quite see why we have to invalidate the bulk cache here? I think we just need to invalid it when changing pool map, no? (the change you made oin ds_pool_tgt_map_update()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also not quite understand why map_bc_put() 2 times below if (pool->sp_map_bc == map_bc)? then the cache is invalidated even no pool map change right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there's problem. Let me update...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic looks correct although. current code looks just don't want to ping the buf if no active query in long time.
|
||
D_ASSERT(dss_get_module_info()->dmi_xs_id == 0); | ||
|
||
/* For accessing pool->sp_map, but not really necessary. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, on XS0 and no yield, the lock seems not must to take. but looks fine.
src/pool/srv_target.c
Outdated
{ | ||
D_ASSERT(dss_get_module_info()->dmi_xs_id == 0); | ||
|
||
/* We could cache this longer, actually. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also not quite understand why map_bc_put() 2 times below if (pool->sp_map_bc == map_bc)? then the cache is invalidated even no pool map change right?
src/pool/srv_target.c
Outdated
{ | ||
D_ASSERT(dss_get_module_info()->dmi_xs_id == 0); | ||
|
||
/* We could cache this longer, actually. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic looks correct although. current code looks just don't want to ping the buf if no active query in long time.
41f89fd
to
43e0391
Compare
Improve concurrent POOL_QUERY, POOL_CONNECT, and POOL_TGT_QUERY_MAP efficiency by giving them a chance to share the same pool map buffer and pool map buffer bulk handle. Signed-off-by: Li Wei <liwei@hpe.com> Required-githooks: true
43e0391
to
2491a37
Compare
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15763/4/testReport/ |
Introduce pool space query on service leader to avoid space query flooding. The pool space cache expiration time is 2 seconds by default, one can change the expiration time via DAOS_POOL_SPACE_CACHE_INTVL, if the expiration time is set to zero, space cache will be disabled. Signed-off-by: Niu Yawei <yawei.niu@intel.com>
|
||
static void | ||
map_bc_put(struct ds_pool_map_bc *map_bc) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] D_ASSERT(map_bc->pic_ref > 0)
if (pool->sp_map_bc == NULL) { | ||
int rc; | ||
|
||
rc = map_bc_create(ctx, pool->sp_map, &pool->sp_map_bc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is read lock OK in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as the code only execute on XS0 and no yield, looks need not lock
Test stage Functional Hardware Medium Verbs Provider completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15763/5/testReport/ |
Serialize pool space query when space cache is enabled. Change CI global config to disable pool space cache since some tests may need to verify instant pool space changes. Signed-off-by: Niu Yawei <yawei.niu@hpe.com> Co-authored-by: Xuezhao Liu <xuezhao.liu@hpe.com> Co-authored-by: Liang Zhen <liang.zhen@hpe.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pylint needs to be resolved
Previous CI run was good: I pushed a simple fix for the lint with most CI skipped |
Improve concurrent POOL_QUERY, POOL_CONNECT, and POOL_TGT_QUERY_MAP efficiency by giving them a chance to share the same pool map buffer and pool map buffer bulk handle.
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: