-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-6709 rsvc: xfer only attr size and fix tests #4745
Conversation
Before this patch, when fetching pool/container attributes, if a segment buffer size is greater than attribute's real size, extra data is also leaked. This patch fixes this, and also modifies daos_test pool/container attributes related sub-tests to accomodate with the fact that when multiple attributes are requested, bulk transfer will cause their values to be packed within the different segments buffers. Change-Id: I989c384fdc667d12d8aeba3f67e51bb8be6ab290 Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Change-Id: I96e3c97c2361ab39c6ef6a42a7e1c693de1eb89e Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/4/execution/node/1529/log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/5/execution/node/1367/log |
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/5/execution/node/1406/log |
Test stage Functional on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/5/execution/node/1455/log |
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/5/execution/node/1542/log |
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/6/execution/node/521/log |
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/6/execution/node/550/log |
To avoid functional testing failures in CI, due to a "monkey" patch no longer needed, as per Brian's inputs. Change-Id: I96f37e9cba5b0d02dee3e406a4cb51f59509419f Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
To allow for llnl_mpi4py and romio tests to pass again ... Change-Id: Ic7f28cf6099cb983f369d1c17e1a1783a46bf8fb Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Scan CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/8/execution/node/1141/log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Scan CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/9/execution/node/1141/log |
To integrate PR-4804/DAOS-6891 and fix json error reports parsing issues within CI ... Change-Id: I6b05a0ef4d13e9d718a7aa072b15739b09b69543 Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Scan CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/10/execution/node/1150/log |
4 tests failures in Functional_Hardware_Large and all unrelated to this PR :-( : 1)Test / Functional_Hardware_Large / 1-./erasurecode/ec_ior_smoke.py:ErasureCodeIor.test_ec;container-hosts-client_processes-iorflags-objectclass-1M_128M-pool-0-1-325e – FTEST_erasurecode.ErasureCodeIor 2)Test / Functional_Hardware_Large / 1-./pool/create_capacity_test.py:PoolCreateTests.test_create_pool_quantity;hosts-pool-0-timeouts-6018 – FTEST_pool.PoolCreateTests 3)Test / Functional_Hardware_Large / 1-./pool/create_test.py:PoolCreateTests.test_create_no_space;hosts-pool-0-timeouts-474e – FTEST_pool.PoolCreateTests 4)Test / Functional_Hardware_Large / 2-./pool/create_test.py:PoolCreateTests.test_create_no_space_loop;hosts-pool-0-timeouts-474e – FTEST_pool.PoolCreateTests |
To integrate changes from DAOS-6932/PR-4858, and may be more ?, to fix unrelated CI failures .... Change-Id: I64de610290dfad1aea7ce2c70538802f3e52d7dc Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Just would to raise to the reviewers attention what was already specified in the commit message, but this PR-4745 is a new attempt (after PR-4623 which had finally to be reverted due to regressions found by pool/container attribute-related subtests during "daos_test [-p,-c]") to fix issue and make "daos_test [-p,-c]" aware of new behaviour (only real attribute's value size is xfered with no extra data to fill bigger buffer sized, meaning that bulk xfer can pack more attributes value within same segment buffer if possible, and also that exact xfered size for each attributes is returned instead of exact attributes size). |
@liw I would appreciate your feedback on this change. |
src/rsvc/srv_common.c
Outdated
/* only attr size is requested */ | ||
if (sizes[i] == 0) | ||
sizes[i] = iovs[j].iov_len; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you only want to transfer the real size for each attribute, but the handling here is incorrect.
- I think you need to change attr_bulk_transfer() - start one crt_bulk_transfer() for each attribute's actual size, and set correct offset/length for each attribute's value's bulk transfer.
- the sizes setting and bulk_size setting above looks not correct to me.
You may refer obj_bulk_transfer()'s code, just FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, Bruno's change looks right to me. The problem is that the original author fell into the crt pitfall that crt looks at iov_buf_len instead of iov_len when creating a bulk handle, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in client's attr_bulk_create(), the bulk handle is created by -
2267 > /* values */
2268 > if (sizes != NULL && values != NULL) {
2269 > > for (j = 0; j < n; ++j)
2270 > > > if (sizes[j] > 0)
2271 > > > > d_iov_set(&sgl.sg_iovs[i++],
2272 > > > > > values[j], sizes[j]);
2273 > }
i.e. for the attr values, each bulk segment is set to user provided buffer list.
Then at server-side, need to start multi-bulk-transferring to fill to each attr's value's segment, if the real size of some middle segments is less than user provided buffer.
For example, client read 2 attrs, each provide 1KB buffer. But actually at server side, it finds that each attr only with value of 0.5KB. Then should start 2 bulk transfer, 1st transfer to first segment's 0-0.5KB, 2nd transfer to 2nd segment's 0-0.5KB. This is what this patch want to do, no?
or I may mis-understood it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I mistakenly assumed that the client-side buffer is one continuous segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @liuxuezhao and @liw for these very interesting and helpful comments. I think I had to have a look to the similar behaviour in obj_bulk_transfer() before !!... And also it seems to me that attr_bulk_transfer() has been designed to work with a single/common buffer at some point of time...
Will try to push a new commit soon, based on your comments, many thanks again !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps it's just github being weird. It shows this comment block and afterward, the approval but the approval appears tied to the prior comment from @liw, not the most recent one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you are planning to update this PR @bfaccini ?
@jolivier23 , yes I will try to implement the fix for DAOS-6709 based on these comments.
I think @liw had first approved and then felt ok with @liuxuezhao about the way that should be used to fix...
src/rsvc/srv_common.c
Outdated
/* only attr size is requested */ | ||
if (sizes[i] == 0) | ||
sizes[i] = iovs[j].iov_len; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, Bruno's change looks right to me. The problem is that the original author fell into the crt pitfall that crt looks at iov_buf_len instead of iov_len when creating a bulk handle, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just requesting changes to make sure I don't land this til resolved.
This is a new way to fix extra data leak problem if destination buffer is larger than real attr size. To allow for only requested or attr exact size to be sent and avoid packing by Mercury, each segment has to be sent as a single bulk transfer instead as all being part of the same bulk transfer and have Mercury do own management. This allows to conform with original behaviour, so no associated tests need to be modified anymore. Change-Id: I02ea7227712a424fb8dd6b382c696fd1d4250db8 Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
daos_size_t size; | ||
|
||
size = min(sgl.sg_iovs[i].iov_len, | ||
sgl.sg_iovs[i].iov_buf_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgl.sg_iovs[i].iov_buf_len); | |
sgl.sg_iovs[i].iov_buf_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you need to refresh, fix this one. Not a huge issue but indent does look a bit off.
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/12/execution/node/72/log |
daos_size_t size; | ||
|
||
size = min(sgl.sg_iovs[i].iov_len, | ||
sgl.sg_iovs[i].iov_buf_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you need to refresh, fix this one. Not a huge issue but indent does look a bit off.
This is a new way to fix extra data leak problem if destination buffer is larger than real attr size. To allow for only requested or attr exact size to be sent and avoid packing by Mercury, each segment has to be sent as a single bulk transfer instead as all being part of the same bulk transfer and have Mercury do own management. This allows to conform with original behaviour, so no associated tests need to be modified anymore. Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
This is a new way to fix extra data leak problem if destination buffer is larger than real attr size. To allow for only requested or attr exact size to be sent and avoid packing by Mercury, each segment has to be sent as a single bulk transfer instead as all being part of the same bulk transfer and have Mercury do own management. This allows to conform with original behaviour, so no associated tests need to be modified anymore. Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Before this patch, when fetching pool/container attributes,
if a segment buffer size is greater than attribute's real size,
extra data is also leaked.
This patch fixes this, by sending each per-attr segment as a
single bulk transfer to avoid Mercury inter-segment
management and buffer packing.
Doing so preserves original/expected behaviour with no
more extra data leak, so associated tests do not need
to be modified.
Change-Id: I989c384fdc667d12d8aeba3f67e51bb8be6ab290
Signed-off-by: Bruno Faccini bruno.faccini@intel.com