
DAOS-6709 rsvc: xfer only attr size and fix tests #4745

Merged
merged 7 commits into master from DAOS-6709_2nd on Mar 23, 2021

Conversation

@bfaccini (Contributor) commented Feb 23, 2021

Before this patch, when fetching pool/container attributes,
if a segment buffer size is greater than the attribute's real size,
extra data is leaked as well.
This patch fixes this by sending each per-attr segment as a
single bulk transfer, to avoid Mercury's inter-segment
management and buffer packing.
Doing so preserves the original/expected behaviour with no
more extra data leaks, so the associated tests do not need
to be modified.

Change-Id: I989c384fdc667d12d8aeba3f67e51bb8be6ab290
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>

Before this patch, when fetching pool/container attributes,
if a segment buffer size is greater than the attribute's real size,
extra data is leaked as well.
This patch fixes this, and also modifies the daos_test pool/container
attribute-related sub-tests to accommodate the fact that,
when multiple attributes are requested, bulk transfer will pack
their values within the different segment buffers.

Change-Id: I989c384fdc667d12d8aeba3f67e51bb8be6ab290
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
@daosbuild1 (Collaborator): LGTM. No errors found by checkpatch.

@bfaccini self-assigned this Feb 24, 2021
Change-Id: I96e3c97c2361ab39c6ef6a42a7e1c693de1eb89e
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
@daosbuild1 (Collaborator): LGTM. No errors found by checkpatch.

@daosbuild1 (Collaborator):
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/4/execution/node/1529/log

@daosbuild1 (Collaborator): LGTM. No errors found by checkpatch.

@daosbuild1 (Collaborator):
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/5/execution/node/1367/log

@daosbuild1 (Collaborator):
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/5/execution/node/1406/log

@daosbuild1 (Collaborator):
Test stage Functional on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/5/execution/node/1455/log

@daosbuild1 (Collaborator):
Test stage Functional_Hardware_Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/5/execution/node/1542/log

@daosbuild1 (Collaborator):
Test stage Functional_Hardware_Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/6/execution/node/521/log

@daosbuild1 (Collaborator):
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/6/execution/node/550/log

To avoid functional testing failures in CI due to a
"monkey" patch that is no longer needed, as per Brian's input.

Change-Id: I96f37e9cba5b0d02dee3e406a4cb51f59509419f
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
@daosbuild1 (Collaborator): LGTM. No errors found by checkpatch.

To allow the llnl_mpi4py and romio tests to pass again ...

Change-Id: Ic7f28cf6099cb983f369d1c17e1a1783a46bf8fb
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
@daosbuild1 (Collaborator): LGTM. No errors found by checkpatch.

@daosbuild1 (Collaborator):
Test stage Scan CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/8/execution/node/1141/log

@daosbuild1 (Collaborator): LGTM. No errors found by checkpatch.

@daosbuild1 (Collaborator):
Test stage Scan CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/9/execution/node/1141/log

To integrate PR-4804/DAOS-6891 and fix JSON error report
parsing issues within CI ...

Change-Id: I6b05a0ef4d13e9d718a7aa072b15739b09b69543
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
@daosbuild1 (Collaborator): LGTM. No errors found by checkpatch.

@daosbuild1 (Collaborator):
Test stage Scan CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4745/10/execution/node/1150/log

@bfaccini (Contributor, Author) commented Mar 9, 2021

4 test failures in Functional_Hardware_Large, all unrelated to this PR :-( :

1) Test / Functional_Hardware_Large / 1-./erasurecode/ec_ior_smoke.py:ErasureCodeIor.test_ec;container-hosts-client_processes-iorflags-objectclass-1M_128M-pool-0-1-325e – FTEST_erasurecode.ErasureCodeIor
--> DAOS-6790/DAOS-6975

2) Test / Functional_Hardware_Large / 1-./pool/create_capacity_test.py:PoolCreateTests.test_create_pool_quantity;hosts-pool-0-timeouts-6018 – FTEST_pool.PoolCreateTests
--> DAOS-6976

3) Test / Functional_Hardware_Large / 1-./pool/create_test.py:PoolCreateTests.test_create_no_space;hosts-pool-0-timeouts-474e – FTEST_pool.PoolCreateTests
--> DAOS-6932

4) Test / Functional_Hardware_Large / 2-./pool/create_test.py:PoolCreateTests.test_create_no_space_loop;hosts-pool-0-timeouts-474e – FTEST_pool.PoolCreateTests
--> DAOS-6932

To integrate changes from DAOS-6932/PR-4858 (and maybe more?)
to fix unrelated CI failures ...

Change-Id: I64de610290dfad1aea7ce2c70538802f3e52d7dc
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
@bfaccini marked this pull request as ready for review March 9, 2021 08:40
@daosbuild1 (Collaborator): LGTM. No errors found by checkpatch.

@bfaccini requested a review from jolivier23 March 9, 2021 08:44
@bfaccini requested review from johannlombardi and a team March 11, 2021 08:54
@bfaccini (Contributor, Author) commented:
I would just like to raise to the reviewers' attention what was already specified in the commit message: this PR-4745 is a new attempt (after PR-4623, which finally had to be reverted due to regressions found by the pool/container attribute-related sub-tests during "daos_test [-p,-c]") to fix the issue and make "daos_test [-p,-c]" aware of the new behaviour: only the attribute's real value size is transferred, with no extra data to fill a bigger buffer, meaning that a bulk transfer can pack more attribute values within the same segment buffer when possible, and also that the exact transferred size for each attribute is returned instead of the exact attribute size.
This may break the current definition of daos_cont_get_attr() behaviour, but I have not been able to find it clearly described in the documentation...
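A hedged usage sketch of the client-visible semantics described above; the handle parameter, attribute name, and buffer size are placeholders, not taken from the PR:

#include <stdio.h>
#include <daos.h>

static int
show_attr_size(daos_handle_t coh)
{
        char              buf[1024];
        char const *const names[]  = { "my-attr" };   /* placeholder name */
        void *const       values[] = { buf };
        size_t            sizes[]  = { sizeof(buf) }; /* in: buffer size */
        int               rc;

        /* synchronous fetch of one container attribute (ev == NULL) */
        rc = daos_cont_get_attr(coh, 1, names, values, sizes, NULL);
        if (rc == 0)
                /* out: per the behaviour above, sizes[0] is the size actually
                 * transferred, and bytes past it in buf are left untouched */
                printf("attr transferred size: %zu bytes\n", sizes[0]);
        return rc;
}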

@johannlombardi requested a review from liw March 11, 2021 10:12
@johannlombardi (Contributor):
@liw I would appreciate your feedback on this change.

/* only attr size is requested */
if (sizes[i] == 0)
        sizes[i] = iovs[j].iov_len;
}
@liuxuezhao (Contributor) commented Mar 12, 2021:
I see you only want to transfer the real size for each attribute, but the handling here is incorrect.

  1. I think you need to change attr_bulk_transfer() - start one crt_bulk_transfer() for each attribute's actual size, and set the correct offset/length for each attribute's value's bulk transfer.
  2. The sizes setting and bulk_size setting above look incorrect to me.

You may refer to obj_bulk_transfer()'s code, just FYI.

@liw (Contributor):
Hmm, Bruno's change looks right to me. The problem is that the original author fell into the crt pitfall that crt looks at iov_buf_len instead of iov_len when creating a bulk handle, no?
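A minimal sketch of that pitfall, for illustration only; it assumes d_iov_set() as defined in gurt/types.h, which sets iov_len and iov_buf_len to the same value:

#include <gurt/types.h>

static void
bulk_size_pitfall(void *buf, size_t buf_size, size_t value_size)
{
        d_iov_t iov;

        d_iov_set(&iov, buf, buf_size); /* iov_len == iov_buf_len == buf_size */
        iov.iov_len = value_size;       /* only this much is valid data...    */
        /* ...yet a bulk handle created over an sgl holding this iov is sized
         * from iov_buf_len, i.e. it spans the whole buf_size. */
}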

@liuxuezhao (Contributor):
in the client's attr_bulk_create(), the bulk handle is created by:

/* values */
if (sizes != NULL && values != NULL) {
        for (j = 0; j < n; ++j)
                if (sizes[j] > 0)
                        d_iov_set(&sgl.sg_iovs[i++],
                                  values[j], sizes[j]);
}

i.e. for the attr values, each bulk segment is set to a user-provided buffer from the list.

Then at the server side, multiple bulk transfers need to be started to fill each attr value's segment, if the real size of some middle segment is less than the user-provided buffer.
For example, the client reads 2 attrs and provides a 1KB buffer for each. But at the server side, it finds that each attr's value is only 0.5KB. Then 2 bulk transfers should be started: the 1st transfers to the first segment's 0-0.5KB, the 2nd to the second segment's 0-0.5KB. This is what this patch wants to do, no?
Or I may have misunderstood it...
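To make the arithmetic of that example concrete, a small sketch of the per-segment offset/length computation (the numbers come from the example above; this is not the PR's code):

#include <stdio.h>

int
main(void)
{
        size_t buf_len    = 1024;         /* client buffer per attr    */
        size_t actual[2]  = { 512, 512 }; /* real value sizes (server) */
        size_t remote_off = 0;
        int    j;

        for (j = 0; j < 2; j++) {
                /* transfer only actual[j] bytes into segment j */
                printf("attr %d: remote_off=%zu len=%zu\n",
                       j, remote_off, actual[j]);
                remote_off += buf_len;    /* next segment boundary */
        }
        return 0;
}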

@liw (Contributor):
You're right. I mistakenly assumed that the client-side buffer is one continuous segment.

@bfaccini (Contributor, Author):
Thanks a lot @liuxuezhao and @liw for these very interesting and helpful comments. I should have had a look at the similar behaviour in obj_bulk_transfer() before!!... It also seems to me that attr_bulk_transfer() was, at some point in time, designed to work with a single/common buffer...
I will try to push a new commit soon, based on your comments. Many thanks again!

@jolivier23 (Contributor):
So you are planning to update this PR, @bfaccini? I noticed that @liw approved it after that comment.

@jolivier23 (Contributor):
Or perhaps it's just GitHub being weird. It shows this comment block and, afterward, the approval, but the approval appears tied to the prior comment from @liw, not the most recent one.

@bfaccini (Contributor, Author):
> So you are planning to update this PR @bfaccini ?

@jolivier23, yes, I will try to implement the fix for DAOS-6709 based on these comments.

I think @liw first approved and then agreed with @liuxuezhao about the approach that should be used for the fix...

@liw previously approved these changes Mar 12, 2021

@jolivier23 (Contributor) left a comment:

Just requesting changes to make sure I don't land this til resolved.

This is a new way to fix the extra data leak problem when the
destination buffer is larger than the real attr size.

To allow only the requested or exact attr size to be sent, and to
avoid packing by Mercury, each segment has to be sent as a single
bulk transfer, instead of all segments being part of the same bulk
transfer with Mercury doing its own management.

This conforms with the original behaviour, so no associated tests
need to be modified anymore.

Change-Id: I02ea7227712a424fb8dd6b382c696fd1d4250db8
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
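A minimal sketch of the per-segment transfer this commit message describes, assuming CART's crt_bulk_create()/crt_bulk_transfer() API; the helper name, its parameters, and the callback wiring are illustrative, not the PR's actual code:

#include <cart/api.h>

/* Transfer one attribute's value to its own remote segment as a single
 * bulk transfer, so Mercury never packs values across segments. */
static int
xfer_one_attr(crt_rpc_t *rpc, crt_bulk_t remote_bulk, d_iov_t *iov,
              off_t remote_off, crt_bulk_cb_t cb, void *arg)
{
        struct crt_bulk_desc desc = { 0 };
        d_sg_list_t          sgl  = { 0 };
        crt_bulk_t           local_bulk;
        int                  rc;

        /* local bulk covers exactly this attribute's value */
        sgl.sg_nr   = 1;
        sgl.sg_iovs = iov;
        rc = crt_bulk_create(rpc->cr_ctx, &sgl, CRT_BULK_RO, &local_bulk);
        if (rc != 0)
                return rc;

        desc.bd_rpc        = rpc;
        desc.bd_bulk_op    = CRT_BULK_PUT; /* server pushes to client */
        desc.bd_remote_hdl = remote_bulk;
        desc.bd_remote_off = remote_off;   /* start of this segment   */
        desc.bd_local_hdl  = local_bulk;
        desc.bd_local_off  = 0;
        desc.bd_len        = iov->iov_len; /* actual value size only  */

        /* local_bulk should be freed via crt_bulk_free() in cb */
        return crt_bulk_transfer(&desc, cb, arg, NULL);
}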
daos_size_t size;

/* transfer the smaller of the actual value size and the buffer size */
size = min(sgl.sg_iovs[i].iov_len,
           sgl.sg_iovs[i].iov_buf_len);
@daosbuild1 (Collaborator) suggested a whitespace-only change here (fixing the indentation of the continuation line).
@jolivier23 (Contributor):
if you need to refresh, fix this one. Not a huge issue but indent does look a bit off.

@bfaccini requested a review from a team March 22, 2021 12:47
@jolivier23 dismissed daosbuild1's stale review March 23, 2021 18:06

minor indentation

@jolivier23 merged commit 644779e into master Mar 23, 2021
@jolivier23 deleted the DAOS-6709_2nd branch March 23, 2021 18:07
bfaccini added a commit that referenced this pull request Mar 24, 2021
jolivier23 pushed a commit that referenced this pull request Mar 29, 2021
@ashleypittman mentioned this pull request Apr 28, 2021
@ashleypittman mentioned this pull request May 20, 2021