Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNM copy object encryption fixes -- phase 1 #54543

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

mdw-at-linuxbox
Copy link
Contributor

This is a "review only" version of copy object encryption fixes "phase 1".

2 commits: first adds all the code changes in, the second moves some object files around for cmake.

missing: a few loose ends in rgw_op.c for derypt. teuthology tests. documentation?

re-encryption will be "phase 2", following the pattern set here for decrypt.

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍👍

src/rgw/rgw_op.cc Outdated Show resolved Hide resolved
src/rgw/rgw_op.cc Outdated Show resolved Hide resolved
Comment on lines 5731 to 5968
oproc = std::make_unique<RGWCOE_proc_from_filters>(RGWCOE_proc_from_filters(*filter));
return *oproc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RGWCOE_make_filter_pipeline maintains ownership of these filters long after copy_obj_data() returns and its AtomicObjectProcessor destructs. that leaves us with a dangling reference, which may or may not end up being a problem

can we find a way for get_filter() to transfer their ownership to the caller? if everything lived on copy_obj_data()'s stack, the correct order of destruction would happen naturally

that could be some class CopyFilterStack : public DataProcessor that owns these filters, and whose process() method just forwards to the last filter added. get_filter() would return that as a unique_ptr<DataProcessor>

this,
s->yield);
} catch (int caught_errno) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if exceptions are really necessary, please throw something that derives from std::exception. we tend to use catch (const std::exception&) as a catch-all which depends on that

there's a std::system_error exception that wraps an error code:

try {
  throw std::system_error(-r, std::system_category());
} catch (const std::system_error& e) {
  // log as e.what()
  op_ret = -e.code().value();
}

copy_object() returns int already, so these exceptions probably could have been caught and converted to error codes long before this

/**
* @brief Read filter when copying data from object to another.
*/
class ObjectFilter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataProcessor is the filter interface; this one seems more like a DataProcessorFactory maybe?

Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@mdw-at-linuxbox
Copy link
Contributor Author

I've pushed a minor update: this takes care of the loose ends, builds, and in very sketchy preliminary testing, seems to do something possibly correct. I've not addressed any of the issues people raised above yet.

Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jun 18, 2024
@cbodley cbodley removed the stale label Jul 10, 2024
@cbodley
Copy link
Contributor

cbodley commented Jul 22, 2024

ping @mdw-at-linuxbox

If another bug tells the compression filter to decompress more
data than is actually present, the resulting "end_of_buffer"
error was thrown.  The thrown exception unwinds the stack,
including a completion that is pending.  The resulting core dump
indicates a failure with this completion rather than the end of buffer
exception, which is misleading and not useful.

With this change, radosgw does not abort, and instead logs
a somewhat useful message before returning an "unknown" error
to the client.

Fixes: https://tracker.ceph.com/issues/23264

Signed-off-by: Marcus Watts <mwatts@redhat.com>
This contains code to allow copyobject to copy encrypted objects.

It includes additional data paths to communicate data from the
rest layer down to the sal layer to handle decrypting
objects.  The data paths include logic to use filter chains
from get and put that process encryption and compression.
There are several hacks to deal with quirks of the filter chains.
The "get" path has to propgate flushes around the chain,
because a flush isn't guaranteed to propagate through it.
Also the "get" and "put" chains have conflicting uses of the
buffer list logic, so the buffer list has to be copied so that
they don't step on each other's toes.

Fixes: https://tracker.ceph.com/issues/23264

Signed-off-by: Marcus Watts <mwatts@redhat.com>
This is after a suggestion by Matt Benjamin.

Not sure what caused this problem, but in this branch,
rgw_op.cc.o, which is in rgw_common.a, requires these symbols:
	ExternalTokenEngine::authenticate
	keystone::TokenEngine::authenticate
	rgw::auth::swift::SignedTokenEngine::authenticate
	vtable for EC2Engine
	vtable for TempURLApplier
	vtable for TempURLEngine
these are defined in the object files for these in rgw_a:
	rgw_swift_auth.cc
	rgw_auth_keystone.cc
Additionally, the swift changes also require,
	rgw_rest_swift.cc
So this change mvoes those files from rgw_a to rgw_common.

Fixes: https://tracker.ceph.com/issues/23264

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Lifecycle transtion can copy objects to a different storage tier.
When this happens, since the object is repacked, the original
manifest is invalidated.  It is necessary to store a special
"parts_len" attribute to fix this.  There was code in PutObj
to handle this, but that was only used for multisite replication,
it is not used by the lifecycle transisiton code.  This fix
adds similar logic to the lifecycle transition code path to make the
same thing happen.

Fixes: https://tracker.ceph.com/issues/23264

Signed-off-by: Marcus Watts <mwatts@redhat.com>
While 'STANDARD' is a valid storage class, it is not supposed
to ever be returned when fetching an object.  This change suppresses
storing 'STANDARD' as the attribute value, so that objects
explicitly created with 'STANDARD' will in fact be indistinguishable
from those where it was implicitly set.

Fixes: https://tracker.ceph.com/issues/67786

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Etags in other parts of ceph are returned as quoted objects (""),
where they are returned as header fields in the response.
Comppleting a multipart upload returns the etag in the xml
output, and it turns out they should be quoted here as well.

There are clients in the field (such as R2) that depend on this behavior.

Fixes: https://tracker.ceph.com/issues/67688

Signed-off-by: Marcus Watts <mwatts@redhat.com>
When an object is copied, it should only be depending on data
in the request to determine the storage class, and if it is
not specified, it should default to 'STANDARD'.  In radosgw,
this means that this is another attribute (similar to encryption)
that should not be merged from the source object.

Fixes: https://tracker.ceph.com/issues/67787

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@mdw-at-linuxbox
Copy link
Contributor Author

I've just updated this PR with my current upstream fixes for copy object encryption. This will need rebasing to the latest "main" to make it current.

I also have a standalone python script to test this, which should be somehow folded into teuthology. That's not quite trivial because it needs some specific things set up in rgw (keystone, storage classes), so the teuthology tooling will need to do that to run the tests.

@github-actions github-actions bot removed the stale label Sep 25, 2024
@cbodley
Copy link
Contributor

cbodley commented Sep 26, 2024

I also have a standalone python script to test this, which should be somehow folded into teuthology. That's not quite trivial because it needs some specific things set up in rgw (keystone, storage classes), so the teuthology tooling will need to do that to run the tests.

ideally this stuff would go in s3-tests with the rest of our sse test coverage. s3-tests does support storage classes, and the rgw suite does know how to create/configure them (ex https://github.com/ceph/ceph/blob/main/qa/suites/rgw/verify/overrides.yaml#L18-L22)

in s3-tests, see test_lifecycle_transition() for an example that uses those storage classes: https://github.com/ceph/s3-tests/blob/master/s3tests_boto3/functional/test_s3.py#L9091-L9092

@clwluvw
Copy link
Member

clwluvw commented Oct 13, 2024

Hi @mdw-at-linuxbox - I've added some tests here ceph/s3-tests#595. Do you think that would be enough for this?

@cbodley
Copy link
Contributor

cbodley commented Nov 19, 2024

ping @mdw-at-linuxbox

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants