447: Consistently use `KeyNotFoundException`, `DuplicateKeyException` and `PermissionException` in `Storage`s #584
Conversation
- Warn if AWS auth is being used and machine identity is disabled
- Refactor S3 fixtures so various errors can be simulated
- Removed the pytest-server-fixtures dependency, which is a bit old and of which we only use one function
Also provide meaningful messages for mongo exceptions
This is great! ⭐ ⭐ ⭐ ⭐ ⭐
Would be good to add tests verifying that the table in your PR description is correct.
```diff
@@ -215,7 +215,8 @@ void register_error_code_ecosystem(py::module& m, py::exception<arcticdb::Arctic
 py::register_exception<SchemaException>(m, "SchemaException", compat_exception.ptr());
 py::register_exception<NormalizationException>(m, "NormalizationException", compat_exception.ptr());
-py::register_exception<StorageException>(m, "StorageException", compat_exception.ptr());
+auto& storage_exception = py::register_exception<StorageException>(m, "StorageException", compat_exception.ptr());
+py::register_exception<StorageRetryableException>(m, "StorageRetryableException", storage_exception.ptr());
```
Any docs update needed? E.g. `ArcticDB/docs/mkdocs/docs/error_messages.md`, line 151 in 3330bf3, under `## Exception Hierarchy`.
```cpp
        result.has_value(), "Mongo did not acknowledge write for key {}", kv.key_view());
    if (!upsert && result->modified_count() == 0) {
        throw KeyNotFoundException(std::move(kv.variant_key()),
            "update_segment called with upsert=false on non-exist key: {}");
```
```diff
-            "update_segment called with upsert=false on non-exist key: {}");
+            "update_segment called with upsert=false on non-existent key: {}");
```
```diff
@@ -62,6 +64,7 @@ inline std::unordered_map<ErrorCategory, const char*> get_error_category_names()
 ERROR_CODE(2003, E_INCOMPATIBLE_INDEX) \
 ERROR_CODE(2004, E_WRONG_SHAPE) \
 ERROR_CODE(3000, E_NO_SUCH_VERSION) \
+ERROR_CODE(3001, E_SYMBOL_NOT_FOUND) \
```
Docs are needed for these too, in `error_messages.md`.
```cpp
void LibraryTool::write(VariantKey key, Segment segment) {
    storage::KeySegmentPair kv{std::move(key), std::move(segment)};
    lib_->write(Composite<storage::KeySegmentPair>{std::move(kv)});

void LibraryTool::write_batch(std::vector<VariantKey> keys, std::vector<Segment> segments) {
```
(minor) Might be friendlier to assert keys and segments are the same length
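The suggested guard can be sketched in Python (the real `write_batch` is C++; the function below only mirrors the idea from this comment, it is not the PR's code):

```python
def write_batch(keys, segments):
    # Sketch of the reviewer's suggestion: check up front that keys and
    # segments have the same length, so a mismatch fails with a clear
    # message instead of an out-of-range error mid-loop.
    if len(keys) != len(segments):
        raise ValueError(
            f"write_batch: {len(keys)} keys but {len(segments)} segments"
        )
    # Stand-in for building the Composite<KeySegmentPair>
    return list(zip(keys, segments))
```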
```cpp
        return; // Callers need to handle these
    }

#define S3_ERROR_FMT " error while {} {}: S3Error#{} {}: {}", \
```
No real need for this to be a macro is there? Just a normal function would be fine?
+1
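The "plain function" idea translates to any language; as a Python sketch (signature and names are illustrative, not the actual refactor):

```python
def s3_error_message(operation, key, error_code, error_name, message):
    # A plain function (instead of a macro) building the same message shape
    # as S3_ERROR_FMT: " error while {} {}: S3Error#{} {}: {}".
    return f" error while {operation} {key}: S3Error#{error_code} {error_name}: {message}"
```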
```cpp
inline bool is_expected_error_type(Aws::S3::S3Errors err) {
    return err == Aws::S3::S3Errors::NO_SUCH_KEY
        || err == Aws::S3::S3Errors::NO_SUCH_BUCKET
```
We've lost the special handling for `NO_SUCH_BUCKET` in this PR, I assume because you check the bucket is accessible when the Arctic instance is created? What if the bucket is deleted after the Arctic instance is created?
```cpp
}
else {
    const auto& error = list_objects_outcome.GetError();
```
I spent a long time thinking about this one. It looks like we are similar to the old implementation in that we stop the paging after the first error. The difference is that now we sometimes throw. Are callers of this method able to cope with `iterate_type` throwing?
```cpp
if (!outcome.IsSuccess()) {
    auto& error = outcome.GetError();

#define BUCKET_LOG(level, msg, ...) log::storage().level(msg "\nHTTP Status: {}. Server response: {}", \
```
No macro please
```cpp
auto wait = std::chrono::milliseconds(ConfigsMap::instance()->get_int("S3Storage.CheckBucketMaxWait", 1000));
if (future.wait_for(wait) == std::future_status::ready) {
    auto outcome = future.get();
    if (!outcome.IsSuccess()) {
```
```cpp
if (outcome.IsSuccess()) {
    return true;
}
// log some stuff
return false;
```

is clearer IMO. There's only one way we can return true here, and the code could make that more obvious.
```diff
@@ -50,6 +50,7 @@ inline void LmdbStorage::do_write_internal(Composite<KeySegmentPair>&& kvs, ::lm
 ARCTICDB_SUBSAMPLE(LmdbPut, 0)
 int res = ::mdb_put(txn.handle(), dbi.handle(), &mdb_key, &mdb_val, MDB_RESERVE | overwrite_flag);
 if (res == MDB_KEYEXIST) {
+    // Since LMDB is in-memory, we can efficiently detect: (see its doc for details)
```
I can't quite follow this comment. What can we efficiently detect?
```diff
@@ -85,15 +86,16 @@ inline void LmdbStorage::do_update(Composite<KeySegmentPair>&& kvs, UpdateOpts o
 if(!failed_deletes.empty()) {
     ARCTICDB_SUBSAMPLE(LmdbStorageCommit, 0)
     txn.commit();
-    throw KeyNotFoundException(Composite<VariantKey>(std::move(failed_deletes)));
+    throw KeyNotFoundException(Composite<VariantKey>(std::move(failed_deletes)),
+        "do_update called with upsert=false on non-exist key(s): {}");
```
```diff
-        "do_update called with upsert=false on non-exist key(s): {}");
+        "do_update called with upsert=false on non-existent key(s): {}");
```
```cpp
util::check_rte(opts.upsert_ || it != key_vec.end(), "update called with upsert=false but key does not exist");
if (!opts.upsert_ && it == key_vec.end()) {
    throw KeyNotFoundException(std::move(kv.variant_key()),
        "update called with upsert=false but key does not exist: {}");
```
The error message is different from the error message for LMDB. I think we should unify them.

(Minor) I wouldn't move `kv.variant_key()` but copy it. We're already doing the slow thing (throwing), so I don't see any benefit in moving a string. If we don't move, debugging will be slightly better, as the key will exist in the stack trace.
```cpp
 * https://github.com/mongodb/mongo-cxx-driver/blob/master/src/mongocxx/exception/error_code.hpp for constants
 */

/** Mongo often fail to set the what() string, so has to manually format the exception */
```
```diff
-/** Mongo often fail to set the what() string, so has to manually format the exception */
+/** Mongo often fails to set the what() string, so has to manually format the exception */
```
```cpp
/** Mongo often fail to set the what() string, so has to manually format the exception */
[[noreturn]] void translate_operation_exception(const mongocxx::operation_exception& e) {
    std::string json;
```
(Minor) Isn't the naming slightly misleading? As far as I understand, `bsoncxx::to_json(e.raw_server_error()->view())` would return a stringified JSON, but `fmt::format("(Failed to format server response either: {})", e.what())` and `"No response from server"` are not JSONs.
```cpp
util::check(bool(result), "Mongo error while putting key {}", kv.key_view());
try {
    auto result = bulk_write.execute();
    // We don't use the "mongoc_write_concern_set_w" that allows nullopt to be returned, but check it anyway:
```
Is there a reason to check it? Maybe a comment on why we check it anyway.
```cpp
if(StorageFailureSimulator::instance()->configured())
    StorageFailureSimulator::instance()->go(FailureType::READ);
```
(Personal preference) I believe that wrapping if/else/for/while bodies in `{}` even for one-liners is more readable.
```cpp
auto size = doc["total_size"].get_int64().value;
entity::VariantKey stored_key{ detail::variant_key_from_document(doc, key) };
entity::VariantKey stored_key{detail::variant_key_from_document(doc, key)};
util::check(stored_key == key, "Key mismatch: {} != {}");
```
This is a general question, not related to this PR in particular. Do we want to leak `E_ASSERTION_FAILURE` exceptions? AFAIK we use this in places where people would use cassert's `assert`, so do we want people to be able to see this in "raw" form, and why don't we use casserts in general?
```cpp
catch(const std::exception& ex) {
    log::storage().info("Segment read error: {}", ex.what());
    throw storage::KeyNotFoundException{Composite<VariantKey>{VariantKey{key}}};
} catch(const mongocxx::operation_exception& e) {
```
We're not catching `std::exception` anymore. What was the thing that used to throw anything inheriting from `std::exception`?
```cpp
} else {
    result = collection.delete_one(document{} << "key" << fmt::format("{}", key) << "stream_id" <<
        fmt::format("{}", variant_key_id(key)) << finalize);
    auto filter = document{} << "key" << fmt::format("{}", key) << "stream_id" << fmt::format("{}", variant_key_id(key))
```
(Minor) I think one large `fmt::format` (if possible) would be easier to read.
```cpp
} else {
    log::storage().info("Unable to determine if the bucket is accessible. Request timed out.");
}
return false;
```
What does `false` stand for? It's not quite obvious to me from the name of the function.
```cpp
void LibraryTool::write_batch(std::vector<VariantKey> keys, std::vector<Segment> segments) {
    Composite<storage::KeySegmentPair> composite;
    for (size_t i = 0; i < keys.size(); i++) {
        composite.push_back({std::move(keys[i]), std::move(segments.at(i))});
```
Is this performance critical? Calling `at` is slightly slower due to bounds checks. Do we want it to throw? Who's supposed to handle a `std::out_of_range` exception in that case?
```diff
@@ -195,6 +197,14 @@ def configure_test_logger(level="INFO"):
     configure(get_test_logger_config(level=level, outputs=outputs), force=True)


+def gracefully_terminate_process(p):
+    p.terminate()
+    if sys.platform != "win32":
```
What about Windows?

And is SIGKILL really considered graceful termination? It seems rather violent to me.
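One friendlier shape for this helper (a sketch using Python's `subprocess` API; the escalation policy and timeout are assumptions, not what the PR implements) is to ask politely first and only kill as a bounded last resort:

```python
import subprocess

def terminate_with_escalation(p, timeout=5):
    # Hypothetical variant of gracefully_terminate_process: send SIGTERM
    # (TerminateProcess on Windows) first, then escalate to kill() only if
    # the process has not exited within `timeout` seconds.
    p.terminate()
    try:
        p.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        p.kill()  # last resort; no longer graceful, but bounded
        p.wait()
```

This gives the child a chance to run its cleanup handlers before being force-killed.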
Implementation of LMDB storage specific exceptions. This PR is a refactor of #584, which was not merged because it was too complex. This PR also implements a framework for testing storage exceptions. The `GenericStorageTest` class allows testing of common exceptions which are triggered across all the storages. Examples of these exceptions are `KeyNotFoundException`, if you read a non-existent key, and `DuplicateKeyException`, if you attempt to overwrite a key that already exists. `lmdb::map_full_error` is an example of an exception specific to LMDB, which is triggered when the remaining map size is not enough to write new keys/values.
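The cross-storage contract described above can be illustrated with a toy in-memory store (the store and its methods are hypothetical; only the exception names come from the PR):

```python
class KeyNotFoundException(Exception):
    """Raised when reading a key that does not exist."""

class DuplicateKeyException(Exception):
    """Raised when writing a key that already exists."""

class ToyStorage:
    # Hypothetical in-memory backend, used only to illustrate the common
    # exception semantics that GenericStorageTest checks for every backend.
    def __init__(self):
        self._data = {}

    def read(self, key):
        if key not in self._data:
            raise KeyNotFoundException(f"read called on non-existent key: {key}")
        return self._data[key]

    def write(self, key, value):
        if key in self._data:
            raise DuplicateKeyException(f"key already exists: {key}")
        self._data[key] = value
```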
This is a refactor of #584 for S3 storage exceptions. This PR also implements the tests for S3 storage exceptions. Note that in some cases S3 storage exceptions differ from other storages: for instance, if you write to a path that already exists, other storages throw `DuplicateKeyException`, whereas S3 just overwrites the key without any errors, so no exception is thrown. The function `handle_s3_error` handles other S3-specific exceptions as follows:

- `E_PERMISSION`: raised when `ACCESS_DENIED`, `INVALID_ACCESS_KEY_ID` or `SIGNATURE_DOES_NOT_MATCH` errors are returned.
- `E_S3_RETRYABLE`: raised for all the other S3 exceptions that are retryable.
- `E_UNEXPECTED_S3_ERROR`: raised for all the other S3 exceptions that are not retryable.

These have been implemented following PR #584.

#### Checklist

<details>
<summary>Checklist for code changes...</summary>

- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

</details>
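The classification above reduces to a small decision function. The sketch below is an illustrative Python reduction (the real `handle_s3_error` is C++ and raises exceptions rather than returning a code name); the error names and buckets come from the description above:

```python
# Error names and their buckets are taken from the PR description.
PERMISSION_ERRORS = {"ACCESS_DENIED", "INVALID_ACCESS_KEY_ID", "SIGNATURE_DOES_NOT_MATCH"}

def classify_s3_error(error_name, retryable):
    # Permission problems take priority over everything else.
    if error_name in PERMISSION_ERRORS:
        return "E_PERMISSION"
    # Everything else splits on whether the error is marked retryable.
    return "E_S3_RETRYABLE" if retryable else "E_UNEXPECTED_S3_ERROR"
```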
This PR performs normalization of Azure storage exceptions as per #584. It throws exceptions using the error code and HTTP status code as per the [Azure error docs](https://learn.microsoft.com/en-us/rest/api/storageservices/blob-service-error-codes). It also refactors `test_storage_exceptions.cpp` using functions from `common.hpp`. `PermissionException` is renamed to `StoragePermissionException`, as it was conflicting with another exception.
This normalizes Mongo exceptions as in #584. The logic for exception normalization is now moved to `mongo_storage.cpp`, as we also support a `MockMongoClient` which can mock failures. The permission exceptions are also normalized. Previously there were two types of permission exception: one was triggered when a storage API returned a permission failure, the second if an unpermitted storage operation was attempted as per the library's `OpenMode`. The two permission exceptions have now been normalized to inherit from the same class.

Furthermore, the exception hierarchy in Python has also been changed. `DuplicateKeyException` inherits from `StorageException` in C++, and this should be reflected on the Python side too, because then anyone can catch all the storage exceptions as follows, instead of having to catch `PermissionException`, `DuplicateKeyException` and `StorageException` separately:

```python
try:
    ...  # an operation that calls storage from python
except StorageException:
    ...  # log storage error
```

#### Checklist

<details>
<summary>Checklist for code changes...</summary>

- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

</details>
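The catch-all behaviour this hierarchy buys can be sketched as follows (the class names mirror the PR's hierarchy; the stub operation is hypothetical):

```python
class StorageException(Exception):
    pass

class PermissionException(StorageException):
    pass

class DuplicateKeyException(StorageException):
    pass

def storage_operation(fail_with):
    # Hypothetical stand-in for "an operation that calls storage from python".
    raise fail_with("simulated storage failure")

caught = []
for exc_type in (PermissionException, DuplicateKeyException):
    try:
        storage_operation(exc_type)
    except StorageException as e:  # one handler covers the whole hierarchy
        caught.append(type(e).__name__)
```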
Error handling behaviour comparison

(The comparison table did not survive this export; its fragments reference `E_MONGO_BULK_OP_NO_REPLY`, server messages being logged, differing `update` behaviour with upsert=false across storages, and key-exists / permission issues.)

Related improvements