Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

447 Consistently use KeyNotFoundException, DuplicateKeyException and PermissionException in Storages #584

Closed
wants to merge 4 commits into from

Conversation

qc00
Copy link
Contributor

@qc00 qc00 commented Jul 12, 2023

Error handling behaviour comparison

Storage Function Throws For situation Key in error msg Changed in this PR
LMDB Common lmdb::error transaction/low-level failure None No
write

DuplicateKeyException overwriting AtomKey One No
std::runtime_error other write failures One No
update KeyNotFoundException upsert=false on existing key All Added user-friendly msg
(Instead of "Composite: ")
read KeyNotFoundException key not found All No
key_exists - - - No
remove KeyNotFoundException keys already deleted All No
iterate (None) - - No
memory write util::check overwriting AtomKey One DuplicateKeyException
update util::check_rte upsert=false on existing key One KeyNotFoundException
read KeyNotFoundException   One No
key_exists - - - No
remove remove keys already deleted One
KeyNotFoundException
iterate - - -
No
Mongo Common Mongo may throw exceptions other than those below 
No
util::check / std::runtime_error Mongo "write concern" timed out One
Retryable
E_MONGO_BULK_OP_NO_REPLY
write
update
mongocxx::operation_exception Any other errors One E_MONGO_OP_FAILED with the server message
update util::check The update count does not match the input One KeyNotFoundException with user-friendly message
read KeyNotFoundException key not found One
No
KeyNotFoundException Any other error from Mongo One
E_MONGO_OP_FAILED with the server message
KeyNotFoundException Exceptions from our code/assertions -
Propagates the original exception
remove mongocxx::operation_exception Any error One
KeyNotFoundException, All.
Server messages are logged.
key_exists (Only logging) Any error other than key not found -
E_MONGO_OP_FAILED
iterate mongocxx::operation_exception Any error from Mongo None
No
S3 write
update
util::raise_rte
Any request failure
(But no handling for upsert=false in S3)
One handle_common_errors
read /
exists
KeyNotFoundException / Debug log and return false key not found All No
NO_SUCH_BUCKET All E_UNEXPECTED_S3_ERROR, One
INVALID_ACCESS_KEY_ID
/ Permission issue
All E_PERMISSION, One
UnexpectedS3ErrorException SIGNATURE_DOES_NOT_MATCH One E_PERMISSION
remove KeyNotFoundException Bucket exists but not key All No
(None) All other errors (no bucket, creds, network, etc.) - Also KeyNotFoundException
iterate (Only warning) Any error - Same as read

Related improvements

  • provide meaningful messages for mongo exceptions
  • Systematic tests for storage exceptions
  • Lift out storage fixtures with new abilities to simulate errors:
    • shutting down the fixture early
    • multiple users
    • changing bucket permissions
  • Expanding library_tool to expose all Storage methods
  • Fix minor bug in Segment c'tor

qc00 and others added 4 commits July 4, 2023 16:15
+ warn if AWS auth is being used and machine identity is disabled
+ Refactor S3 fixtures so various errors can be simulated
* Removed pytest-server-fixtures dependency which is a bit old and we only use function
Also provide meaningful messages for mongo exceptions
Copy link
Collaborator

@poodlewars poodlewars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! ⭐ ⭐ ⭐ ⭐ ⭐

Would be good to add tests verifying that the table in your PR description is correct.

@@ -215,7 +215,8 @@ void register_error_code_ecosystem(py::module& m, py::exception<arcticdb::Arctic

py::register_exception<SchemaException>(m, "SchemaException", compat_exception.ptr());
py::register_exception<NormalizationException>(m, "NormalizationException", compat_exception.ptr());
py::register_exception<StorageException>(m, "StorageException", compat_exception.ptr());
auto& storage_exception = py::register_exception<StorageException>(m, "StorageException", compat_exception.ptr());
py::register_exception<StorageRetryableException>(m, "StorageRetryableException", storage_exception.ptr());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any docs update needed? eg

## Exception Hierarchy

result.has_value(), "Mongo did not acknowledge write for key {}", kv.key_view());
if (!upsert && result->modified_count() == 0) {
throw KeyNotFoundException(std::move(kv.variant_key()),
"update_segment called with upsert=false on non-exist key: {}");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"update_segment called with upsert=false on non-exist key: {}");
"update_segment called with upsert=false on non-existent key: {}");

@@ -62,6 +64,7 @@ inline std::unordered_map<ErrorCategory, const char*> get_error_category_names()
ERROR_CODE(2003, E_INCOMPATIBLE_INDEX) \
ERROR_CODE(2004, E_WRONG_SHAPE) \
ERROR_CODE(3000, E_NO_SUCH_VERSION) \
ERROR_CODE(3001, E_SYMBOL_NOT_FOUND) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for these, error_messages.md

cpp/arcticdb/toolbox/library_tool.hpp Show resolved Hide resolved
void LibraryTool::write(VariantKey key, Segment segment) {
storage::KeySegmentPair kv{std::move(key), std::move(segment)};
lib_->write(Composite<storage::KeySegmentPair>{std::move(kv)});
void LibraryTool::write_batch(std::vector<VariantKey> keys, std::vector<Segment> segments) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Might be friendlier to assert keys and segments are the same length

return; // Caller need to handle these
}

#define S3_ERROR_FMT " error while {} {}: S3Error#{} {}: {}", \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real need for this to be a macro is there? Just a normal function would be fine?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


inline bool is_expected_error_type(Aws::S3::S3Errors err) {
return err == Aws::S3::S3Errors::NO_SUCH_KEY
|| err == Aws::S3::S3Errors::NO_SUCH_BUCKET
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've lost the special handling for NO_SUCH_BUCKET in this PR, I assume because you check it's accessible when the Arctic instance is created?

What if the bucket is deleted after the Arctic instance is created?


}
else {
const auto& error = list_objects_outcome.GetError();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a long time thinking about this one. Looks like we are similar to the old implementation in that we stop the paging after the first error. Difference is that now sometimes we throw. Are callers of this method able to cope with iterate type throwing?

if (!outcome.IsSuccess()) {
auto& error = outcome.GetError();

#define BUCKET_LOG(level, msg, ...) log::storage().level(msg "\nHTTP Status: {}. Server response: {}", \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No macro please

auto wait = std::chrono::milliseconds(ConfigsMap::instance()->get_int("S3Storage.CheckBucketMaxWait", 1000));
if (future.wait_for(wait) == std::future_status::ready) {
auto outcome = future.get();
if (!outcome.IsSuccess()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (outcome.IsSuccess()) {
  return true;
}
// log some stuff
return false;

is clearer IMO. There's only one way we can return true here and the code could make that more obvious.

@@ -50,6 +50,7 @@ inline void LmdbStorage::do_write_internal(Composite<KeySegmentPair>&& kvs, ::lm
ARCTICDB_SUBSAMPLE(LmdbPut, 0)
int res = ::mdb_put(txn.handle(), dbi.handle(), &mdb_key, &mdb_val, MDB_RESERVE | overwrite_flag);
if (res == MDB_KEYEXIST) {
// Since LMDB is in-memory, we can efficiently detect: (see its doc for details)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can quite follow this comment. What can we efficiently detect?

@@ -85,15 +86,16 @@ inline void LmdbStorage::do_update(Composite<KeySegmentPair>&& kvs, UpdateOpts o
if(!failed_deletes.empty()) {
ARCTICDB_SUBSAMPLE(LmdbStorageCommit, 0)
txn.commit();
throw KeyNotFoundException(Composite<VariantKey>(std::move(failed_deletes)));
throw KeyNotFoundException(Composite<VariantKey>(std::move(failed_deletes)),
"do_update called with upsert=false on non-exist key(s): {}");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"do_update called with upsert=false on non-exist key(s): {}");
"do_update called with upsert=false on non-existent key(s): {}");

util::check_rte(opts.upsert_ || it != key_vec.end(), "update called with upsert=false but key does not exist");
if (!opts.upsert_ && it == key_vec.end()) {
throw KeyNotFoundException(std::move(kv.variant_key()),
"update called with upsert=false but key does not exist: {}");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is different than the error message for LMDB. I think we should unify them.

(Minor) I wouldn't move kv.variant_key() but copy it. We're already doing the slow thing throwing, so I don't see any benefit of moving a string. If we don't move the debugging will be slightly better as the key will exist in the stack trace.

* https://github.com/mongodb/mongo-cxx-driver/blob/master/src/mongocxx/exception/error_code.hpp for constants
*/

/** Mongo often fail to set the what() string, so has to manually format the exception */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Mongo often fail to set the what() string, so has to manually format the exception */
/** Mongo often fails to set the what() string, so has to manually format the exception */


/** Mongo often fail to set the what() string, so has to manually format the exception */
[[noreturn]] void translate_operation_exception(const mongocxx::operation_exception& e) {
std::string json;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Minor) Isn't the naming slightly misleading? As far as I understand bsoncxx::to_json(e.raw_server_error()->view()); would return a stringified JSON, but fmt::format("(Failed to format server response either: {})", e.what()); "No response from server"; are not JSONs.

util::check(bool(result), "Mongo error while putting key {}", kv.key_view());
try {
auto result = bulk_write.execute();
// We don't use the "mongoc_write_concern_set_w" that allows nullopt to be returned, but check it anyway:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to check it? Maybe a comment on why we check it anyway.

Comment on lines +365 to +366
if(StorageFailureSimulator::instance()->configured())
StorageFailureSimulator::instance()->go(FailureType::READ);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Personal preference) I believe that wrapping if/else/for/while bodies in {} even for one liners is more readable.

auto size = doc["total_size"].get_int64().value;
entity::VariantKey stored_key{ detail::variant_key_from_document(doc, key) };
entity::VariantKey stored_key{detail::variant_key_from_document(doc, key)};
util::check(stored_key == key, "Key mismatch: {} != {}");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a general question not related to this PR in particular. Do we want to leak E_ASSERTION_FAILURE exceptions? AFAIK we use this in places where people would use cassert's function assert so do we want people to be able to see this in "raw" form and why don't we use casserts in general?

catch(const std::exception& ex) {
log::storage().info("Segment read error: {}", ex.what());
throw storage::KeyNotFoundException{Composite<VariantKey>{VariantKey{key}}};
} catch(const mongocxx::operation_exception& e) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not catching std::exception anymore. What was the thing that used to throw anything inheriting from std::exception?

} else {
result = collection.delete_one(document{} << "key" << fmt::format("{}", key) << "stream_id" <<
fmt::format("{}", variant_key_id(key)) << finalize);
auto filter = document{} << "key" << fmt::format("{}", key) << "stream_id" << fmt::format("{}", variant_key_id(key))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Minor) I think one large fmt::format (if possible) would be easier to read

} else {
log::storage().info("Unable to determine if the bucket is accessible. Request timed out.");
}
return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does false stand for? It's no quite obvious to me from the name of the function?

void LibraryTool::write_batch(std::vector<VariantKey> keys, std::vector<Segment> segments) {
Composite<storage::KeySegmentPair> composite;
for (size_t i = 0; i < keys.size(); i++) {
composite.push_back({std::move(keys[i]), std::move(segments.at(i))});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this performance critical? Calling at is slightly slower due to bounds checks. Do we want it to throw? Who's supposed to handle it in case of out_of_bounds exception?

@@ -195,6 +197,14 @@ def configure_test_logger(level="INFO"):
configure(get_test_logger_config(level=level, outputs=outputs), force=True)


def gracefully_terminate_process(p):
p.terminate()
if sys.platform != "win32":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about windows?
Is sigkill considered as graceful termination seems rather violent to me?

@poodlewars poodlewars added the enhancement New feature or request label Dec 29, 2023
muhammadhamzasajjad added a commit that referenced this pull request Feb 2, 2024
Implementation of lmdb storage specific exceptions. This PR is a
refactor of #584 which was not merged because it was too complex.

This PR also implements a framework for testing storage exceptions.
`GenericStorageTest` class allows testing of common exceptions which are
triggered among all the storages. Some examples of these exceptions are
`KeyNotFoundException` if you read an in-existent key,
`DuplicateKeyException` if you attempt to overwrite a key that already
exists.

`lmdb::map_full_error` is an example of exception specific to LMDB which
is triggered when the remaining map size is not enough to write new
keys/values.
muhammadhamzasajjad added a commit that referenced this pull request Feb 8, 2024
Implementation of memory storage specific exceptions normalization. This
PR is a refactor of #584.
muhammadhamzasajjad added a commit that referenced this pull request Feb 13, 2024
This is a refactor of #584 for S3 Storage Exceptions. This PR also
implements the tests for S3 storage exceptions. Note that in some cases
S3 Storage Exceptions are different from other storages. For instance if
you write a path that already exists, other storages throw
`DuplicateKeyException`. In this case, S3 just over-writes the key
without any errors, therefore no Exception is thrown.

The function `handle_s3_error` handles other S3 Storage specific
exceptions as follows:
- `E_PERMISSION`: raised when `ACCESS_DENIED`, `INVALID_ACCESS_KEY_ID`
or `SIGNATURE_DOES_NOT_MATCH` errors are returned.
- `E_S3_RETRYABLE`: raised for all the other S3 exceptions that are
retry-able.
- `E_UNEXPECTED_S3_ERROR`: raised for all the other S3 exceptions that
are not retry-able.

These have been implemented following the PR #584.

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>
muhammadhamzasajjad added a commit that referenced this pull request Feb 26, 2024
This PR performs normalization of Azure storage exceptions as per #584.
It throws exceptions using error code and http status code as per the
[azure error
docs](https://learn.microsoft.com/en-us/rest/api/storageservices/blob-service-error-codes).
It also does a refactor of the `test_storage_exceptions.cpp` using
functions from `common.hpp`. `PermissionException` is renamed as
`StoragePermissionException` as it was conflicting with another
exception.
muhammadhamzasajjad added a commit that referenced this pull request Feb 28, 2024
This performs normalization of RocksDB storage exceptions as per
#584.
muhammadhamzasajjad added a commit that referenced this pull request Mar 11, 2024
This normalizes mongo exceptions as in #584. The logic for exception
normalization is now moved to `mongo_storage.cpp` as we also support a
`MockMongoClient` which can mock failures. The Permission exceptions are
also normalized. Previously there were two types of permission
exceptions. One was triggered when a storage API returned a permission
failure, the second was triggered if an unpermitted storage operation
was attempted as per the library `OpenMode`. The two permission have now
been normalized to inherit from the same class.

Furthermore, the exception hierarchy in python has also been changed.
`DuplicateKeyException` inherits from `StorageException` in C++, this
should be reflected on python side too. Why? because if someone wanted
they could catch all the storage exceptions as follows instead of having
to catch `PermissionException`, `DuplicateKeyException` and
`StorageException` separately:

```python
try:
  # an operation that calls storage from python
except StorageException:
  # log storage error
```

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>

<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->
@poodlewars poodlewars closed this Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants