Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide how to pass exceptions across the C API #932

Closed
erer1243 opened this issue Oct 25, 2024 · 0 comments · Fixed by #967
Closed

Decide how to pass exceptions across the C API #932

erer1243 opened this issue Oct 25, 2024 · 0 comments · Fixed by #967

Comments

@erer1243
Copy link
Contributor

erer1243 commented Oct 25, 2024

Currently the C API (implemented in #915) calls abort() if any exception is thrown in the underlying C++ functions called. This is not ideal because some exceptions are non fatal and should be able to be handled (eg DBConnector throwing an exception when the database is not yet online, but it might come online later). Since it is not permitted to unwind through a C ABI function, we must decide on an appropriate method for passing exceptions across the ffi boundary.

Here are a few options and thoughts on the matter. Opinions welcome, as internal discussion hasn't come to an agreement.

Hard coding non-fatal exceptions

We could enumerate all non-fatal exceptions and code those into the signature of the C apis.
Eg: we would change SWSSDBConnector_new_tcp to something like this

// If this function returns null, db did not respond, but it might if you retry
SWSSDBConnector SWSSDBConnector_new_tcp(int32_t dbId, const char *hostname, uint16_t port, uint32_t timeout) {
    SWSSTry({
        try {
            return (SWSSDBConnector) new DBConnector(dbId, string(hostname), port, timeout);
        } catch (system_error &e) {
            // we know this error is nonfatal so we hard coded it in here
            return nullptr;
        }

        // any other error will still abort() because any other error must be fatal
    });
}

This is not ideal because now the C api becomes logically coupled to the implementation of the underlying function. If another nonfatal exception is added to DBConnector, they also have to change this C function to properly return null. We will have to do this for every possible nonfatal exception that every C function could trigger.

This may be OK if there are very few nonfatal exceptions, we expect to practically never add more, and if the C api stays very small. Otherwise, anyone who works on swsscommon now also has to understand these implications on the C api. It is also definitely the cleanest and best if we do think we can get away with it.

Global error string, never abort

We could introduce a error interface to the C api that provides informational error strings instead of aborting. This is probably a bad way of doing it, but I mention it because this is how I previously implemented the C api. We store errors as a global error string, and SWSSTry + code using it looks something like this:

#include <cstring> // for strdup
#include <cstdlib> // for free

const char *globalErrorString = nullptr;

extern "C" { const char *SWSSGetError(void) { return globalErrorString; } }

#define SWSSTry(failure_value, ...) \
    try { \
        __VA_ARGS__ \
        ; \
    } catch (std::exception &e) { \
        free(globalErrorString); \
        globalErrorString = strdup(e.what().c_str()); \
        return failure_value; \
    }

SWSSDBConnector SWSSDBConnector_new_tcp(int32_t dbId, const char *hostname, uint16_t port, uint32_t timeout) {
    SWSSTry(nullptr, {
            // upon any exception, returns nullptr and globalErrorString is set to e.what()
            return (SWSSDBConnector) new DBConnector(dbId, string(hostname), port, timeout)
    });
}

On the rust (or any other language) side:

fn make_a_dbconnector() {
    loop {
        let dbconnector = unsafe { SWSSDBConnector_new_tcp(...) };
        if dbconnector.is_null() {
            // If any error occurred fatal or nonfatal, all we get is a descriptive string
            let error_str =  unsafe { str::from_utf8_unchecked(SWSSGetError()) };
            
            // Magic strings!! yuck!!
            if error_str.contains("address not available") {
                // nonfatal error - retry later
               sleep_10_seconds();
            } else {
                panic!("fatal error: {error_str}");
            }
        } else {
            return dbconnector;
        }
    }
}

I originally wrote the api this way, but it was nixed by Riff (#915 (comment)) because we are losing the core dump/stack trace for fatal errors. Another problem is that all we get is a string - if we want to catch specific exceptions, we have to know what e.what() will return. This is really bad because foreign code is logically coupled to the exact exception messages in swss-common. If somebody fixes a typo in an exception message, that the magic string in foreign code is now wrong!

Global error string, elective abort using pseudo-try/catch

Here's another approach to an error string interface that may be less bad. Introduce a generic try/catch to the C interface, like so:

bool globalDisableAborts = false;
const char *globalErrorString = nullptr;

extern "C" { 
    void SWSSTry_(void) { free(globalErrorString); globalErrorString = nullptr; globalDisableAborts = true; }
    const char *SWSSCatch(void) { globalDisableAborts = false; return globalErrorString; } 
}

#define SWSSTry(failure_value, ...) \
    try { \
        __VA_ARGS__ \
        ; \
    } catch (std::exception &e) { \
        if (globalDisableAborts) {
            free(globalErrorString); \
            globalErrorString = strdup(e.what().c_str()); \
            return failure_value; \
        } else { \
            SWSS_LOG_ERROR(... e.what() ...); \
            abort(); \
        } \
    }

```rs
fn make_a_dbconnector_retrying() {
    // We will only try 10 times before performing an aborting call
    for tries in 0..10 {
        // begin a pseudo try block
        unsafe { SWSSTry_() }; 
        let dbconnector = unsafe { SWSSDBConnector_new_tcp(...) };
        // end the pseudo try block, get the error if one happened.
        let err = unsafe { SWSSCatch() }; 
        if err.is_null() {
            // happy path: error is null so there was no error - dbconnector is valid
            return dbconnector;
        } else {
            // This might've been a fatal or nonfatal error but we aren't checking with a magic string. 
            // We don't need to check because we are limiting our number of tries.
            let error_str =  unsafe { str::from_utf8_unchecked(SWSSGetError()) };
            println!("error: {error_str}, retrying");
            sleep_10_seconds();
        }
    }

    // We have tried 10 times - it's probably a fatal error, so let's bite the bullet and do an aborting call
    // Since we did not use SWSSTry_, any exception will still abort. Just like current behavior.
    // We will get a nice core dump because abort() was called;
    let dbconnector = unsafe { SWSSDBConnector_new_tcp(...) };
    return dbconnector;
}

With this solution, we can choose where to abort and where not to. Using SWSSTry_ and SWSSCatch we can retry something a few times before electing to lay down our cards and get a proper core dump. In simple cases, where any error is fatal, we can elide SWSSTry_ and SWSSCatch, and use aborting behavior on the first try.

Reimplement C++ standard exceptions for better exception catching & matching?

Prior art of note in this discussion is SWIG's solution to exceptions, which is to implement and FFI safe version of every standard C++ exception. This seems like a good idea, but I am unclear on how much work it would be to maintain. This would make matching on caught exceptions somewhat less magical, since we can consistently match on exception class (assuming swsscommon never throws non-standard exception types).
https://github.com/swig/swig/blob/master/Lib/exception.i

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant