Initial implementation of C api #915

erer1243 · 2024-09-12T17:28:39Z

Implement a C interface to some of libswsscommon in support of sonic-dash-ha.

Related:
sonic-net/sonic-dash-ha#6
#921

Incoming follow up PR:
erer1243#1

linux-foundation-easycla · 2024-09-12T17:28:43Z

The committers listed above are authorized under a signed CLA.

✅ login: erer1243 (40b04e8, 01d30a0, 268e71c, 32a8131, 5bec389, cac3263, e9a12fd, 5403472, f188ab3)

common/c-api/util.h

common/Makefile.am

common/zmqconsumerstatetable.cpp

common/c-api/consumerstatetable.cpp

common/table.h

common/zmqconsumerstatetable.cpp

qiluo-msft · 2024-10-15T06:19:57Z

common/c-api/util.h

+        } catch (std::exception & e) {                                                             \
+            std::cerr << "Aborting due to exception: " << e.what() << std::endl;                   \
+            SWSS_LOG_ERROR("Aborting due to exception: %s", e.what());                             \
+            std::abort();                                                                          \


`abort`

If I understand correctly, any exception thrown in this repo will abort a Rust application. This is a behavior change. Could you keep the original behavior?

Unfortunately this is not possible. It is UB to allow a C++ exception to unwind into Rust, so there are two options: 1. catch exceptions and convert them into plain data types that can be returned to Rust for recovery, or 2. abort on exception. Option 1 was decided against because all thrown exceptions are basically fatal (out of memory, invalid config file, etc.). Option 2 also gives us a core dump to debug against. If we were to ignore the exceptions, Rust would abort automatically anyway, but with a bad core dump.
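A minimal sketch of option 2 (abort on exception) as a generic helper. `callOrAbort` and `Example_parsePort` are hypothetical names; the PR itself implements this with the `SWSSTry` macro in `common/c-api/util.h`, of which only the catch clause is quoted above. Every exported function routes its body through the helper, so no C++ exception can propagate into the foreign caller:

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical helper: run `body`; if it throws, log the message and abort so the
// process leaves a core dump instead of unwinding into foreign (C/Rust) frames.
template <typename F>
auto callOrAbort(F &&body) -> decltype(body()) {
    try {
        return std::forward<F>(body)();
    } catch (const std::exception &e) {
        std::cerr << "Aborting due to exception: " << e.what() << std::endl;
        std::abort();
    } catch (...) {
        std::cerr << "Aborting due to unknown exception" << std::endl;
        std::abort();
    }
}

// Hypothetical exported C function: std::stoi may throw, but the throw never
// crosses the C ABI boundary because callOrAbort catches it first.
extern "C" int Example_parsePort(const char *text) {
    return callOrAbort([&] { return std::stoi(std::string(text)); });
}
```

The `catch (...)` clause goes beyond the quoted `catch (std::exception & e)` and also covers exceptions that do not derive from `std::exception`.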

Thanks for the explanation!

> all thrown exceptions are basically fatal

I do not agree. Some exceptions are related to the runtime environment; for example, a Redis server may not respond right now but may respond after a retry. Ref: sonic-swss-common/common/dbconnector.cpp, line 564 in 0044540:

`throw system_error(make_error_code(errc::address_not_available),`
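A minimal sketch of the retry case described above, telling the transient error apart by its `std::errc` value rather than by its message. `connectOnce` and `connectWithRetry` are hypothetical stand-ins that simulate a Redis connection failing a fixed number of times; they are not swss-common code:

```cpp
#include <cassert>
#include <stdexcept>
#include <system_error>

// Hypothetical stand-in for a connection attempt: fails with
// errc::address_not_available while failuresLeft > 0.
static int failuresLeft = 2;

static int connectOnce() {
    if (failuresLeft > 0) {
        --failuresLeft;
        throw std::system_error(std::make_error_code(std::errc::address_not_available),
                                "Unable to connect to redis");
    }
    return 42; // pretend connection handle
}

// Retry only the error we consider transient; rethrow anything else, or the
// transient error itself once maxAttempts is exhausted.
static int connectWithRetry(int maxAttempts, int &attemptsUsed) {
    for (attemptsUsed = 1;; ++attemptsUsed) {
        try {
            return connectOnce();
        } catch (const std::system_error &e) {
            bool transient = e.code() == std::errc::address_not_available;
            if (!transient || attemptsUsed >= maxAttempts) {
                throw;
            }
            // A real caller would sleep or back off here before retrying.
        }
    }
}
```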

Hm, I understand, and that is a good point, but it means we now have to convert errors into an FFI-safe form. This leaves us with a couple of non-ideal options.

Option 1:

Enumerate all non-fatal exceptions and encode them into the signatures of the C APIs.
E.g., we would change `SWSSDBConnector_new_tcp` to something like this:

```cpp
// If this function returns null, db did not respond, but it might if you retry
SWSSDBConnector SWSSDBConnector_new_tcp(int32_t dbId, const char *hostname, uint16_t port, uint32_t timeout) {
    SWSSTry({
        try {
            return (SWSSDBConnector) new DBConnector(dbId, string(hostname), port, timeout);
        } catch (system_error &e) {
            // we know this error is nonfatal so we hard coded it in here
            return nullptr;
        }
        // any other error will still abort() because any other error must be fatal
    });
}
```

This is not ideal because the C API becomes logically coupled to the implementation of the underlying function. If another nonfatal exception is added to DBConnector, whoever adds it also has to change this C function to properly return null. We would have to do this for every possible nonfatal exception that any of the C functions might raise.

This may be OK if there are very few nonfatal exceptions, we expect to practically never add more, and if the C api stays very small. Otherwise, anyone who works on swsscommon now also has to understand these implications on the C api.
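A minimal sketch of Option 1 that encodes the known non-fatal case as an explicit status code with an out parameter, instead of overloading a `nullptr` return. `SWSSStatus`, `FakeConnector`, `Example_new` and `Example_free` are hypothetical names; any exception not listed still aborts, matching the current behavior:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <system_error>

// Hypothetical status codes: one value per non-fatal exception we chose to expose.
enum SWSSStatus { SWSS_OK = 0, SWSS_ADDRESS_NOT_AVAILABLE = 1 };

// Hypothetical C++ class standing in for DBConnector.
struct FakeConnector {
    explicit FakeConnector(bool reachable) {
        if (!reachable) {
            throw std::system_error(std::make_error_code(std::errc::address_not_available),
                                    "redis not reachable");
        }
    }
};

// Reports the known non-fatal error through the return value and writes the
// new object through `out`; every other exception is treated as fatal.
extern "C" SWSSStatus Example_new(bool reachable, void **out) {
    try {
        *out = new FakeConnector(reachable);
        return SWSS_OK;
    } catch (const std::system_error &e) {
        if (e.code() == std::errc::address_not_available) {
            *out = nullptr;
            return SWSS_ADDRESS_NOT_AVAILABLE;
        }
        std::fprintf(stderr, "Aborting due to exception: %s\n", e.what());
        std::abort();
    } catch (const std::exception &e) {
        std::fprintf(stderr, "Aborting due to exception: %s\n", e.what());
        std::abort();
    }
}

extern "C" void Example_free(void *p) { delete static_cast<FakeConnector *>(p); }
```

The coupling described above shows up directly: if `FakeConnector` starts throwing a second kind of non-fatal exception, both `SWSSStatus` and the catch clauses have to be updated by hand.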

Option 2 simple:

Introduce a generic error interface to the C api. This is a very bad way of doing it, but I mention it because this is how I previously implemented the C api. We store errors as a global error string, and SWSSTry + code using it looks something like this:

```cpp
#include <cstring> // for strdup
#include <cstdlib> // for free

// Non-const so it can be passed to free()
char *globalErrorString = nullptr;

extern "C" {
const char *SWSSGetError(void) { return globalErrorString; }
}

#define SWSSTry(failure_value, ...)              \
    try {                                        \
        __VA_ARGS__                              \
        ;                                        \
    } catch (std::exception &e) {                \
        free(globalErrorString);                 \
        globalErrorString = strdup(e.what());    \
        return failure_value;                    \
    }

SWSSDBConnector SWSSDBConnector_new_tcp(int32_t dbId, const char *hostname, uint16_t port, uint32_t timeout) {
    SWSSTry(nullptr, {
        // upon any exception, returns nullptr and globalErrorString is set to e.what()
        return (SWSSDBConnector) new DBConnector(dbId, string(hostname), port, timeout);
    });
}
```

On the rust (or any other language) side:

```rs
use std::ffi::CStr;

fn make_a_dbconnector() -> SWSSDBConnector {
    loop {
        let dbconnector = unsafe { SWSSDBConnector_new_tcp(...) };
        if dbconnector.is_null() {
            // If any error occurred, fatal or nonfatal, all we get is a descriptive string
            let error_str = unsafe { CStr::from_ptr(SWSSGetError()) }.to_string_lossy();
            // Magic strings!! yuck!!
            if error_str.contains("address not available") {
                // nonfatal error - retry later
                sleep_10_seconds();
            } else {
                panic!("fatal error: {error_str}");
            }
        } else {
            return dbconnector;
        }
    }
}
```

I originally wrote the API this way, but it was nixed by Riff (#915 (comment)) because we lose the core dump/stack trace for fatal errors. Another problem is that all we get is a string: if we want to catch specific exceptions, we have to know what e.what() will return. This is really bad because foreign code becomes logically coupled to the exact exception messages in swss-common. If somebody fixes a typo in an exception message, the magic string in foreign code silently stops matching!
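One way to soften the magic-string problem, not something proposed in this thread, is to record a numeric error code next to the message whenever the exception carries one (`std::system_error` does), so foreign code compares against an `errno`-style constant instead of a substring of `e.what()`. A sketch with hypothetical names (`recordErrors`, `Example_lastErrorCode`, `Example_lastErrorMessage`); it also uses `thread_local` storage instead of a single global string, so concurrent callers do not overwrite each other's errors:

```cpp
#include <cassert>
#include <cerrno>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical per-thread error slots.
thread_local int lastErrorCode = 0;
thread_local std::string lastErrorMessage;

extern "C" int Example_lastErrorCode(void) { return lastErrorCode; }
extern "C" const char *Example_lastErrorMessage(void) { return lastErrorMessage.c_str(); }

// Hypothetical wrapper: returns false on failure and records code + message.
template <typename F>
bool recordErrors(F &&body) {
    lastErrorCode = 0;
    lastErrorMessage.clear();
    try {
        body();
        return true;
    } catch (const std::system_error &e) {
        // For make_error_code(errc::address_not_available) this is EADDRNOTAVAIL.
        lastErrorCode = e.code().value();
        lastErrorMessage = e.what();
    } catch (const std::exception &e) {
        lastErrorCode = -1; // no numeric code available
        lastErrorMessage = e.what();
    }
    return false;
}
```

This only removes the dependence on message text; it does not bring back the core dump that Option 2 simple gives up.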

Option 2 advanced:

Here's another approach to a generic error interface that may be less bad. Introduce a generic try/catch to the C interface, like so:

```cpp
bool globalDisableAborts = false;
char *globalErrorString = nullptr; // non-const so it can be passed to free()

extern "C" {
void SWSSTry_(void) {
    free(globalErrorString);
    globalErrorString = nullptr;
    globalDisableAborts = true;
}

const char *SWSSCatch(void) {
    globalDisableAborts = false;
    return globalErrorString;
}
}

#define SWSSTry(failure_value, ...)                                      \
    try {                                                                \
        __VA_ARGS__                                                      \
        ;                                                                \
    } catch (std::exception &e) {                                        \
        if (globalDisableAborts) {                                       \
            free(globalErrorString);                                     \
            globalErrorString = strdup(e.what());                        \
            return failure_value;                                        \
        } else {                                                         \
            SWSS_LOG_ERROR("Aborting due to exception: %s", e.what());   \
            abort();                                                     \
        }                                                                \
    }
```

```rs
use std::ffi::CStr;

fn make_a_dbconnector_retrying() -> SWSSDBConnector {
    // We will only try 10 times before performing an aborting call
    for _tries in 0..10 {
        // begin a pseudo try block
        unsafe { SWSSTry_() };
        let dbconnector = unsafe { SWSSDBConnector_new_tcp(...) };
        // end the pseudo try block, get the error if one happened.
        let err = unsafe { SWSSCatch() };
        if err.is_null() {
            // happy path: error is null so there was no error - dbconnector is valid
            return dbconnector;
        } else {
            // This might've been a fatal or nonfatal error but we aren't checking with a magic string.
            // We don't need to check because we are limiting our number of tries.
            let error_str = unsafe { CStr::from_ptr(err) }.to_string_lossy();
            println!("error: {error_str}, retrying");
            sleep_10_seconds();
        }
    }
    // We have tried 10 times - it's probably a fatal error, so let's bite the bullet and do an aborting call.
    // Since we did not use SWSSTry_, any exception will still abort, just like the current behavior.
    // We will get a nice core dump because abort() was called.
    let dbconnector = unsafe { SWSSDBConnector_new_tcp(...) };
    return dbconnector;
}
```

This is the best idea IMO because we can choose where to abort and where not to. Using SWSSTry_ and SWSSCatch we can retry something a few times before electing to lay down our cards and get a proper core dump. In simple cases, where any error is fatal, we can simply ignore SWSSTry_ and SWSSCatch, and use aborting behavior on the first try.
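A self-contained version of the Option 2 advanced toggle that compiles without swss-common, for experimenting with the control flow. `Demo_try`, `Demo_catch`, `DEMO_TRY` and `Demo_parse` are hypothetical stand-ins for `SWSSTry_`, `SWSSCatch`, `SWSSTry` and a C API function. Inside a try/catch pair a failure returns the failure value and records the message; outside one, the same failure aborts. Like the proposal, it keeps process-wide state and is therefore not thread-safe:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

static bool demoDisableAborts = false;
static char *demoErrorString = nullptr;

// Hypothetical stand-in for SWSSTry_: clear the last error, enter "catch mode".
extern "C" void Demo_try(void) {
    std::free(demoErrorString);
    demoErrorString = nullptr;
    demoDisableAborts = true;
}

// Hypothetical stand-in for SWSSCatch: leave catch mode; nullptr means no error.
// The string stays valid until the next Demo_try().
extern "C" const char *Demo_catch(void) {
    demoDisableAborts = false;
    return demoErrorString;
}

// Hypothetical stand-in for SWSSTry: record and return in catch mode, else abort.
#define DEMO_TRY(failure_value, ...)                                          \
    try {                                                                     \
        __VA_ARGS__                                                           \
    } catch (const std::exception &e) {                                       \
        if (demoDisableAborts) {                                              \
            std::free(demoErrorString);                                       \
            demoErrorString = strdup(e.what());                               \
            return failure_value;                                             \
        }                                                                     \
        std::fprintf(stderr, "Aborting due to exception: %s\n", e.what());    \
        std::abort();                                                         \
    }

// Hypothetical C API function: std::stoi throws on non-numeric input.
extern "C" int Demo_parse(const char *text) {
    DEMO_TRY(-1, { return std::stoi(std::string(text)); })
}
```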

Final thoughts

If there are very very few nonfatal errors in code that the C api uses, and we can find all of them and hard code their nonfatality into the C api functions, then we should do option 1.

If the C api is going to grow a lot later, or swss-common might change what errors are nonfatal, or we really want the C api to be very simple, or we want to allow people working on swss-common to ignore the C api, or we want to give generic error handling decisions to downstream foreign code, I think we should do option 2 advanced.

Even with this, it still creates a challenge in the hamburger stack (C++ -> Rust -> C++), because with a return-code approach every function in Rust needs to understand the error code and pass it to the upper layer. This limits error handling in Rust to anyhow-style crates with downcasting...

This is a problem that Oliver and I struggled with for days trying to find a solution, but so far there is no obvious or quick answer. I have discussed this issue with Qi offline. We will need to fix it, but since it only impacts services that use Rust in the exception handling path, we are going to create a backlog item to get it fixed and merge the current PR to unblock our immediate project progress.

this issue is fundamentally caused by rust not having try-catch support, and we are trying to recreate the try-catch support for the language, which will take time to get it right.

Work Item created here: MSADO:29998324 and assigned to Oliver at this moment.

> this issue is fundamentally caused by rust not having try-catch support, and we are trying to recreate the try-catch support for the language, which will take time to get it right.

Rust panics use the same mechanism as C++ exceptions - Rust really does have try/catch, but the issue is that no C ABI permits unwinding. Technically it would even be incorrect to call these C API functions from within another C++ program if we crossed a dynamic-linking boundary (i.e. the "hamburger stack" would look like C++ -> C -> C++). This is because internal ABI details are negotiated per linkage unit (unless magic is applied, like Rust's `extern "C-unwind"`). If a C++ function calls a C function, the compiler may legitimately optimize out any unwind-catching code around that call because, by spec, no C function can ever unwind. Rust makes the same assumption: no C function can ever unwind.

Personally I think we should either implement the "option 2 advanced" thing I wrote, or just ignore this for now until catching specific errors becomes necessary in hamgrd or other services.
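On the C++ side, the rule that no C function may unwind can be enforced by the compiler: if an exported function is declared `noexcept`, an exception that escapes it causes `std::terminate()` to be called instead of propagating to the caller, and the default terminate handler calls `std::abort()`, so the process still leaves a core dump. A minimal sketch with hypothetical functions `parsePort` and `Example_port` (the PR itself uses explicit try/catch via `SWSSTry`, not `noexcept`):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical internal helper that reports bad input by throwing.
static int parsePort(const std::string &s) {
    int port = std::stoi(s); // throws std::invalid_argument for non-numeric input
    if (port < 0 || port > 65535) {
        throw std::out_of_range("port out of range");
    }
    return port;
}

// Hypothetical exported function. Because it is noexcept, an exception escaping
// parsePort() triggers std::terminate() rather than propagating to the C or
// Rust caller, so it can never unwind across the C ABI boundary.
extern "C" int Example_port(const char *s) noexcept {
    return parsePort(s);
}

// The no-throw contract is checked at compile time.
static_assert(noexcept(Example_port("0")), "exported C functions must be noexcept");
```

Compared with the catch-and-abort macro, this drops the log line containing `e.what()`, so the two can be combined when the message matters.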

tests/c_api_ut.cpp

qiluo-msft · 2024-10-22T19:01:38Z

@erer1243 In the future, could you avoid force-pushing during PR iteration? We use a tool to compare commits to understand what is newly added, and it improves code review efficiency.

r12f

With the backlog created and sync'ed with Qi, approving this PR, so we can get it merged.

erer1243 · 2024-10-23T23:58:45Z

Sorry, forgot to include the c api stuff in the deb packages :)

Implement rust wrapper of the new C api on libswsscommon Related: sonic-net/sonic-swss-common#915 --------- Co-authored-by: erer1243 <erer1243@users.noreply.github.com>

r12f · 2024-10-30T00:22:51Z

sync'ed with Qi. all comments are resolved now, hence merging it.

r12f self-requested a review September 12, 2024 18:59

erer1243 force-pushed the c-interface branch from bc232be to f621b2a September 12, 2024 20:09

r12f reviewed Sep 13, 2024

common/c-api/util.h (resolved)

common/c-api/util.h (outdated, resolved)

erer1243 force-pushed the c-interface branch from 4dbe2da to cf69f20 September 23, 2024 14:05

This was referenced Sep 25, 2024

Add BinarySerializer function that uses a thread-local shared buffer, and fix errors #921 (closed)

Initial implementation of swss-common sonic-net/sonic-dash-ha#6 (merged)

r12f requested review from saiarcot895, qiluo-msft and liuh-80 September 26, 2024 16:30

liuh-80 reviewed Sep 27, 2024

common/Makefile.am (outdated, resolved)

liuh-80 reviewed Sep 27, 2024

common/zmqconsumerstatetable.cpp (outdated, resolved)

liuh-80 reviewed Sep 27, 2024

common/c-api/consumerstatetable.cpp (resolved)

erer1243 force-pushed the c-interface branch 3 times, most recently from 3dab797 to 74a5cc0 September 30, 2024 17:23

r12f previously approved these changes Oct 1, 2024

erer1243 dismissed r12f’s stale review via bc1c92f October 1, 2024 21:42

r12f reviewed Oct 7, 2024

common/table.h (outdated, resolved)

erer1243 force-pushed the c-interface branch from dff84c6 to bb85a6b October 10, 2024 18:57

qiluo-msft reviewed Oct 15, 2024

common/zmqconsumerstatetable.cpp (outdated, resolved)

qiluo-msft reviewed Oct 15, 2024

tests/c_api_ut.cpp (resolved)

erer1243 and others added 7 commits October 15, 2024 18:41

Initial implementation of C api (40b04e8)

fix malloc size_t conversion (5bec389)

add BinarySerializer::serializedSize to calculate required buffer length (01d30a0)

implement more zmq c apis (268e71c)

Add SonicDBConfig::initialize* to c-api (cac3263)

Support timeouts greater than 1 second (f188ab3)

Add c api unit tests (5403472)

Remove c api ZmqMessageHandlers and other methods of dubious value (e9a12fd)

erer1243 force-pushed the c-interface branch from bb85a6b to e9a12fd October 15, 2024 19:14

liuh-80 previously approved these changes Oct 22, 2024

r12f approved these changes Oct 22, 2024

r12f previously approved these changes Oct 23, 2024

Add c api headers to dev package (32a8131)

erer1243 dismissed stale reviews from r12f and liuh-80 via 32a8131 October 23, 2024 21:48

erer1243 mentioned this pull request Oct 25, 2024

Decide how to pass exceptions across the C API #932 (closed)

r12f approved these changes Oct 27, 2024

r12f merged commit 45d7cb0 into sonic-net:master Oct 30, 2024
17 checks passed

erer1243 deleted the c-interface branch October 30, 2024 01:53


qiluo-msft Oct 15, 2024

erer1243 Oct 15, 2024

qiluo-msft Oct 22, 2024

erer1243 Oct 22, 2024

r12f Oct 23, 2024

r12f Oct 23, 2024

r12f Oct 23, 2024

erer1243 Oct 24, 2024

erer1243 Oct 24, 2024
