-
Notifications
You must be signed in to change notification settings - Fork 844
Fixes crashes around OCSP with FetchSM #9672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
iocore/net/OCSPStapling.cc
Outdated
| fetch() | ||
| { | ||
| SCOPED_MUTEX_LOCK(lock, mutex, this_ethread()); | ||
| this->_result = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this:
wkaras ~/REPOS/TS
O$ grep -e '\Wfetch(' -e '\Wset_done(' -e '\Wset_error(' $(findsrc)
./include/tscore/ArgParser.h: void set_error(std::string e);
./iocore/net/OCSPStapling.cc: this->fetch();
./iocore/net/OCSPStapling.cc: req->set_done();
./iocore/net/OCSPStapling.cc: req->set_error();
./iocore/net/OCSPStapling.cc: fetch()
./iocore/net/OCSPStapling.cc: set_done()
./iocore/net/OCSPStapling.cc: set_error()
./plugins/experimental/inliner/cache.h: fetch(const std::string &k, A &&...a)
./plugins/experimental/slice/prefetch.cc: if (bg->fetch(data)) {
./plugins/experimental/slice/prefetch.cc:BgBlockFetch::fetch(Data *const data)
./plugins/experimental/slice/prefetch.h: bool fetch(Data *const data);
./plugins/lua/ts_lua_fetch.c: /* ts.fetch() */
./src/tscore/ArgParser.cc:ArgParser::set_error(std::string e)
./src/tscore/unit_tests/test_ArgParser.cc: parser2.set_error("error");
wkaras ~/REPOS/TS
O$
it's clear that fetch(), set_done() and set_error() are only called by event_handler(). So, the continuation mutex will already be locked when they are called. It also means these member functions should be private. I suggest you make them private as a part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In event_handler(), won't req always be the same as this? What's the point of it then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the functions private and removed unnecessary calls for ext_set/get_user_data.
So, the continuation mutex will already be locked when they are called.
This may be true, but I kept SCOPED_MUTEX_LOCK, which should be almost no-op if the lock is acquired. Because e->mutex and e->continuation->mutex can be different under some conditions and e->continuation->mutex is not locked in that case. It may not be the case for this, but I'd rather explicitly acquire the lock.
* asf/master: Fix ttmsh log field (apache#9722) Add CentOS to the required builds (apache#9721) Fix H3 transaction leak (apache#9714) Only need to include eventfd for native mode. This was using the wrong define anyway (apache#9711) Cleanup: remove ts::Buffer from LogField. (apache#9665) Cleanup: remove ts::Buffer from URL.cc (apache#9663) Check the calling thread of Ethread::schedule_local (apache#9691) build_h3_tools.sh: Remove an unneeded dir check (apache#9710) autoconf: Add lib flags for the quiche build test. (apache#9679) Remove deprecated debug output functions from 13 source files. (apache#9676) Changes for C++23 (apache#9703) QUIC: Add a unit tests to validate that the qlog file is generated (and no crashes) (apache#9668) libswoc: Update to 1.4.10 (apache#9700) Reload hosting.config on TASK thread (apache#9699) Changes for C++20 (apache#9701) Make io_uring or thread AIO modes a startup time decision (vs compile time) (apache#9630) Replace curl with proxy verifier in proxy protocol tests (apache#9684) Fix event queue corruption on PreWarmManager::reconfigure (apache#9692) Fixes crashes around OCSP with FetchSM (apache#9672)
We saw two kind of crashes while testing #9591 and #9624, and the both were triggered by OCSP request timeout. In a nutshell, the crashes are due to accessing unavailable continuations. Continuations become unavailable when OCSP requests timeout, but the status were not properly updated/checked.