Skip to content

Conversation

@maskit
Copy link
Member

@maskit maskit commented May 2, 2023

We saw two kind of crashes while testing #9591 and #9624, and the both were triggered by OCSP request timeout. In a nutshell, the crashes are due to accessing unavailable continuations. Continuations become unavailable when OCSP requests timeout, but the status were not properly updated/checked.

@maskit maskit added this to the 10.0.0 milestone May 2, 2023
@maskit maskit self-assigned this May 2, 2023
@bryancall bryancall requested a review from ywkaras May 8, 2023 22:16
fetch()
{
SCOPED_MUTEX_LOCK(lock, mutex, this_ethread());
this->_result = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this:

wkaras ~/REPOS/TS
O$ grep -e '\Wfetch(' -e '\Wset_done(' -e '\Wset_error(' $(findsrc)
./include/tscore/ArgParser.h:  void set_error(std::string e);
./iocore/net/OCSPStapling.cc:      this->fetch();
./iocore/net/OCSPStapling.cc:        req->set_done();
./iocore/net/OCSPStapling.cc:        req->set_error();
./iocore/net/OCSPStapling.cc:  fetch()
./iocore/net/OCSPStapling.cc:  set_done()
./iocore/net/OCSPStapling.cc:  set_error()
./plugins/experimental/inliner/cache.h:  fetch(const std::string &k, A &&...a)
./plugins/experimental/slice/prefetch.cc:  if (bg->fetch(data)) {
./plugins/experimental/slice/prefetch.cc:BgBlockFetch::fetch(Data *const data)
./plugins/experimental/slice/prefetch.h:  bool fetch(Data *const data);
./plugins/lua/ts_lua_fetch.c:  /* ts.fetch() */
./src/tscore/ArgParser.cc:ArgParser::set_error(std::string e)
./src/tscore/unit_tests/test_ArgParser.cc:  parser2.set_error("error");
wkaras ~/REPOS/TS
O$

it's clear that fetch(), set_done() and set_error() are only called by event_handler(). So, the continuation mutex will already be locked when they are called. It also means these member functions should be private. I suggest you make them private as a part of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In event_handler(), won't req always be the same as this? What's the point of it then?

Copy link
Member Author

@maskit maskit May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the functions private and removed unnecessary calls for ext_set/get_user_data.

So, the continuation mutex will already be locked when they are called.

This may be true, but I kept SCOPED_MUTEX_LOCK, which should be almost no-op if the lock is acquired. Because e->mutex and e->continuation->mutex can be different under some conditions and e->continuation->mutex is not locked in that case. It may not be the case for this, but I'd rather explicitly acquire the lock.

@maskit maskit merged commit 948f80b into apache:master May 10, 2023
cmcfarlen pushed a commit to cmcfarlen/trafficserver that referenced this pull request Jun 3, 2024
* asf/master:
  Fix ttmsh log field (apache#9722)
  Add CentOS to the required builds (apache#9721)
  Fix H3 transaction leak (apache#9714)
  Only need to include eventfd for native mode. This was using the wrong define anyway (apache#9711)
  Cleanup: remove ts::Buffer from LogField. (apache#9665)
  Cleanup: remove ts::Buffer from URL.cc (apache#9663)
  Check the calling thread of Ethread::schedule_local (apache#9691)
  build_h3_tools.sh: Remove an unneeded dir check (apache#9710)
  autoconf: Add lib flags for the quiche build test. (apache#9679)
  Remove deprecated debug output functions from 13 source files. (apache#9676)
  Changes for C++23 (apache#9703)
  QUIC: Add a unit tests to validate that the qlog file is generated (and no crashes) (apache#9668)
  libswoc: Update to 1.4.10 (apache#9700)
  Reload hosting.config on TASK thread (apache#9699)
  Changes for C++20 (apache#9701)
  Make io_uring or thread AIO modes a startup time decision (vs compile time) (apache#9630)
  Replace curl with proxy verifier in proxy protocol tests (apache#9684)
  Fix event queue corruption on PreWarmManager::reconfigure (apache#9692)
  Fixes crashes around OCSP with FetchSM (apache#9672)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants