@masaori335
Contributor

@masaori335 masaori335 commented Dec 16, 2021

Fixes #7865. We observed a crash loop on start-up when a disk is bad.

Backtrace

[ 00 ] libpthread-2.17.so waitpid 
[ 01 ] traffic_server crash_logger_invoke(int, siginfo_t*, void*) ( Crash.cc:165 ) 
[ 02 ] libpthread-2.17.so 
[ 03 ] traffic_server CacheDisk::~CacheDisk() ( List.h:293 ) 
[ 04 ] traffic_server CacheDisk::~CacheDisk() ( CacheDisk.cc:112 ) 
[ 05 ] traffic_server CacheProcessor::diskInitialized() ( Cache.cc:767 ) 
[ 06 ] traffic_server CacheDisk::openStart(int, void*) ( CacheDisk.cc:213 ) 
[ 07 ] traffic_server AIOCallbackInternal::io_complete(int, void*) ( eventsystem/I_Continuation.h:160 ) 
[ 08 ] traffic_server EThread::process_event(Event*, int) ( I_Continuation.h:160 ) 
[ 09 ] traffic_server EThread::process_queue(Queue<Event, Event::Link_link>*, int*, int*) ( UnixEThread.cc:170 ) 
[ 10 ] traffic_server EThread::execute_regular() ( UnixEThread.cc:230 ) 
[ 11 ] traffic_server spawn_thread_internal(void*) ( Thread.cc:85 ) 
[ 12 ] libpthread-2.17.so start_thread 

Log

[ET_AIO 0:99] WARNING: cache disk operation failed READ -1 61
[ET_NET 9] WARNING: could not read disk header for disk /dev/disk/****: declaring disk bad
[ET_NET 9] WARNING: failed operation: READ (opcode=1), span: /dev/disk/**** (fd=99)

Conditions

  1. An HDD on the box has failed.
  2. ATS is (re)started for any reason.

Scenario

  1. On the ET_NET thread, CacheDisk::open schedules an AIO read to load the disk header into the DiskHeader *header member.
  2. On the ET_AIO thread, cache_op tries to read the disk but gets an error (errno=61). When this error happens, the given buffer (header) is left holding random values.
  3. On the ET_NET thread, CacheDisk::openStart handles the callback and, because of the failed result, declares the disk bad. It deletes the CacheDisk object.
  4. The CacheDisk destructor uses the header, which holds random values, in a loop and triggers a heap-buffer-overflow.
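
The scenario above can be modelled with a minimal C++ sketch. Everything in it is a hypothetical simplification for illustration: `DiskHeader` here has only two made-up fields, and `Disk`, `failing_read`, `on_read_complete`, and `volumes_to_release` are not ATS names. It only shows the general idea of zeroing the header when the read fails, so that a destructor-style cleanup loop sees a count of zero instead of garbage. Where exactly the actual change clears the header is not shown in this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified header; the real ATS DiskHeader has more fields.
struct DiskHeader {
  std::uint32_t magic;
  std::uint32_t num_volumes;
};

// Hypothetical stand-in for CacheDisk, reduced to the parts relevant here.
struct Disk {
  DiskHeader header_storage{};
  DiskHeader *header = &header_storage;
  bool bad           = false;
};

// Simulates an AIO read that fails and leaves junk in the destination buffer.
// Returns a negative value, loosely mirroring the "READ -1 61" log line.
inline int failing_read(void *buf, std::size_t len)
{
  std::memset(buf, 0xAB, len);
  return -1;
}

// Completion handler (assumed shape): on error, clear the header before the
// disk object is torn down, so later cleanup never trusts garbage values.
inline void on_read_complete(Disk &disk, int result)
{
  assert(disk.header != nullptr);
  if (result < 0) {
    std::memset(disk.header, 0, sizeof(DiskHeader));
    disk.bad = true;
  }
}

// Number of entries a destructor-style cleanup loop would walk.
inline std::uint32_t volumes_to_release(const Disk &disk)
{
  return disk.header->num_volumes;
}
```

In this model, calling `failing_read(d.header, sizeof(DiskHeader))` on a fresh `Disk d` leaves `num_volumes` as garbage; passing the negative result to `on_read_complete` resets it to zero, so a cleanup loop bounded by `volumes_to_release(d)` does nothing.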

Affected Version

8.1.x, 9.0.x, and master

@masaori335 masaori335 added this to the 10.0.0 milestone Dec 16, 2021
@masaori335 masaori335 self-assigned this Dec 16, 2021
@masaori335 masaori335 merged commit 61c0fcc into apache:master Dec 19, 2021
zwoop pushed a commit that referenced this pull request Jan 5, 2022
zwoop pushed a commit that referenced this pull request Jan 5, 2022
@zwoop
Contributor

zwoop commented Jan 5, 2022

Cherry-picked to v9.1.x branch.
Cherry-picked to v9.2.x

@zwoop zwoop modified the milestones: 10.0.0, 9.1.2 Jan 5, 2022
@zwoop zwoop added the 9.2.0 label Jan 5, 2022
@ezelkow1
Member

ezelkow1 commented Feb 7, 2022

If we want this for 8.x can we get a backport?

masaori335 added a commit to masaori335/trafficserver that referenced this pull request Feb 7, 2022
@masaori335
Contributor Author

@ezelkow1 Here you are: #8653

ezelkow1 pushed a commit that referenced this pull request Feb 7, 2022
moonchen pushed a commit to moonchen/trafficserver that referenced this pull request Mar 17, 2022
* asf/9.2.x:
  Updated ChangeLog
  docs: fix fedora install notes and spelling issues (apache#8537)
  Docs: Fix default value of proxy.config.ssl.handshake_timeout_in (apache#8574)
  Partial of revert "Cleanup generated LDFLAGS for jemalloc (apache#8285)" (apache#8533)
  TSUserArg: add value type checking (apache#8550)
  Relax key validation of sni.yaml (apache#8549)
  Clear random header value by AIO read error (apache#8559)
  Fixes macOS arm64 builds (again) (apache#8556)
  Traffic Dump: Use the correct transaction user index (apache#8548)
  combo_handler: Initialize User Arg Index in TSRemapInit (apache#8551)
  backout down parent retry limiting in parent selection and nexthop (apache#8546)

Development

Successfully merging this pull request may close these issues.

SEGV on CacheDisk destructor
