ProxySQL Crashed #3502

Closed
CyberLine opened this issue Jun 30, 2021 · 2 comments · Fixed by #3529
CyberLine commented Jun 30, 2021

Last night, one of our proxy instances crashed unexpectedly.

Logs are attached.

The installed version is 2.1.1, from the ProxySQL Debian repo, on Debian 10.8.

/var/lib/proxysql/proxysql.log:

2021-06-30 05:38:06 main.cpp:1600:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2021-06-30 05:38:06 [INFO] ProxySQL version 2.1.1-40-g1c2b7e4

GDB output:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff473c5a535 in __GI_abort () at abort.c:79
#2  0x00007ff473c5a40f in __assert_fail_base (fmt=0x7ff473dbcee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55e2374ce404 "mybe", file=0x55e237474b1c "MySQL_Session.cpp", line=4217, function=<optimized out>) at assert.c:92
#3  0x00007ff473c68102 in __GI___assert_fail (assertion=assertion@entry=0x55e2374ce404 "mybe", file=file@entry=0x55e237474b1c "MySQL_Session.cpp", line=line@entry=4217, function=function@entry=0x55e237475810 <MySQL_Session::handler()::__PRETTY_FUNCTION__> "int MySQL_Session::handler()") at assert.c:101
#4  0x000055e236f1d728 in MySQL_Session::handler (this=this@entry=0x7ff46a46e800) at MySQL_Session.cpp:4506
#5  0x000055e236efa5ac in MySQL_Thread::process_all_sessions (this=this@entry=0x7ff46a400000) at MySQL_Thread.cpp:4890
#6  0x000055e236f01d3a in MySQL_Thread::run (this=this@entry=0x7ff46a400000) at MySQL_Thread.cpp:4424
#7  0x000055e236e99cec in mysql_worker_thread_func (arg=0x7ff4732bbeb0) at main.cpp:776
#8  0x00007ff474350fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9  0x00007ff473d314cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

r0van commented Jul 15, 2021

Hi, try upgrading to the latest version (2.2.0); this can be related to #3475.

JavierJF commented Jul 20, 2021

Hi @CyberLine,

after further investigation of the coredump you provided and the related issues #2460, #3103 and #2898,
we have determined the cause of the issue. I'm going to post the full explanation of it, together with
a reproduction guide, as context information for the future.

ProxySQL version (reproduced in):

  • v2.1.1
  • v2.x (bd703d4)

OS version:

  • Debian 10 (Buster)

Steps to reproduce the issue:

Original crash reproduction

The issue requires specific configuration and timing conditions for a consistent reproduction. The
infrastructure and scripts required to reproduce it are attached. A consistent reproduction requires
the following:

  1. Launch ProxySQL with the provided configuration, using the following command:
proxysql --idle-threads -f -c infra/conf/proxysql/proxysql.cnf -D infra/test_data/

The most important parts of the configuration file are the following lines:

# Configure an aggressive time for resetting connections; this will increase the likelihood of a
# failing PING session being targeted for a RESET while IDLE sessions are being processed.
connection_max_age_ms=5000

# Configure an aggressive ping interval to increase the likelihood of a failing PINGING session
# being reset due to 'max_age_ms'.
ping_interval_server_msec=1000

More aggressive values, or other variations of these intervals, could also reproduce
this issue; these are simply values that easily induced the issue on a test system.

  2. Launch the infra configuration script:
./docker-compose-init.bash
  3. Limit ProxySQL CPU usage, so that ProxySQL execution stalls, leading to favourable timing conditions
    for the appearance of the issue. cpulimit is good enough for this purpose: cpulimit -p $PROXYSQL_PID -l $val.
    The value will vary from system to system, but it should be low enough for PINGING_SERVER connections
    to expire while idle connections are being processed. The more often this situation takes place, the
    easier it is to reproduce the issue.

  4. Compile and launch the provided reproduction test. The test simply performs simple "SELECT 1"
    queries using the create_new_connection annotation to create IDLE sessions, which are later
    used for PINGs, eventually leading to the crash. To use the helper file repro_helper_3502-t.cpp,
    just drop it into the TAP tests folder and compile it like any other test. Then launch it
    (a sketch of what such a helper can look like follows the commands below):

    # Load the constants required for the file
    source constants
    # Launch the helper file specifying the creation of 50 new backend connections (default value).
    ./tests/repro_helper_3502-t
    
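For context, a rough sketch of what such a helper can look like, written directly against the
MySQL C API rather than ProxySQL's TAP harness (the real helper is in the attached tarball; the
endpoint and credentials below are placeholders for the values loaded from the constants file):

    #include <mysql/mysql.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char** argv) {
        // Number of fresh backend connections to create (default: 50).
        int conns = (argc > 1) ? std::atoi(argv[1]) : 50;
        for (int i = 0; i < conns; i++) {
            MYSQL* my = mysql_init(nullptr);
            // Placeholder endpoint/credentials (127.0.0.1:6033, root/root).
            if (!mysql_real_connect(my, "127.0.0.1", "root", "root",
                                    nullptr, 6033, nullptr, 0)) {
                std::fprintf(stderr, "connect: %s\n", mysql_error(my));
                return 1;
            }
            // The 'create_new_connection' annotation forces ProxySQL to open
            // a new backend connection; once the client disconnects, that
            // connection sits IDLE in the pool and becomes a PING target.
            if (mysql_query(my, "/* create_new_connection=1 */ SELECT 1")) {
                std::fprintf(stderr, "query: %s\n", mysql_error(my));
                return 1;
            }
            MYSQL_RES* res = mysql_store_result(my);
            if (res) mysql_free_result(res);
            mysql_close(my);
        }
        return 0;
    }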

Crash details explanation

The crash occurs when the following execution flow takes place (a simplified code sketch follows the list):

  1. Function run___get_multiple_idle_connections
    gets called because last_processing_idles < curtime - mysql_thread___ping_interval_server_msec.
  2. An idle session is then used to ping the backend server; for this, its
    session status is set to PINGING_SERVER.
  3. During this session, the myconn->async_ping()
    operation completes immediately, returning '0', and the connection is returned to the connection
    pool via return_MySQL_Connection_To_Pool.
  4. The connection being returned is detected as belonging to an old session that has exceeded
    the lifetime imposed by mysql_thread___connection_max_age_ms.
    This leads to the connection being reset via create_new_session_and_reset_connection,
    which pushes a new session onto the thread's mysql_sessions session stack
    via register_session_connection_handler.
  5. As part of RESETTING_CONNECTION, this new session performs an async_change_user.
    This async operation doesn't complete immediately and returns 1. The handler
    function for this RESETTING_CONNECTION session finishes its execution for now,
    returning 0, which leaves the session in the mysql_sessions list.
    Execution then returns to the frame from point 3 after completing handler_again___status_PINGING_SERVER.
  6. Since the return code from handler is -1,
    the last session pushed to mysql_sessions is deleted. This is the
    session used for RESETTING_CONNECTION, which is still waiting for its next
    event. An invalid session has been left in mysql_sessions.
  7. The session receives its new event, but it has already been deleted, leading to the assert
    present in all the attached coredumps.
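To make the lifetime problem concrete, here is a minimal, self-contained sketch of the bug
pattern described above. All names (Session, handle_pinging_server, the sessions vector) are
simplified stand-ins for ProxySQL's MySQL_Session / MySQL_Thread internals, not the real
implementation:

    #include <cstdio>
    #include <vector>

    // Simplified stand-in for MySQL_Session.
    struct Session {
        const char* status;
    };

    // Simplified stand-in for the per-thread session stack (mysql_sessions).
    static std::vector<Session*> sessions;

    // Stand-in for the PINGING_SERVER handler: returning the expired
    // connection to the pool registers a *new* RESETTING_CONNECTION session
    // on top of the stack, whose async_change_user has not completed yet.
    static int handle_pinging_server() {
        sessions.push_back(new Session{"RESETTING_CONNECTION"});
        return -1; // the PINGING_SERVER session itself is done/failed
    }

    int main() {
        sessions.push_back(new Session{"PINGING_SERVER"});
        int rc = handle_pinging_server();

        if (rc == -1) {
            // Step 6: the thread assumes the *last* registered session is the
            // one it just finished handling, and deletes it. But the last one
            // is now the RESETTING_CONNECTION session, which is still waiting
            // for its next event.
            Session* victim = sessions.back();
            sessions.pop_back();
            std::printf("deleting '%s' (intended: 'PINGING_SERVER')\n",
                        victim->status);
            delete victim;
        }

        // Step 7: when the pending event later fires for the freed session,
        // the thread touches freed memory; in the real code this surfaces as
        // the `assert(mybe)` in MySQL_Session::handler().
        for (Session* s : sessions)
            std::printf("left behind: '%s'\n", s->status); // stale session
        return 0;
    }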

Simple crash reproduction

To reproduce this issue easily, it's enough to:

  1. Introduce a sleep with a delay into MySQL_Thread::run(), as specified by the
    supplied patch, thread_busy_wait.patch:

    patch -p0 < thread_busy_wait.patch
    

    This delay is important, as it simulates load in ProxySQL: enough load to
    age the current connections so that connection_max_age_ms is going
    to be surpassed for one of them, leading to the scenario previously described
    (a sketch of what the patch conceptually does follows this list).

  2. Compile and start ProxySQL at commit bd703d4ea84e768eb1bdb6c6d5e82d59d428ba19 under gdb
    with the supplied configuration for this reproduction:

    source mariadb_breakpoints
    
  3. Set a breakpoint at mariadb_lib.c:454. This breakpoint makes it possible to
    force async_ping to always return 0, which is the desired behavior for the
    reproduction.

  4. Use the provided test file to open multiple connections to ProxySQL. You need
    to be careful not to get the server SHUNNED due to the placement of the PING
    breakpoint. To use the helper file repro_helper_3502-t.cpp, just drop
    it into the TAP tests folder and compile it like any other test. Then launch it:

    # Load the constants required for the file
    source constants
    # Launch the helper file specifying the creation of 5 new backend
    # connections, due to the timing being set in other parameters, 5 should be
    # enough to trigger the issue, if not, this value can be increased.
    ./tests/repro_helper_3502-t 5
    
  5. With gdb, continue the execution every time the breakpoint is hit; on one
    of these continues, a segfault or assert should take place. The crash
    situation has now been reproduced.
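The actual thread_busy_wait.patch is included in the attached tarball; conceptually (this is an
assumption about its contents, not the patch itself), it amounts to injecting an artificial delay
into each iteration of the MySQL_Thread::run() loop, along these lines:

    // Hypothetical illustration only: a delay injected into the per-iteration
    // work of MySQL_Thread::run() to simulate a heavily loaded proxy, so that
    // backend connections age past mysql-connection_max_age_ms while they are
    // still being processed.
    #include <unistd.h>

    void run_loop_iteration() {
        // ... existing session processing would happen here ...
        usleep(100 * 1000); // hypothetical 100 ms busy delay per iteration
    }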

Attached files

  • crashes: folder holding two coredumps, from ProxySQL versions v2.1.1 and
    v2.x (bd703d4ea84e768eb1bdb6c6d5e82d59d428ba19), together with the error
    logs from both crashes and the gdb backtraces for both.
  • infra: folder holding all the infrastructure required for configuring and
    reproducing the issue, including the helper script repro_helper_3502-t.cpp,
    in TAP test format.

reproduction.tar.gz

JavierJF added a commit that referenced this issue Jul 20, 2021
renecannao added a commit that referenced this issue Jul 27, 2021
Closes #3502: ProxySQL crashed with assert for 'mybe' being null