
100% CPU usage with clusters and mariasql when a cluster worker exits and is restarted #149

Open
AbijeetP opened this issue Jun 9, 2016 · 8 comments

AbijeetP commented Jun 9, 2016

Hi @mscdex,

Sorry for the long post, but I'm trying to give you as much information as I possibly can. I have done some snooping around, so I'm including everything I've discovered.

In my Node.js app I'm using clusters to utilize my multi-core CPU, and the node-mariasql library to communicate with my database. Since node-mariasql does not support pooling, we are using the third-party generic-pool module to maintain a pool of connections.
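
Roughly, the wiring between the two libraries looks like the sketch below (the connection parameters, pool name, and max size are illustrative, not our exact code):

var Client = require('mariasql');
var poolModule = require('generic-pool');

// generic-pool v2-style factory wrapping a mariasql (v0.2.x) client
var pool = poolModule.Pool({
  name: 'mariasql',
  create: function(callback) {
    var client = new Client();
    client.connect({ host: '127.0.0.1', user: 'user', password: 'pass', db: 'mydb' });
    client
      .on('connect', function() { callback(null, client); })
      .on('error', function(err) { callback(err, null); });
  },
  destroy: function(client) {
    client.end(); // gracefully close the underlying MySQL connection
  },
  min: 5,                    // matches our configuration described below
  max: 10,                   // illustrative
  idleTimeoutMillis: 120000  // reap idle connections after 120 seconds
});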

We've noticed that our CPU usage goes up to 100% whenever a connection in the master thread is closed after an uncaught exception causes one of the child workers to restart.

Whenever a child worker restarts, we destroy all MySQL connections.
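
A simplified sketch of the master-side restart logic (the real cleanup hooks live in the repository linked below):

var cluster = require('cluster');

if (cluster.isMaster) {
  cluster.fork();
  cluster.on('exit', function(worker, code, signal) {
    // Tear down all pooled MySQL connections held by the master,
    // then fork a replacement worker.
    pool.drain(function() {
      pool.destroyAllNow();
    });
    cluster.fork();
  });
}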

Node version - v4.2.2
MariaDB version - v10.0.15
node-mariasql version - v0.2.5

Reproducible code - https://github.com/bsurendrakumar/node-simplex/

The minimum number of connections in the pool is set to 5; all of its configuration can be found here. So when the server starts, generic-pool will kick off 5 connections to MySQL and keep them in its pool.

The idle timeout for an object in the pool has been set to 120 seconds. This means that if there are more than 5 objects in the pool (since 5 is the minimum) and one of them has not been used for the last 120 seconds, it will be destroyed.

At server startup, we make a simple call to our country model to fetch the list of countries; this code is here. This establishes a new connection to the database, so there will now be 6 SQL connections in the pool, one of which will get cleaned up after 120 seconds.
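
The call itself follows the usual acquire/query/release pattern; a minimal sketch (table and column names are illustrative):

pool.acquire(function(err, client) {
  if (err) throw err;
  client.query('SELECT id, name FROM country')
    .on('result', function(res) {
      res.on('row', function(row) {
        // consume each country row
      })
      .on('end', function(info) {
        pool.release(client); // hand the connection back to the pool
      });
    });
});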

Following is the step-by-step process that led us to believe the issue is with our usage of the mariasql library:

  • When the server is started, we log the process ID to the console. Grab the master process ID, for example 20584.
  • Look at the file descriptors being used by the process with ls -l /proc/20584/fd. Make a note of the socket connections. The output will look something like this:
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 12 -> socket:[2469914]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 13 -> socket:[2469917]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 14 -> socket:[2468106]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 15 -> socket:[2468109]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 17 -> socket:[2467206]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 18 -> socket:[2467208]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 19 -> socket:[2467210]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 2 -> /dev/tty
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 20 -> socket:[2467212]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 21 -> socket:[2467214]
lrwx------ 1 abijeet abijeet 64 Jun  9 19:24 22 -> socket:[2467306]
  • Copy a few of the socket numbers, for example 2467212, and run lsof | grep 2467212. You'll notice that these are connections to the MySQL server. The output should look something like:
node      20584           abijeet   20u     IPv4            2467212       0t0        TCP localhost:57092->localhost:mysql (ESTABLISHED)
V8        20584 20585     abijeet   20u     IPv4            2467212       0t0        TCP localhost:57092->localhost:mysql (ESTABLISHED)
V8        20584 20586     abijeet   20u     IPv4            2467212       0t0        TCP localhost:57092->localhost:mysql (ESTABLISHED)
V8        20584 20587     abijeet   20u     IPv4            2467212       0t0        TCP localhost:57092->localhost:mysql (ESTABLISHED)
V8        20584 20588     abijeet   20u     IPv4            2467212       0t0        TCP localhost:57092->localhost:mysql (ESTABLISHED)
  • Crash the server by going to http://127.0.0.1:3002/api/v1/country/list. This will crash one of the child processes. Whenever an uncaught exception occurs, we do some cleanup and exit (a sketch of this cleanup appears after this list); we then fork another process to take the place of the one that was just killed. Cleanup includes:
    • Closing the HTTP server
    • Closing the MySQL connections in generic-pool
    • Closing the winston logger streams
  • Wait for the MySQL connection in the master thread to be closed. When this happens, we write a log to the console:
Destroying / ending master thread ID -  4984
  • Check your CPU usage; you'll notice that one of the CPUs has shot up to 100%.
  • Next, run strace -o log.txt -e epoll_ctl,epoll_wait -p 20584. Note that you might need to install strace. This command logs all epoll_ctl and epoll_wait system calls made by the Node.js process and puts them in a file named log.txt in the current working directory.
  • Open the log.txt file and you'll notice logs similar to these:
epoll_wait(5, {{EPOLLIN|EPOLLHUP, {u32=16, u64=16}}}, 1024, 847) = 1
epoll_ctl(5, EPOLL_CTL_DEL, 16, 7ffe441aa850) = -1 EBADF (Bad file descriptor)
epoll_wait(5, {{EPOLLIN|EPOLLHUP, {u32=16, u64=16}}}, 1024, 845) = 1
epoll_ctl(5, EPOLL_CTL_DEL, 16, 7ffe441aa850) = -1 EBADF (Bad file descriptor)
epoll_wait(5, {{EPOLLIN|EPOLLHUP, {u32=16, u64=16}}}, 1024, 843) = 1
epoll_ctl(5, EPOLL_CTL_DEL, 16, 7ffe441aa850) = -1 EBADF (Bad file descriptor)
  • The file descriptor here is 16, and if you correlate it with your earlier ls -l /proc/20584/fd and lsof | grep 2467212 output, you'll realize that it belongs to the MySQL connection that was just closed.
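
For reference, the cleanup mentioned in the crash step above looks roughly like this sketch (logger and httpServer stand in for our winston instance and HTTP server; exact details are in the repository):

process.on('uncaughtException', function(err) {
  logger.error('Uncaught exception', err);

  // 1. Stop accepting new HTTP requests.
  httpServer.close(function() {
    // 2. Tear down all pooled MySQL connections.
    pool.drain(function() {
      pool.destroyAllNow();
      // 3. Close winston transports, then exit so the master
      //    can fork a replacement worker.
      logger.close();
      process.exit(1);
    });
  });
});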

This leads us to believe that even after the connection to MySQL is released, a file descriptor is left hanging around and is still being polled. We've found various threads across forums describing similar symptoms.

AbijeetP commented Jun 14, 2016

@mscdex - I've updated the reproducible sample to have all the code in a single server.js file. It should now be much easier to trace and debug. Sorry, I should have done that earlier.

The repository is still the same - https://github.com/bsurendrakumar/node-simplex/

sindhushaballa commented

The 100% CPU usage issue does not happen with the node-mysql module. I've updated the code to use that module.

Please find this code here.
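
For comparison, node-mysql ships its own pooling, so the generic-pool layer drops out entirely; a minimal sketch (connection parameters are illustrative):

var mysql = require('mysql');

var pool = mysql.createPool({
  connectionLimit: 10,
  host: '127.0.0.1',
  user: 'user',
  password: 'pass',
  database: 'mydb'
});

pool.query('SELECT id, name FROM country', function(err, rows) {
  if (err) throw err;
  // rows is the full result set; the underlying connection is
  // automatically released back to the pool.
});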

mscdex (Owner) commented Jun 21, 2016

Sorry all, I haven't had time to look into this just yet.

sindhushaballa commented

Hi @mscdex,

Did you get a chance to look into this issue?

mscdex (Owner) commented Jun 28, 2016

Can you verify that this still happens with mariasql v0.2.6?

sindhushaballa commented

Hi @mscdex,

I updated the version to 0.2.6, but I'm still seeing 100% CPU usage.

Thanks
Surendra Kumar B

sindhushaballa commented

Hi @mscdex,

We are still facing this issue with 0.2.6.

Could you please look into it when you have some time? Sorry for troubling you repeatedly about this.

Thanks
Surendra Kumar B

AbijeetP commented

@mscdex - Did you get a chance to look at this? Do you need anything else from us on this one?
