(Customer 39) OrientDB Server shutdown doesn't complete in a timely manner #2736

Closed
lloydchang opened this issue Aug 29, 2014 · 27 comments

@lloydchang
Contributor

Hi @lvca @enisher @Laa
CC @mattaylor @henryzhao81 @hcmwork @stuartking @pmoorhead

Version: OrientDB 1.7.4
Frequency: 5 incidents across 2 servers in 2 days

Steps to Reproduce:

service orientdb stop

which effectively executes...

cd "$ORIENTDB_DIR/bin"
/usr/bin/nohup ${ORIENTDB_DIR}/bin/shutdown.sh 1>${ORIENTDB_DIR}/log/orientdb.log 2> ${ORIENTDB_DIR}/log/orientdb.err

INCIDENT 1 happened on Server 1 running OrientDB 1.7.4

+ service orientdb stop
Stopping OrientDB server daemon.. (running as root)
tail: /opt/orient/log/orientdb.err: file truncated
Aug 27, 2014 1:24:55 AM com.orientechnologies.common.log.OLogManager log
INFO: Loading configuration from: /opt/orient/config/orientdb-server-config.xml...
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
...
2014-08-27 01:24:56:003 WARN Reached maximum number of concurrent connections (1000), reject incoming connection from /10.224.176.11:36168 [OServerNetworkListener]
2014-08-27 01:25:00:910 WARN Low free heap memory (11%): reducing cached records number from 4 to 3 [ODefaultCache$OLowMemoryListener]
...

INCIDENT 2 happened on Server 1 running OrientDB 1.7.4

+ service orientdb stop
Stopping OrientDB server daemon.. (running as root)
tail: /opt/orient/log/orientdb.err: file truncated
Aug 27, 2014 7:02:54 AM com.orientechnologies.common.log.OLogManager log
...
2014-08-27 07:02:55:485 INFO - storage: pharos... [Orient]
2014-08-27 07:02:55:488 WARN Repairing security structures... [OSecurityShared]
2014-08-27 07:02:55:488 WARN Repair completed [OSecurityShared]Please wait 5 seconds...
Please wait 5 seconds...
Please wait 5 seconds...
Please wait 5 seconds...
Please wait 5 seconds...
Please wait 5 seconds...
Please wait 5 seconds...
Please wait 5 seconds...

2014-08-27 07:03:34:551 SEVE    at java.lang.Thread.getStackTrace(Thread.java:1589)
    at com.orientechnologies.common.concur.resource.OSharedResourceAdaptive.acquireExclusiveLock(OSharedResourceAdaptive.java:119)
    at com.orientechnologies.common.concur.resource.OSharedResourceAdaptiveExternal.acquireExclusiveLock(OSharedResourceAdaptiveExternal.java:31)
    at com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage.doClose(OLocalPaginatedStorage.java:1931)
    at com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage.close(OLocalPaginatedStorage.java:340)
    at com.orientechnologies.orient.core.storage.OStorageAbstract.close(OStorageAbstract.java:106)
    at com.orientechnologies.orient.core.db.raw.ODatabaseRaw.close(ODatabaseRaw.java:516)
    at com.orientechnologies.orient.core.db.ODatabaseWrapperAbstract.close(ODatabaseWrapperAbstract.java:81)
...

INCIDENT 3 happened on Server 1 running OrientDB 1.7.4

+ service orientdb stop
Stopping OrientDB server daemon.. (running as root)
tail: /opt/orient/log/orientdb.err: file truncated
Aug 27, 2014 9:20:06 AM com.orientechnologies.common.log.OLogManager log
INFO: Loading configuration from: /opt/orient/config/orientdb-server-config.xml...
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
...

INCIDENT 4 happened on Server 1 running OrientDB 1.7.4

+ service orientdb stop
Stopping OrientDB server daemon.. (running as root)
tail: /opt/orient/log/orientdb.err: file truncated
Aug 27, 2014 9:33:11 AM com.orientechnologies.common.log.OLogManager log
INFO: Loading configuration from: /opt/orient/config/orientdb-server-config.xml...
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
...

INCIDENT 5 happened on Server 2 running OrientDB 1.7.4

+ service orientdb stop
Stopping OrientDB server daemon.. (running as root)
Aug 28, 2014 6:56:22 PM com.orientechnologies.common.log.OLogManager log
INFO: Loading configuration from: /opt/orient/config/orientdb-server-config.xml...
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:77)
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
        at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
        at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)
...
@lloydchang
Contributor Author

For what it's worth, our OrientDB servers are running:

Java Version:

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

OS: Linux
Distro Version: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Kernel Version: 2.6.32-431.17.1.el6.x86_64 #1 SMP Fri Apr 11 17:27:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

and here's our /etc/init.d/orientdb file's stop() function that

  1. cd "$ORIENTDB_DIR/bin"
  2. /usr/bin/nohup ${ORIENTDB_DIR}/bin/shutdown.sh 1>${ORIENTDB_DIR}/log/orientdb.log 2> ${ORIENTDB_DIR}/log/orientdb.err
  3. Prints out "Please wait 5 seconds..."
  4. While waiting for "OrientDB Server shutdown complete" to be logged to ${ORIENTDB_DIR}/log/orientdb.err
stop() {
    # status (defined elsewhere in this init script) is expected to set $PID; 0 means not running
    status
    if [ "$PID" -eq 0 ]
    then
        echo "OrientDB server daemon is already not running"
        return 0
    fi
    # Stream the shutdown client's stderr to the console until this script exits
    tail --pid=$$ -n0 -F "${ORIENTDB_DIR}/log/orientdb.err" &
    echo "Stopping OrientDB server daemon.. (running as $(id -u -n))"
    cd "$ORIENTDB_DIR/bin"
    /usr/bin/nohup "${ORIENTDB_DIR}/bin/shutdown.sh" 1>"${ORIENTDB_DIR}/log/orientdb.log" 2>"${ORIENTDB_DIR}/log/orientdb.err"
    # Poll every 5 seconds until the server logs a clean shutdown or the process is gone
    shutdownStatus=1
    while [ "$shutdownStatus" != "0" ]; do
        echo "Please wait 5 seconds..."
        sleep 5
        grep 'OrientDB Server shutdown complete' "${ORIENTDB_DIR}/log/orient-server.log.0"
        shutdownStatus=$?
        if [ "$shutdownStatus" != "0" ]; then
            # No completion message yet: check whether the process has died anyway
            status
            if [ "$PID" -eq 0 ]
            then
                echo "Error during OrientDB Server shutdown"
                echo ""
                shutdownStatus=0
            fi
        fi
    done
}
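
The polling loop above has no upper bound: if the server process stays alive but never logs the completion message (as in incident 2), it prints "Please wait 5 seconds..." forever. A minimal sketch of a bounded wait follows. The function name wait_for_shutdown, its arguments, and its return codes are hypothetical and not part of the OrientDB init script; it also assumes the caller is allowed to signal the server process (the script above runs as root).

```shell
#!/bin/sh
# Hypothetical helper (not part of OrientDB): wait until LOGFILE contains the
# shutdown marker, the process PID exits, or TIMEOUT seconds have passed.
# Returns 0 = clean shutdown logged
#         1 = process gone without the marker
#         2 = timed out, process still alive
wait_for_shutdown() {
    logfile=$1
    pid=$2
    timeout=$3
    interval=${4:-5}
    waited=0
    while :; do
        # Same marker string the init script greps for
        if grep -q 'OrientDB Server shutdown complete' "$logfile" 2>/dev/null; then
            return 0
        fi
        # kill -0 sends no signal; it only tests whether the process exists
        if ! kill -0 "$pid" 2>/dev/null; then
            return 1
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 2
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

A caller could then decide what to do on return code 2, for example report the timeout and escalate, instead of looping indefinitely.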

@andrii0lomakin
Member

Hi, this is fixed in version 1.7.7. Could you try the latest version and close the issue if it is no longer reproducible?

@lloydchang
Contributor Author

Hi @iaa Which git commit within ODB 1.7.7 fixes this bug? Please let us know; we want to review it; thanks.

@lvca
Member

lvca commented Aug 29, 2014

@lloydchang We suggest you stay up to date with the hotfixes we release. You can find the changelog of each release on GitHub.

@lloydchang
Contributor Author

Hi @lvca @Laa @enisher @maggiolo00 @luigidellaquila @tglman
CC @mattaylor @henryzhao81 @hcmwork @stuartking @pmoorhead

We are still experiencing this issue after trying a newer ODB version.

Hence as git committers for ODB 1.7.7, perhaps you can help us? Thank you for your time!

We skimmed git commits at https://github.com/orientechnologies/orientdb/commits/1.7.7 and we cannot identify the ODB 1.7.7 git commit that fixes this bug.

Request: Would you be able to identify which git commit within ODB 1.7.7 fixes this bug?

@andrii0lomakin
Member

Hi, it seems you are right. It is a different issue.
Do you have Skype or Hangouts so I can contact you?

@lloydchang
Contributor Author

Hi @lvca @Laa @enisher
CC @mattaylor @henryzhao81 @hcmwork @stuartking @pmoorhead

Summary: #2736 is a new issue; it is a different shutdown issue from #2239 and #2449.

Trying steps @lvca suggested, this issue is still reproducible 👎

@andrii0lomakin
Member

@lloydchang, do you have Skype or Hangouts so I can contact you? ))

@lloydchang
Contributor Author

@Laa Thank you for the response. What is your Skype and/or Google Hangout name? I asked @henryzhao81 and @hcmwork to Live Skype and/or Google Hangout with you, and execute a technical deep dive and code review, at line-by-line source code level. I recall that you and @henryzhao81 executed 1:1 technical conversations previously about OrientDB.

@andrii0lomakin
Member

Hi my skype is lomakin_andrey

@hcmwork

hcmwork commented Aug 29, 2014

Hi Andrey: my skype id is hemalcm

I have sent you a contact request.

@lloydchang
Contributor Author

Hi @Laa
Cc @henryzhao81 @hcmwork @mattaylor @lvca @chrishuttonch

My skype ID is lchang.proteusdh -- I have sent you a contact request as well.

Regarding incident 1 described, I see the following warning:

2014-08-27 01:24:56:003 WARN Reached maximum number of concurrent connections (1000),
reject incoming connection from /10.224.176.11:36168 [OServerNetworkListener]

Question 1:

  • Is the limit of 1000 concurrent connections hard-coded?

I found "1000" hard-coded(?) in multiple OrientDB versions' branch source code; here's the source code from OrientDB 1.7.9 https://github.com/orientechnologies/orientdb/blob/1.7.9/core/src/main/java/com/orientechnologies/orient/core/config/OGlobalConfiguration.java#L415

// NETWORK
NETWORK_MAX_CONCURRENT_SESSIONS("network.maxConcurrentSessions",
"Maximum number of concurrent sessions", Integer.class, 1000),

and the warning message
https://github.com/orientechnologies/orientdb/blob/1.7.9/server/src/main/java/com/orientechnologies/orient/server/network/OServerNetworkListener.java#L198

// MAXIMUM OF CONNECTIONS EXCEEDED
OLogManager.instance().warn(this,
    "Reached maximum number of concurrent connections (%d), reject incoming connection from %s",
    conns, socket.getRemoteSocketAddress());

Question 2:

Chris @chrishuttonch wrote, "I have ensured that I close my connection for every one opened but still find that some threads aren't closed. This causes the OrientDB server to eventually hit the 1000 open connections limit (WARN Reached maximum number of concurrent connections (1000), reject incoming connection from /IP:60350 [OServerNetworkListener])."

Luca @lvca replied, "As work around use a connection pool: it's faster and don't leaves pending connections."

From https://github.com/orientechnologies/orientdb/wiki/Performance-Tuning#Network_Connection_Pool

"The configurations is very simple, just 2 parameters:

  • minPool, is the initial size of the connection pool. The default value is configured as global parameters "client.channel.minPool" (see parameters)
  • maxPool, is the maximum size the connection pool can reach. The default value is configured as global parameters "client.channel.maxPool" (see parameters)"

By comparison, our server configuration sets the following values:

        <entry name="client.channel.minPool" value="50" />
        <entry name="client.channel.maxPool" value="5000" />

Therefore, I don't understand where the 1000 is coming from.

Questions 3, 4, 5, and 6:

  • Do you have precise instructions to use a connection pool as @lvca suggested?
  • If we want to performance test from 0 to 5,000 concurrent connections, what configuration parameters and values should we be setting?
  • Shouldn't the maximum be at 5000 per client.channel.maxPool configuration above?
  • While a connection pool doesn't leave pending connections, does creating a connection pool help with concurrent connections being made, live in real time?

After searching through the logs on server 1, I found two occurrences of the "concurrent connections (1000)" warning message that happened during shutdown (service orientdb stop) attempts on one of our servers running in single-server mode:

2014-08-26 17:41:35:915 WARN Reached maximum number of concurrent connections (1000),
reject incoming connection from /10.224.176.11:34108 [OServerNetworkListener]

and it happened a second time:

2014-08-27 01:24:56:003 WARN Reached maximum number of concurrent connections (1000),
reject incoming connection from /10.224.176.11:36168 [OServerNetworkListener]

Reading through the log entry and source code linked above, I think the 10.224.176.11 originates from OServerShutdownMain.java#L75 networkAddress = l.ipAddress; https://github.com/orientechnologies/orientdb/blob/1.7.9/server/src/main/java/com/orientechnologies/orient/server/OServerShutdownMain.java#L75

And OServerShutdownMain requires a remote connection from localhost 127.0.0.1 to 10.224.176.11 (the server's 10.x.x.x IP address) to be accessible ... However, if the server is already at 1000 concurrent connections (even though client.channel.maxPool = 5000), then shutdown will not complete in a timely manner, and the warning message will appear, followed by "WARN Low free heap memory" messages.
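
To check whether the server is close to the session limit before attempting a shutdown, the established TCP connections on the listener port can be counted. A minimal sketch, assuming Linux net-tools `netstat -tn` output (local address in column 4, state in column 6); count_established is a hypothetical helper, and whether every such connection counts toward network.maxConcurrentSessions is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tn` output on stdin and print how many
# TCP connections to local port $1 are in the ESTABLISHED state.
count_established() {
    awk -v port="$1" '
        # Column 1: protocol, column 4: local address, column 6: state
        $1 ~ /^tcp/ && $4 ~ (":" port "$") && $6 == "ESTABLISHED" { n++ }
        END { print n + 0 }
    '
}

# Usage on the server (not executed here):
#   netstat -tn | count_established 2424
```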

Questions 7 and 8:

  • Is network.maxConcurrentSessions configurable?
  • Do we need to change our configuration to set network.maxConcurrentSessions = 5000, to increase it from 1000 to 5000?

Looking forward to your thoughts; thanks.

@lvca
Member

lvca commented Sep 1, 2014

@lloydchang My answers to your Questions:

  • 1: It's not hard-coded, since it's configurable. The default value is 1k, but it can be changed as usual, via the Java API or by setting -Dnetwork.maxConcurrentSessions=X at startup
  • 2: Yes
  • 3, 4, 5 and 6: Change that setting to 5k in server.sh or orientdb-*server-config.xml. Also set db.pool.max to 5k. This way the server is able to open and serve 5k real concurrent client connections
  • 7: All the settings under OGlobalConfiguration are customizable; that is their purpose. For more information, look at: http://www.orientechnologies.com/docs/last/orientdb.wiki/Configuration.html#change-settings
  • 8: Yes (see above)
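
Following the answers above, the limit can be raised with JVM system properties at startup. A minimal sketch that only builds the flags: session_limit_opts is a hypothetical helper, and how the resulting string reaches the java command line in server.sh depends on the installed script, so that part is left out. Passing db.pool.max as a -D property is assumed to work the same way as network.maxConcurrentSessions.

```shell
#!/bin/sh
# Hypothetical helper: validate a session limit and print the matching
# JVM system-property flags on one line.
session_limit_opts() {
    limit=$1
    # Reject empty input and anything that is not purely digits
    case $limit in
        ''|*[!0-9]*)
            echo "limit must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$limit" -le 0 ]; then
        echo "limit must be a positive integer" >&2
        return 1
    fi
    printf '%s %s\n' \
        "-Dnetwork.maxConcurrentSessions=$limit" \
        "-Ddb.pool.max=$limit"
}

# Usage (not executed here):
#   session_limit_opts 5000
```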

@lvca lvca closed this as completed Sep 1, 2014
@lvca lvca added question and removed bug labels Sep 1, 2014
@lvca lvca self-assigned this Sep 1, 2014
@lloydchang
Contributor Author

Hi @lvca @Laa Thank you; status update about resolution:
CC @mattaylor @henryzhao81 @hcmwork @namdevm @gauravchouhan @vgaurihar @stuartking @pmoorhead

  • I tried setting only db.pool.max and client.channel.maxPool. They seem ineffective for real concurrent HTTP REST API connections to OrientDB on TCP 2480, so I switched the REST API clients from HTTP 1.0 non-persistent to HTTP 1.1 persistent connections
  • I configured OrientDB to listen on 127.0.0.1:2423, reserving this network route for shutdown, in case 0.0.0.0:2424 to 0.0.0.0:2430 are inaccessible (e.g. blocked via iptables firewall).
  • While 127.0.0.1:2423 solves the firewall-only scenario, it doesn't solve the additional scenario of WARN Reached maximum number of concurrent connections (1000), reject incoming connection from /127.0.0.1:2423 [OServerNetworkListener].
  • To resolve both scenarios: in addition to the 127.0.0.1:2423 listener reserved for shutdown, I also configured network.maxConcurrentSessions (real concurrent connections) in the Java options of OrientDB's server.sh to equal the output of cat /proc/sys/fs/file-max, the maximum number of file handles that the Linux kernel will allocate. Each real concurrent connection uses a TCP socket and a Linux file descriptor, consuming Linux kernel memory…

References
1. Linux kernel 2.x sysctl fs-max (https://www.kernel.org/doc/Documentation/sysctl/fs.txt)
2. Is there a hard limit of 65536 open TCP connections per IP address on Linux? (http://superuser.com/questions/251596/is-there-a-hard-limit-of-65536-open-tcp-connections-per-ip-address-on-linux)
3. How much memory is consumed by the Linux kernel per TCP/IP network connection? (http://stackoverflow.com/questions/8646190/how-much-memory-is-consumed-by-the-linux-kernel-per-tcp-ip-network-connection)
4. Linux Network Tuning for 2013 (http://www.nateware.com/linux-network-tuning-for-2013.html)
5. Practical maximum open file descriptors (ulimit -n) for a high volume system (http://serverfault.com/questions/48717/practical-maximum-open-file-descriptors-ulimit-n-for-a-high-volume-system)
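
One caveat about sizing network.maxConcurrentSessions from /proc/sys/fs/file-max: that value is system-wide, while each process is also capped by its own open-file limit (ulimit -n), so the smaller of the two is the practical ceiling for the server process. A sketch under that assumption; fd_ceiling is a hypothetical helper, not something from the OrientDB scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the smaller of the system-wide descriptor limit
# ($1) and the per-process limit ($2). The string "unlimited", as printed by
# `ulimit -n`, is treated as no per-process cap.
fd_ceiling() {
    system_max=$1
    process_max=$2
    if [ "$process_max" = "unlimited" ] || [ "$process_max" -gt "$system_max" ]; then
        echo "$system_max"
    else
        echo "$process_max"
    fi
}

# Usage on Linux, run as the user that starts the server (not executed here):
#   fd_ceiling "$(cat /proc/sys/fs/file-max)" "$(ulimit -n)"
```

The server also needs descriptors for its database and log files, so the number of sessions it can actually serve is lower than this ceiling.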

@andrii0lomakin
Member

Hi,
Impressive ! Thank you for your update.

Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB

@lvca
Member

lvca commented Oct 6, 2014

@tglman has an almost-ready fix for this problem, which other clients have also experienced. @tglman, did you already commit it to 1.7.10 and 2.0?

@lvca
Member

lvca commented Oct 6, 2014

The fix concerns the releasing of connections.

@lloydchang
Contributor Author

@lvca Could you please reopen #2736 as a bug until @tglman's git commits are in 1.7.10 and 2.0?
@tglman Could you please provide links to the diffs of your code changes (links to the git commits here on GitHub)? Thanks!

@tglman
Member

tglman commented Oct 6, 2014

hi @lloydchang
for 1.7.10 the diff is here ece3741

for 2.0 is here 0609439

@lvca lvca added this to the 1.7.10 milestone Oct 6, 2014
@lloydchang
Contributor Author

Hi @tglman @lvca @Laa
CC @mattaylor @henryzhao81 @hcmwork @namdevm @gauravchouhan @vgaurihar @stuartking @pmoorhead

Today, we observed the following log snippet (Cannot read protocol version); we had to kill -9 OrientDB to shut it down in a timely manner, albeit with an incomplete shutdown. My reading of ece3741 and 0609439 is that they resolve a different scenario (connections not being released).

Therefore, would you consider re-opening this issue #2736 until following scenario is resolved? I don't know how to reproduce it, but I want to bring this to your attention:

Oct 06, 2014 9:27:51 AM com.orientechnologies.common.log.OLogManager log
INFO: Loading configuration from: /opt/orient/config/orientdb-server-config.xml...
com.orientechnologies.orient.enterprise.channel.binary.ONetworkProtocolException: Cannot read protocol version from remote server /127.0.0.1:2423: java.net.SocketException: Connection reset
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:92)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.<init>(OChannelBinaryAsynchClient.java:57)
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClientSynch.<init>(OChannelBinaryAsynchClientSynch.java:32)
    at com.orientechnologies.orient.server.OServerShutdownMain.connect(OServerShutdownMain.java:103)
    at com.orientechnologies.orient.server.OServerShutdownMain.main(OServerShutdownMain.java:135)

@lloydchang
Contributor Author

Hi @tglman @lvca @Laa @enisher
CC @mattaylor @henryzhao81 @hcmwork @namdevm @gauravchouhan @vgaurihar @stuartking @pmoorhead
Is the snippet above from today (Cannot read protocol version) possibly a symptom of the following sequence:

  1. Recurrence of "Shutdown doesn't wait for the full closing of databases" #2239 (comment), supposedly fixed in 1.7.3, then fixed again six days ago ("We fixed the same issue more time, as soon as users reported the problem on shutdown.")
    … simultaneously …
  2. A concurrent, real-time attempt to shut down the same OrientDB instance via the 127.0.0.1:2423 listener, while the initial shutdown attempt was still somehow stalled on the 127.0.0.1:2423 listener and not responding properly?

We would like clarification about the shutdown symptoms seen in 1.7.6 through 1.7.10, plus 2.0 and beyond; the snippet above happened today on an OrientDB 1.7.6 instance with a large database.

@publicocean0

Hi, I found a problem with the shutdown code introduced in OrientDB. It can cause many problems for an OrientDB client hosted in a more open system, and also on the server.
Shutdown in OrientDB is not handled directly with a Java shutdown hook but with a SignalHandler.
That gives more possibilities, but it also has a side effect: the handler does not run in the right order while the VM is closing.
The right order is: first the VM calls all installed hooks, deregisters other handlers, and closes the classes that destroy data before exiting; only then are the database connections closed.
The database connections should be closed at the end of shutdown, not at the beginning.
As it stands, the connections can be closed before the VM has finished destroying its data.

@lvca
Member

lvca commented Aug 17, 2015

We used signals to catch every possible case and avoid breaking the DB. This works well with the server and usually with clients. Are you using an App Server?

@publicocean0

Yes, I'm using OrientDB in a server that performs a specific task (it is a classic web container with a WAR; I'm using Tomcat now). Everything works, but when the server is shut down for any reason, it should save the state of specific running data so that the state can be reloaded correctly at startup.
I added a hook, but the first problem is that I can't know in what order hooks are executed. I solved that by patching the Java hooks handler: hooks are now executed following a priority queue, so mine runs before other classes (Hazelcast, for example) are closed. With that solved, the problem with OrientDB remains: when the hook tries to save data, it finds the connection already closed, because the signal handler does not follow the hooks.
In practice the connections are closed before everything else, before any hooks, but they should be closed at the end, guaranteeing that all the logic related to a shutdown is executed correctly.
I thought of modifying OConnectionManager: instead of commanding the shutdown directly when it receives the signal, it would add a hook at the end of the queue, so that the OrientDB shutdown is executed last.

There is also a strange loop when you close an application that has not completely started. The loop continuously shows this message:

com.orientechnologies.common.log.OLogManager.log Removing disconnected network channel '127.0.0.1:2424/AAA'...

@dcarr178

dcarr178 commented Sep 2, 2015

in orientdb version 2.1.1 the file bin/shutdown.sh does not work. The only thing that I can get to work is kill .

@lvca
Member

lvca commented Sep 4, 2015

@dcarr178 Please could you open a new issue for that?

@dcarr178

dcarr178 commented Sep 4, 2015

@lvca I dropped my db and started over which seemed to correct the problem so I cannot reproduce anymore. Will create a new issue if I can reproduce again. The weirdest part was the log file entry that said Error:null.
