Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintain connection low watermark in NetworkClient #1186

Merged
merged 3 commits into from
Jun 20, 2019

Conversation

cgtz
Copy link
Contributor

@cgtz cgtz commented Jun 10, 2019

If prewarming is enabled, maintain a minimum number of connections to
each data node. The goal of this change is to allow the frontend to keep
connection pools adequately sized even when servers are restarted or
other disconnections occur.

To track the connection pools that are below the watermark, some of the
connection establishment logic was moved from NetworkClient to
ConnectionTracker.

Upgrade metrics library from codahale to dropwizard as the library was
renamed and dropwizard has the newest updates.

@cgtz cgtz requested review from zzmao and dharju June 10, 2019 20:16
@codecov-io
Copy link

codecov-io commented Jun 10, 2019

Codecov Report

Merging #1186 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1186   +/-   ##
=========================================
  Coverage     83.62%   83.62%           
  Complexity       57       57           
=========================================
  Files             6        6           
  Lines           348      348           
  Branches         38       38           
=========================================
  Hits            291      291           
  Misses           45       45           
  Partials         12       12

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf53025...e48bddc. Read the comment docs.

@cgtz cgtz force-pushed the conn-healthcheck branch 2 times, most recently from 6fa5b5b to bb6ddb6 Compare June 12, 2019 17:29
Copy link
Contributor

@zzmao zzmao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two minor comments. Need a second round of review.

* @param port the port number.
* @param warmUpPercentage percentage of max connections to this (host, port) that should be kept ready for use.
*/
void enableConnectionWarmUp(String host, Port port, int warmUpPercentage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel more comfortable to call it setLowWatermark , but open to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about setMinimumActiveConnectionsPercentage? lowWatermark isn't very clear terminology to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the codebase to use minActiveConnections instead of lowWatermark

this.poolLimit = poolLimit;
}

boolean hasReachedLowWatermark() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isAboveLowWatermark?

Copy link
Contributor

@dharju dharju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't look thoroughly at tests yet

* Configure the connection tracker to keep a specified percentage of connections to this (host, port) ready for use.
* @param host the hostname.
* @param port the port number.
* @param warmUpPercentage percentage of max connections to this (host, port) that should be kept ready for use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

percentage rounded up or down? I'm guessing down due to the integer math involved at line 71?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rounded down. I will document that

* @param port the port number.
* @param warmUpPercentage percentage of max connections to this (host, port) that should be kept ready for use.
*/
void enableConnectionWarmUp(String host, Port port, int warmUpPercentage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about setMinimumActiveConnectionsPercentage? lowWatermark isn't very clear terminology to me

private static final Logger LOGGER = LoggerFactory.getLogger(ConnectionTracker.class);
private final HashMap<Pair<String, Port>, HostPortPoolManager> hostPortToPoolManager = new HashMap<>();
private final HashMap<String, HostPortPoolManager> connectionIdToPoolManager = new HashMap<>();
private final HashSet<HostPortPoolManager> poolManagersBelowWatermark = new HashSet<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watermark is sort of odd terminology to me, that come from something? I would think something like poolManagersBelowMinimumActiveConnections would make more sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the netty code uses watermark in a few places, but mostly when dealing with buffers, not connections (https://netty.io/4.1/api/io/netty/channel/WriteBufferWaterMark.html). I'm not too tied to the terminology though.

@@ -114,18 +114,8 @@ private String generateConnectionId(SocketChannel channel) {
int localPort = socket.getLocalPort();
String remoteHost = socket.getInetAddress().getHostAddress();
int remotePort = socket.getPort();
long connectionIdSuffix = IdGenerator.getAndIncrement();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't StringBuilder more efficient than concatenating strings in Java? Was this just a code readability vs. performance efficiency tradeoff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true that StringBuilder is more efficient than multiple string concatenations. However, for simple usages of string + (single line, no loops, no method call barriers), java will automatically convert this into a StringBuilder usage. Intellij suggests replacing such usages of StringBuilder with +. Inspecting the bytecode for this line, StringBuilder is in fact used:

    LINENUMBER 118 L6
    NEW java/lang/StringBuilder
    DUP
    INVOKESPECIAL java/lang/StringBuilder.<init> ()V
    ALOAD 3
    INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
    LDC ":"
    INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
    ILOAD 4
    INVOKEVIRTUAL java/lang/StringBuilder.append (I)Ljava/lang/StringBuilder;
    LDC "-"
    INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
    ALOAD 5
    INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
    LDC ":"
    INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
    ILOAD 6
    INVOKEVIRTUAL java/lang/StringBuilder.append (I)Ljava/lang/StringBuilder;
    LDC "_"
    INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
    LLOAD 7
    INVOKEVIRTUAL java/lang/StringBuilder.append (J)Ljava/lang/StringBuilder;
    INVOKEVIRTUAL java/lang/StringBuilder.toString ()Ljava/lang/String;
    ARETURN

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wow cool, didn't know that

@@ -138,7 +137,7 @@ public void close() {
socketChannel.close();
} catch (IOException ie) {
metrics.selectorCloseSocketErrorCount.inc();
logger.warn("Failed to send SSL close message ", ie);
Copy link
Contributor

@dharju dharju Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the change here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log message ends up being pretty noisy in practice. Often SSLTransmission.close() is called when a machine is down and the connection failed, which means that there is no way to actually send the ssl close message.

* @param host the host to which this connection belongs.
* @param port the port on the host to which this connection belongs.
* @param connId the connection id of the connection.
* @param editPoolManagersBelowWatermark true to allow this method to edit {@link #poolManagersBelowWatermark}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever add to poolManagersBelowWatermark when this is true or is it always removing? If the latter perhaps we should call this shouldRemovePoolManagersBelowWatermark, if former shouldEditPoolManagersBelowWatermark (this is actually kind of a weird parameter in general since it's exposing the management of an internal detail (the poolManagersBelowWatermark map) to the user. Can you explain its necessity in the context of its use?)

Copy link
Contributor Author

@cgtz cgtz Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this method only removes from that set.

This is a private helper method meant to allow connectAndTrack and replenishConnections to share code. Set::remove() cannot be called in replenishConnections because it uses an Iterator so I added that flag to allow the two methods to both work correctly.

As an alternative, I can just duplicate the small bit of shared code and get rid of this method to reduce the confusion. The documentation to explain this param probably exceeded the amount of duplicated code, anyways.

@cgtz cgtz force-pushed the conn-healthcheck branch from bb6ddb6 to 207dd58 Compare June 16, 2019 03:43
@cgtz
Copy link
Contributor Author

cgtz commented Jun 16, 2019

@zzmao @dharju addressed review comments

@cgtz cgtz force-pushed the conn-healthcheck branch from 207dd58 to 5d6c5b2 Compare June 16, 2019 04:34
Copy link
Contributor

@zzmao zzmao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

availableCount -= 2;
assertCounts(totalConnectionsCount, availableCount);

// destroy one and return the otheer and then replenish
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: other

AtomicInteger expectedConnectCalls = new AtomicInteger(3);
Runnable checkConnectCalls = () -> Assert.assertEquals(expectedConnectCalls.get(), selector.connectCallCount());

networkClient.warmUpConnections(Collections.singletonList(replicaOnSslNode.getDataNodeId()), 100, 2000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace '100' and '2000' with static final ints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced the poll timeout and warmup timeout with a constant.

The warmup % has a different meaning than poll timeout, so I made it a local variable

newConnections.stream().limit(2).forEach(connectionTracker::removeConnection);
totalConnectionsCount -= 2;
availableCount -= 2;
assertCounts(totalConnectionsCount, availableCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't seem like this is really testing anything after line 239 (I don't see where line 244 affects totalConnectionsCount or availableCount)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

244 takes 2 connections and removes them from the connection pool, thus decreasing the connection count and connections available to check out by 2. This is mostly preparation for the replenishConnections call on line 250, but I wanted to ensure that the connection count was as expected before that call is made.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see, assertCounts looks at newConnections. Sorry, I had misread it and thought it was just comparing the two integers

newConnections.forEach(connectionTracker::checkInConnection);
totalConnectionsCount += 2;
availableCount += 2;
assertCounts(totalConnectionsCount, availableCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A connection in connectionTracker must be checked in before it is available for use. Prior to being checked in, the connection is counted as initiated, so it still counts towards the max connection limit, but is not available to be used for making a request yet.

This tests that the connections that replenishConnections can be used as usual

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what's going on now as mentioned above

Copy link
Contributor

@dharju dharju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve after you fix the conflicts

cgtz added 3 commits June 20, 2019 15:35
If prewarming is enabled, maintain a minimum number of connections to
each data node. The goal of this change is to allow the frontend to keep
connection pools adequately sized even when servers are restarted or
other disconnections occur.

To track the connection pools that are below the watermark, some of the
connection establishment logic was moved from NetworkClient to
ConnectionTracker.

Upgrade metrics library from codahale to dropwizard as the library was
renamed and dropwizard has the newest updates.
@cgtz cgtz force-pushed the conn-healthcheck branch from 2d3b0d9 to e48bddc Compare June 20, 2019 22:37
@dharju dharju merged commit 65eef32 into linkedin:master Jun 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants