memcached cache: switch to AWS elasticache-java-cluster-client and add TLS support #14827

Merged (21 commits) on Oct 2, 2023
23 changes: 13 additions & 10 deletions docs/configuration/index.md
@@ -2112,16 +2112,19 @@ In addition to the normal cache metrics, the caffeine cache implementation also

Uses memcached as cache backend. This allows all processes to share the same cache.

- |Property|Description|Default|
- |--------|-----------|-------|
- |`druid.cache.expiration`|Memcached [expiration time](https://code.google.com/p/memcached/wiki/NewCommands#Standard_Protocol).|2592000 (30 days)|
- |`druid.cache.timeout`|Maximum time in milliseconds to wait for a response from Memcached.|500|
- |`druid.cache.hosts`|Comma separated list of Memcached hosts `<host:port>`.|none|
- |`druid.cache.maxObjectSize`|Maximum object size in bytes for a Memcached object.|52428800 (50 MiB)|
- |`druid.cache.memcachedPrefix`|Key prefix for all keys in Memcached.|druid|
- |`druid.cache.numConnections`|Number of memcached connections to use.|1|
- |`druid.cache.protocol`|Memcached communication protocol. Can be binary or text.|binary|
- |`druid.cache.locator`|Memcached locator. Can be consistent or array_mod.|consistent|
+ | Property | Description | Default |
+ |---|---|---|
+ | `druid.cache.expiration` | Memcached [expiration time](https://code.google.com/p/memcached/wiki/NewCommands#Standard_Protocol). | 2592000 (30 days) |
+ | `druid.cache.timeout` | Maximum time in milliseconds to wait for a response from Memcached. | 500 |
+ | `druid.cache.hosts` | Comma-separated list of Memcached hosts as `<host:port>`. When `druid.cache.clientMode` is `static`, list every node. In `dynamic` mode the client [automatically identifies the nodes in your cluster](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.html), so specifying only the configuration endpoint and port is enough. | none |
+ | `druid.cache.maxObjectSize` | Maximum object size in bytes for a Memcached object. | 52428800 (50 MiB) |
+ | `druid.cache.memcachedPrefix` | Key prefix for all keys in Memcached. | druid |
+ | `druid.cache.numConnections` | Number of Memcached connections to use. | 1 |
+ | `druid.cache.protocol` | Memcached communication protocol: `binary` or `text`. | binary |
+ | `druid.cache.locator` | Memcached locator: `consistent` or `array_mod`. | consistent |
+ | `druid.cache.enableTls` | Enables TLS for connections made by the Memcached client. Boolean. | false |
+ | `druid.cache.clientMode` | Client mode, as a string: [`static`](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.Manual.html) or [`dynamic`](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.Using.ModifyApp.Java.html). Static mode requires you to list every cluster node in `druid.cache.hosts`. Dynamic mode uses the [Auto Discovery](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.HowAutoDiscoveryWorks.html) feature of Amazon ElastiCache for Memcached. | static |
+ | `druid.cache.skipTlsHostnameVerification` | Skips TLS hostname verification. Boolean; only applies to TLS connections. | true |
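A minimal sketch of how the new properties fit together, assuming they are read from a plain `java.util.Properties` object. Druid actually binds them through Jackson into `MemcachedCacheConfig`; the class `MemcachedTlsSettings` and its methods are hypothetical. The sketch applies the defaults from the table, rejects any `druid.cache.clientMode` other than `static` or `dynamic` (the same rule `createConnectionFactory` enforces, with the same message), and splits `druid.cache.hosts` into `<host:port>` addresses.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical holder for the memcached cache properties documented above.
// Druid binds these through Jackson into MemcachedCacheConfig; this is only a sketch.
final class MemcachedTlsSettings {
    final boolean enableTls;
    final String clientMode;
    final boolean skipTlsHostnameVerification;
    final List<InetSocketAddress> hosts;

    private MemcachedTlsSettings(boolean enableTls, String clientMode,
                                 boolean skipTlsHostnameVerification, List<InetSocketAddress> hosts) {
        this.enableTls = enableTls;
        this.clientMode = clientMode;
        this.skipTlsHostnameVerification = skipTlsHostnameVerification;
        this.hosts = hosts;
    }

    static MemcachedTlsSettings fromProperties(Properties props) {
        // Defaults follow the table: enableTls=false, clientMode=static,
        // skipTlsHostnameVerification=true.
        boolean enableTls = Boolean.parseBoolean(props.getProperty("druid.cache.enableTls", "false"));
        String clientMode = props.getProperty("druid.cache.clientMode", "static");
        boolean skipVerification =
            Boolean.parseBoolean(props.getProperty("druid.cache.skipTlsHostnameVerification", "true"));

        // Same rule as createConnectionFactory: only "static" or "dynamic" is accepted.
        if (!"static".equals(clientMode) && !"dynamic".equals(clientMode)) {
            throw new IllegalArgumentException(
                "Invalid value provided for `druid.cache.clientMode`. Value must be 'static' or 'dynamic'.");
        }

        String hostsValue = props.getProperty("druid.cache.hosts");
        if (hostsValue == null || hostsValue.isBlank()) {
            // The table lists no default for druid.cache.hosts.
            throw new IllegalArgumentException("druid.cache.hosts must be set");
        }
        return new MemcachedTlsSettings(enableTls, clientMode, skipVerification, parseHosts(hostsValue));
    }

    // Splits "host1:port1,host2:port2" into unresolved addresses, so no DNS lookup happens here.
    // In dynamic mode this list would normally hold just the configuration endpoint.
    static List<InetSocketAddress> parseHosts(String hostsValue) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : hostsValue.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected <host:port> but got: " + trimmed);
            }
            // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) for bad ports.
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.add(InetSocketAddress.createUnresolved(trimmed.substring(0, colon), port));
        }
        return List.copyOf(result);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical configuration endpoint; in dynamic mode Auto Discovery finds the nodes.
        props.setProperty("druid.cache.hosts", "mycluster.cfg.example.com:11211");
        props.setProperty("druid.cache.clientMode", "dynamic");
        props.setProperty("druid.cache.enableTls", "true");
        MemcachedTlsSettings settings = fromProperties(props);
        System.out.println(settings.clientMode + " " + settings.enableTls + " " + settings.hosts.size());
        // prints "dynamic true 1"
    }
}
```

Keeping the addresses unresolved means parsing never triggers DNS lookups; a real client resolves them when it connects.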

#### Hybrid

6 changes: 3 additions & 3 deletions licenses.yaml
@@ -1658,13 +1658,13 @@ libraries:

---

- name: Spymemcached
+ name: aws-elasticache-cluster-client-memcached-for-java
license_category: binary
module: java-core
license_name: Apache License version 2.0
- version: 2.12.3
+ version: 1.2.0
libraries:
- - net.spy: spymemcached
+ - com.amazonaws: elasticache-java-cluster-client

---

6 changes: 3 additions & 3 deletions pom.xml
@@ -773,9 +773,9 @@
<version>3.3.6</version>
</dependency>
<dependency>
- <groupId>net.spy</groupId>
- <artifactId>spymemcached</artifactId>
- <version>2.12.3</version>
+ <groupId>com.amazonaws</groupId>
+ <artifactId>elasticache-java-cluster-client</artifactId>
+ <version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.antlr</groupId>
4 changes: 2 additions & 2 deletions server/pom.xml
@@ -134,8 +134,8 @@
<artifactId>tesla-aether</artifactId>
</dependency>
<dependency>
- <groupId>net.spy</groupId>
- <artifactId>spymemcached</artifactId>
+ <groupId>com.amazonaws</groupId>
+ <artifactId>elasticache-java-cluster-client</artifactId>
</dependency>
<dependency>
<groupId>org.lz4</groupId>
@@ -30,6 +30,7 @@
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import net.spy.memcached.AddrUtil;
+ import net.spy.memcached.ClientMode;
import net.spy.memcached.ConnectionFactory;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.FailureMode;
@@ -52,10 +53,16 @@
import org.apache.druid.java.util.metrics.AbstractMonitor;

import javax.annotation.Nullable;
+ import javax.net.ssl.SSLContext;
+ import javax.net.ssl.TrustManagerFactory;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
+ import java.security.KeyManagementException;
+ import java.security.KeyStore;
+ import java.security.KeyStoreException;
+ import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
@@ -339,25 +346,8 @@ public void updateHistogram(String name, int amount)
}
};

- final ConnectionFactory connectionFactory = new MemcachedCustomConnectionFactoryBuilder()
- // 1000 repetitions gives us good distribution with murmur3_128
- // (approx < 5% difference in counts across nodes, with 5 cache nodes)
- .setKetamaNodeRepetitions(1000)
- .setHashAlg(MURMUR3_128)
- .setProtocol(ConnectionFactoryBuilder.Protocol.valueOf(StringUtils.toUpperCase(config.getProtocol())))
- .setLocatorType(ConnectionFactoryBuilder.Locator.valueOf(StringUtils.toUpperCase(config.getLocator())))
- .setDaemon(true)
- .setFailureMode(FailureMode.Cancel)
- .setTranscoder(transcoder)
- .setShouldOptimize(true)
- .setOpQueueMaxBlockTime(config.getTimeout())
- .setOpTimeout(config.getTimeout())
- .setReadBufferSize(config.getReadBufferSize())
- .setOpQueueFactory(opQueueFactory)
- .setMetricCollector(metricCollector)
- .setEnableMetrics(MetricType.DEBUG) // Not as scary as it sounds
- .build();
-
+ final ConnectionFactory connectionFactory = createConnectionFactory(config, transcoder,
+ opQueueFactory, metricCollector);
final List<InetSocketAddress> hosts = AddrUtil.getAddresses(config.getHosts());


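The `setKetamaNodeRepetitions(1000)` comment refers to consistent hashing, where each cache node is placed on a hash ring many times so that keys spread evenly across nodes. The following sketch illustrates the idea with JDK classes only. It is not spymemcached's `KetamaNodeLocator`: the class `KetamaRingSketch` is hypothetical, and it uses MD5 (the classic Ketama hash) in place of the `murmur3_128` hash Druid configures, so it does not reproduce the exact figure quoted in the comment.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistent-hash ring in the spirit of Ketama. Uses MD5 instead of
// the murmur3_128 hash that Druid configures; not the spymemcached locator API.
final class KetamaRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    KetamaRingSketch(List<String> nodes, int repetitions) {
        for (String node : nodes) {
            // Place each node on the ring `repetitions` times ("virtual nodes").
            for (int i = 0; i < repetitions; i++) {
                ring.put(hash(node + "-" + i), node);
            }
        }
    }

    // A key belongs to the first ring point at or after its hash, wrapping to the start.
    String nodeFor(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    // Uses the first 8 bytes of the MD5 digest as a 64-bit ring position.
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    // Counts how many of numKeys synthetic keys map to each node.
    static Map<String, Integer> distribution(List<String> nodes, int repetitions, int numKeys) {
        KetamaRingSketch locator = new KetamaRingSketch(nodes, repetitions);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String node : nodes) {
            counts.put(node, 0);
        }
        for (int k = 0; k < numKeys; k++) {
            counts.merge(locator.nodeFor("druid:key-" + k), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> nodes =
            List.of("cache1:11211", "cache2:11211", "cache3:11211", "cache4:11211", "cache5:11211");
        // Compare the spread with few versus many ring points per node.
        System.out.println("10 repetitions:   " + distribution(nodes, 10, 100_000));
        System.out.println("1000 repetitions: " + distribution(nodes, 1000, 100_000));
    }
}
```

Running `main` prints per-node key counts for 10 and 1000 repetitions; with more ring points per node the counts typically sit closer to an even one-fifth share each.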
@@ -389,11 +379,57 @@ public MemcachedClientIF get()

return new MemcachedCache(clientSupplier, config, monitor);
}
- catch (IOException e) {
+ catch (IOException | NoSuchAlgorithmException e) {
throw new RuntimeException(e);
}
+ catch (KeyStoreException e) {
+ throw new RuntimeException(e);
+ }
+ catch (KeyManagementException e) {
+ throw new RuntimeException(e);
+ }
}
+
+ public static ConnectionFactory createConnectionFactory(final MemcachedCacheConfig config, final LZ4Transcoder transcoder, final OperationQueueFactory opQueueFactory, final MetricCollector metricCollector) throws KeyManagementException, KeyStoreException, NoSuchAlgorithmException
+ {
+ MemcachedCustomConnectionFactoryBuilder connectionFactoryBuilder = (MemcachedCustomConnectionFactoryBuilder) new MemcachedCustomConnectionFactoryBuilder()
+ // 1000 repetitions gives us good distribution with murmur3_128
+ // (approx < 5% difference in counts across nodes, with 5 cache nodes)
+ .setKetamaNodeRepetitions(1000)
+ .setHashAlg(MURMUR3_128)
+ .setProtocol(ConnectionFactoryBuilder.Protocol.valueOf(StringUtils.toUpperCase(config.getProtocol())))
+ .setLocatorType(ConnectionFactoryBuilder.Locator.valueOf(StringUtils.toUpperCase(config.getLocator())))
+ .setDaemon(true)
+ .setFailureMode(FailureMode.Cancel)
+ .setTranscoder(transcoder)
+ .setShouldOptimize(true)
+ .setOpQueueMaxBlockTime(config.getTimeout())
+ .setOpTimeout(config.getTimeout())
+ .setReadBufferSize(config.getReadBufferSize())
+ .setOpQueueFactory(opQueueFactory)
+ .setMetricCollector(metricCollector)
+ .setEnableMetrics(MetricType.DEBUG); // Not as scary as it sounds
+ if (config.enableTls()) {
+ // Build SSLContext
+ TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
+ tmf.init((KeyStore) null);
+ SSLContext sslContext = SSLContext.getInstance("TLS");
+ sslContext.init(null, tmf.getTrustManagers(), null);
+ // Create the client in TLS mode
+ connectionFactoryBuilder.setSSLContext(sslContext);
+ }
+ if ("dynamic".equals(config.getClientMode())) {
+ connectionFactoryBuilder.setClientMode(ClientMode.Dynamic);
+ connectionFactoryBuilder.setHostnameForTlsVerification(config.getHosts().split(",")[0]);
+ } else if ("static".equals(config.getClientMode())) {
+ connectionFactoryBuilder.setClientMode(ClientMode.Static);
+ } else {
+ throw new RuntimeException("Invalid value provided for `druid.cache.clientMode`. Value must be 'static' or 'dynamic'.");
+ }
+ connectionFactoryBuilder.setSkipTlsHostnameVerification(config.skipTlsHostnameVerification());
+ return connectionFactoryBuilder.build();
+ }
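The TLS branch of `createConnectionFactory` builds an `SSLContext` from the JVM's default trust store (`tmf.init((KeyStore) null)`) with no key managers, so the client presents no certificate of its own. The same JDK calls can be run in isolation, for example to confirm that a given JVM can create the context before setting `druid.cache.enableTls=true`. `DefaultTrustSslContext` is a hypothetical name used only for this sketch.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper mirroring the TLS branch of createConnectionFactory with JDK classes only.
final class DefaultTrustSslContext {
    static SSLContext create() throws GeneralSecurityException {
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // A null KeyStore makes the factory use the JVM's default trust store.
        tmf.init((KeyStore) null);
        SSLContext sslContext = SSLContext.getInstance("TLS");
        // No key managers (no client certificate) and the default SecureRandom.
        sslContext.init(null, tmf.getTrustManagers(), null);
        return sslContext;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SSLContext ctx = create();
        System.out.println(ctx.getProtocol()); // prints "TLS"
    }
}
```

Trusting a private CA instead would mean initializing the `TrustManagerFactory` with a `KeyStore` loaded from that CA's certificates; the change above only uses the default trust store.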

private final int timeout;
private final int expiration;
private final String memcachedPrefix;
@@ -63,6 +63,15 @@ public class MemcachedCacheConfig
@JsonProperty
private String locator = "consistent";

+ @JsonProperty
+ private boolean enableTls = false;
+
+ @JsonProperty
+ private String clientMode = "static";
+
+ @JsonProperty
+ private boolean skipTlsHostnameVerification = true;
+
public int getExpiration()
{
return expiration;
@@ -112,4 +121,19 @@ public String getLocator()
{
return locator;
}
+
+ public boolean enableTls()
+ {
+ return enableTls;
+ }
+
+ public String getClientMode()
+ {
+ return clientMode;
+ }
+
+ public boolean skipTlsHostnameVerification()
+ {
+ return skipTlsHostnameVerification;
+ }
}
@@ -20,6 +20,7 @@
package org.apache.druid.client.cache;

import net.spy.memcached.ArrayModNodeLocator;
+ import net.spy.memcached.ClientMode;
import net.spy.memcached.ConnectionFactory;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionObserver;
@@ -37,6 +38,7 @@
import net.spy.memcached.transcoders.Transcoder;
import net.spy.memcached.util.DefaultKetamaNodeLocatorConfiguration;

+ import javax.net.ssl.SSLContext;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.BlockingQueue;
@@ -56,7 +58,7 @@ public MemcachedCustomConnectionFactoryBuilder setKetamaNodeRepetitions(int repe
@Override
public ConnectionFactory build()
{
- return new DefaultConnectionFactory()
+ return new DefaultConnectionFactory(clientMode)
{
@Override
public NodeLocator createLocator(List<MemcachedNode> nodes)
@@ -213,6 +215,45 @@ public long getAuthWaitTime()
{
return authWaitTime;
}
+
+ @Override
+ public SSLContext getSSLContext()
+ {
+ return sslContext == null ? super.getSSLContext() : sslContext;
+ }
+
+ @Override
+ public String getHostnameForTlsVerification()
+ {
+ return hostnameForTlsVerification == null ? super.getHostnameForTlsVerification() : hostnameForTlsVerification;
+ }
+ @Override
+ public ClientMode getClientMode()
+ {
+ return clientMode == null ? super.getClientMode() : clientMode;
+ }
+
+ @Override
+ public boolean skipTlsHostnameVerification()
+ {
+ return skipTlsHostnameVerification;
+ }
+
+ @Override
+ public String toString()
+ {
+ // MURMUR_128 cannot be cast to DefaultHashAlgorithm; getSSLContext() may be null when TLS is not configured
+ return "Failure Mode: " + getFailureMode().name() + ", Hash Algorithm: "
+ + getHashAlg() + ", Max Reconnect Delay: "
+ + getMaxReconnectDelay() + ", Max Op Timeout: " + getOperationTimeout()
+ + ", Op Queue Length: " + getOpQueueLen() + ", Op Max Queue Block Time: "
+ + getOpQueueMaxBlockTime() + ", Max Timeout Exception Threshold: "
+ + getTimeoutExceptionThreshold() + ", Read Buffer Size: "
+ + getReadBufSize() + ", Transcoder: " + getDefaultTranscoder()
+ + ", Operation Factory: " + getOperationFactory() + ", isDaemon: "
+ + isDaemon() + ", Optimized: " + shouldOptimize() + ", Using Nagle: "
+ + useNagleAlgorithm() + ", KeepAlive: " + getKeepAlive() + ", SSLContext: " + (getSSLContext() == null ? "none" : getSSLContext().getProtocol()) + ", ConnectionFactory: " + getName();
+ }
};
}
}
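Each override added to the anonymous `DefaultConnectionFactory` follows the same pattern: use the value captured by the builder if one was set, otherwise fall back to the superclass default. The JDK-only sketch below shows that pattern with hypothetical `BaseFactory` and `FallbackOverrideSketch` classes. It assumes the base default for the SSL context is `null` (no TLS) and formats the context defensively, because calling `getProtocol()` on a missing `SSLContext` would throw a `NullPointerException`.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

// Hypothetical stand-in for a library base class whose default is "no TLS".
class BaseFactory {
    SSLContext getSSLContext() {
        return null;
    }
}

final class FallbackOverrideSketch {
    // Builds a factory that prefers the builder-supplied context, else the base default.
    static BaseFactory build(SSLContext fromBuilder) {
        return new BaseFactory() {
            @Override
            SSLContext getSSLContext() {
                return fromBuilder == null ? super.getSSLContext() : fromBuilder;
            }
        };
    }

    // Null-safe description: never dereferences a missing SSLContext.
    static String describe(BaseFactory factory) {
        SSLContext ctx = factory.getSSLContext();
        return "SSLContext: " + (ctx == null ? "none" : ctx.getProtocol());
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(describe(build(null)));                          // prints "SSLContext: none"
        System.out.println(describe(build(SSLContext.getInstance("TLS")))); // prints "SSLContext: TLS"
    }
}
```

Falling back to `super` rather than to a hard-coded value keeps whatever default the library ships, so only settings the builder actually received change the factory's behavior.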