Release for azure-cosmos 4.30.1 and adding reason to rntbd channel health check failures #29174

Merged
4 changes: 2 additions & 2 deletions eng/jacoco-test-coverage/pom.xml
@@ -178,12 +178,12 @@
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos-encryption</artifactId>
<version>1.3.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
<version>1.2.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
</dependency>
<dependency>
<groupId>com.azure</groupId>
8 changes: 4 additions & 4 deletions eng/versioning/version_client.txt
@@ -81,13 +81,13 @@ com.azure:azure-core-serializer-json-gson;1.1.15;1.2.0-beta.1
com.azure:azure-core-serializer-json-jackson;1.2.16;1.3.0-beta.1
com.azure:azure-core-test;1.8.0;1.9.0-beta.1
com.azure:azure-core-tracing-opentelemetry;1.0.0-beta.23;1.0.0-beta.24
com.azure:azure-cosmos;4.30.0;4.31.0-beta.1
com.azure:azure-cosmos;4.30.0;4.30.1
com.azure:azure-cosmos-benchmark;4.0.1-beta.1;4.0.1-beta.1
com.azure:azure-cosmos-dotnet-benchmark;4.0.1-beta.1;4.0.1-beta.1
com.azure.cosmos.spark:azure-cosmos-spark_3_2-12;1.0.0-beta.1;1.0.0-beta.1
com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12;4.10.0;4.11.0-beta.1
com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12;4.10.0;4.11.0-beta.1
com.azure:azure-cosmos-encryption;1.2.0;1.3.0-beta.1
com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12;4.10.0;4.10.1
com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12;4.10.0;4.10.1
com.azure:azure-cosmos-encryption;1.2.0;1.2.1
com.azure:azure-data-appconfiguration;1.3.3;1.4.0-beta.1
com.azure:azure-data-appconfiguration-perf;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-data-schemaregistry;1.2.0;1.3.0-beta.1
4 changes: 2 additions & 2 deletions sdk/cosmos/azure-cosmos-benchmark/pom.xml
@@ -51,13 +51,13 @@ Licensed under the MIT License.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>

<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos-encryption</artifactId>
<version>1.3.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
<version>1.2.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
</dependency>

<dependency>
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos-dotnet-benchmark/pom.xml
@@ -50,7 +50,7 @@ Licensed under the MIT License.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>

<dependency>
10 changes: 2 additions & 8 deletions sdk/cosmos/azure-cosmos-encryption/CHANGELOG.md
@@ -1,14 +1,8 @@
## Release History

### 1.3.0-beta.1 (Unreleased)

#### Features Added

#### Breaking Changes

#### Bugs Fixed

### 1.2.1 (2022-06-01)
#### Other Changes
* Updated `azure-cosmos` to version `4.30.1`.

### 1.2.0 (2022-05-20)
#### Other Changes
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos-encryption/README.md
@@ -12,7 +12,7 @@ The Azure Cosmos Encryption Plugin is used for encrypting data with a user-provi
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos-encryption</artifactId>
<version>1.2.0</version>
<version>1.2.1</version>
</dependency>
```
[//]: # ({x-version-update-end})
4 changes: 2 additions & 2 deletions sdk/cosmos/azure-cosmos-encryption/pom.xml
@@ -13,7 +13,7 @@ Licensed under the MIT License.

<groupId>com.azure</groupId>
<artifactId>azure-cosmos-encryption</artifactId>
<version>1.3.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
<version>1.2.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
<name>Encryption Plugin for Azure Cosmos DB SDK</name>
<description>This Package contains Encryption Plugin for Microsoft Azure Cosmos SDK</description>
<packaging>jar</packaging>
@@ -56,7 +56,7 @@ Licensed under the MIT License.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>

<dependency>
6 changes: 1 addition & 5 deletions sdk/cosmos/azure-cosmos-spark_3-1_2-12/CHANGELOG.md
@@ -1,19 +1,15 @@
## Release History

### 4.11.0-beta.1 (Unreleased)
### 4.10.1 (2022-06-01)

#### Features Added
* Added the ability to disable endpoint rediscovery when using custom domain names in combination with private endpoints from a custom (on-premises) Spark environment (neither Databricks nor Synapse). - See [PR 29027](https://github.com/Azure/azure-sdk-for-java/pull/29027)
* Added a config option `spark.cosmos.serialization.dateTimeConversionMode` that allows date/time conversion to fall back to converting `java.sql.Date` and `java.sql.Timestamp` into epoch milliseconds, as in the Cosmos DB Connector for Spark 2.4 (a usage sketch follows this file's diff). - See [PR 29081](https://github.com/Azure/azure-sdk-for-java/pull/29081)

#### Breaking Changes

#### Bugs Fixed
* Fixed a possible perf issue where a partition split resulting in a 410 while the Spark partitioner retrieves the latest LSN could cause change feed events to be reprocessed (creating a "hot partition"). - See [PR 29152](https://github.com/Azure/azure-sdk-for-java/pull/29152)
* Fixed a bug resulting in ChangeFeed requests using the account's default consistency model instead of falling back to eventual if `spark.cosmos.read.forceEventualConsistency` is `true` (the default config). - See [PR 29152](https://github.com/Azure/azure-sdk-for-java/pull/29152)

#### Other Changes

### 4.10.0 (2022-05-20)
#### Features Added
* Added the ability to change the target throughput control (`spark.cosmos.throughputControl.targetThroughputThreshold` or `spark.cosmos.throughputControl.targetThroughput`) when throughput control is enabled without having to also change the throughput control group name (`spark.cosmos.throughputControl.name`). - See [PR 28969](https://github.com/Azure/azure-sdk-for-java/pull/28969)
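To make the two new configuration options above concrete, here is a minimal read sketch in Java against the 4.10.1 connector. It assumes Spark 3.1+ with `azure-cosmos-spark_3-1_2-12` on the classpath; the account endpoint, key, database, and container values are placeholders, and the `AlwaysEpochMilliseconds` mode name is an assumption not shown in this diff.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class CosmosSparkReadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("cosmos-spark-read-sketch")
            .master("local[*]")
            .getOrCreate();

        Dataset<Row> items = spark.read()
            .format("cosmos.oltp")
            .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/") // placeholder
            .option("spark.cosmos.accountKey", "<account-key>")                                   // placeholder
            .option("spark.cosmos.database", "SampleDatabase")                                    // placeholder
            .option("spark.cosmos.container", "SampleContainer")                                  // placeholder
            // New option from this release: fall back to converting java.sql.Date and
            // java.sql.Timestamp into epoch milliseconds, as the Spark 2.4 connector did.
            .option("spark.cosmos.serialization.dateTimeConversionMode", "AlwaysEpochMilliseconds")
            // Default behavior referenced in the bug fix above: downgrade reads to eventual consistency.
            .option("spark.cosmos.read.forceEventualConsistency", "true")
            .load();

        items.printSchema();
        spark.stop();
    }
}
```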
4 changes: 2 additions & 2 deletions sdk/cosmos/azure-cosmos-spark_3-1_2-12/pom.xml
@@ -11,7 +11,7 @@
</parent>
<groupId>com.azure.cosmos.spark</groupId>
<artifactId>azure-cosmos-spark_3-1_2-12</artifactId>
<version>4.11.0-beta.1</version> <!-- {x-version-update;com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12;current} -->
<version>4.10.1</version> <!-- {x-version-update;com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12;current} -->
<packaging>jar</packaging>
<url>https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3-1_2-12</url>
<name>OLTP Spark 3.1 Connector for Azure Cosmos DB SQL API</name>
@@ -106,7 +106,7 @@
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
6 changes: 1 addition & 5 deletions sdk/cosmos/azure-cosmos-spark_3-2_2-12/CHANGELOG.md
@@ -1,19 +1,15 @@
## Release History

### 4.11.0-beta.1 (Unreleased)
### 4.10.1 (2022-06-01)

#### Features Added
* Added the ability to disable endpoint rediscovery when using custom domain names in combination with private endpoints from a custom (on-premises) Spark environment (neither Databricks nor Synapse). - See [PR 29027](https://github.com/Azure/azure-sdk-for-java/pull/29027)
* Added a config option `spark.cosmos.serialization.dateTimeConversionMode` that allows date/time conversion to fall back to converting `java.sql.Date` and `java.sql.Timestamp` into epoch milliseconds, as in the Cosmos DB Connector for Spark 2.4. - See [PR 29081](https://github.com/Azure/azure-sdk-for-java/pull/29081)

#### Breaking Changes

#### Bugs Fixed
* Fixed a possible perf issue where a partition split resulting in a 410 while the Spark partitioner retrieves the latest LSN could cause change feed events to be reprocessed (creating a "hot partition"). - See [PR 29152](https://github.com/Azure/azure-sdk-for-java/pull/29152)
* Fixed a bug resulting in ChangeFeed requests using the account's default consistency model instead of falling back to eventual if `spark.cosmos.read.forceEventualConsistency` is `true` (the default config). - See [PR 29152](https://github.com/Azure/azure-sdk-for-java/pull/29152)

#### Other Changes

### 4.10.0 (2022-05-20)
#### Features Added
* Added the ability to change the target throughput control (`spark.cosmos.throughputControl.targetThroughputThreshold` or `spark.cosmos.throughputControl.targetThroughput`) when throughput control is enabled without having to also change the throughput control group name (`spark.cosmos.throughputControl.name`). - See [PR 28969](https://github.com/Azure/azure-sdk-for-java/pull/28969)
4 changes: 2 additions & 2 deletions sdk/cosmos/azure-cosmos-spark_3-2_2-12/pom.xml
@@ -11,7 +11,7 @@
</parent>
<groupId>com.azure.cosmos.spark</groupId>
<artifactId>azure-cosmos-spark_3-2_2-12</artifactId>
<version>4.11.0-beta.1</version> <!-- {x-version-update;com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12;current} -->
<version>4.10.1</version> <!-- {x-version-update;com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12;current} -->
<packaging>jar</packaging>
<url>https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3-2_2-12</url>
<name>OLTP Spark 3.2 Connector for Azure Cosmos DB SQL API</name>
@@ -108,7 +108,7 @@
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos-spark_3_2-12/pom.xml
@@ -63,7 +63,7 @@
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
12 changes: 4 additions & 8 deletions sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -1,15 +1,11 @@
## Release History

### 4.31.0-beta.1 (Unreleased)

#### Features Added

#### Breaking Changes

#### Bugs Fixed
### 4.30.1 (2022-06-01)

#### Other Changes
* Making CosmosPatchOperations thread-safe. Usually there is no reason to modify a CosmosPatchOperations instance concurrently form multiple threads - but making it thread-safe acts as protection in case this is done anyway - See [PR 29143](https://github.com/Azure/azure-sdk-for-java/pull/29143)
* Made `CosmosPatchOperations` thread-safe. Usually there is no reason to modify a `CosmosPatchOperations` instance concurrently from multiple threads, but making it thread-safe acts as protection in case this is done anyway (a usage sketch follows this file's diff). - See [PR 29143](https://github.com/Azure/azure-sdk-for-java/pull/29143)
* Added a system property to allow overriding the proxy setting for the client telemetry endpoint. - See [PR 29022](https://github.com/Azure/azure-sdk-for-java/pull/29022)
* Added additional information about the reason for Rntbd channel health check failures. - See [PR 29022](https://github.com/Azure/azure-sdk-for-java/pull/29022)

### 4.30.0 (2022-05-20)
#### Bugs Fixed
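As a usage illustration for the `CosmosPatchOperations` thread-safety note above, here is a minimal sketch that builds a single patch instance from two threads. It assumes `azure-cosmos` 4.30.1 on the classpath; the JSON paths and values are hypothetical, and actually applying the patch through `CosmosAsyncContainer.patchItem(...)` is omitted.

```java
import com.azure.cosmos.models.CosmosPatchOperations;

public final class PatchThreadSafetySketch {
    public static void main(String[] args) throws InterruptedException {
        // One patch instance shared across threads; per the 4.30.1 note this is now tolerated,
        // even though building it from a single thread remains the typical pattern.
        CosmosPatchOperations patch = CosmosPatchOperations.create();

        Thread t1 = new Thread(() -> patch.add("/ticketCount", 1));      // hypothetical path
        Thread t2 = new Thread(() -> patch.set("/status", "processed")); // hypothetical path
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // The populated instance would normally be passed to
        // container.patchItem(id, partitionKey, patch, MyItem.class).
        System.out.println("patch assembled: " + patch);
    }
}
```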
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos/README.md
@@ -45,7 +45,7 @@ add the direct dependency to your project as follows.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.30.0</version>
<version>4.30.1</version>
</dependency>
```
[//]: # ({x-version-update-end})
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos/pom.xml
@@ -13,7 +13,7 @@ Licensed under the MIT License.

<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<name>Microsoft Azure SDK for SQL API of Azure Cosmos DB Service</name>
<description>This Package contains Microsoft Azure Cosmos SDK (with Reactive Extension Reactor support) for Azure Cosmos DB SQL API</description>
<packaging>jar</packaging>
RntbdClientChannelHealthChecker.java
@@ -12,6 +12,7 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.text.MessageFormat;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

@@ -126,7 +127,6 @@ public long writeDelayLimitInNanos() {
* @return A future with a result of {@code true} if the channel is healthy, or {@code false} otherwise.
*/
public Future<Boolean> isHealthy(final Channel channel) {

checkNotNull(channel, "expected non-null channel");

final RntbdRequestManager requestManager = channel.pipeline().get(RntbdRequestManager.class);
@@ -181,7 +181,7 @@ public Future<Boolean> isHealthy(final Channel channel) {
final int pendingRequestCount = requestManager.pendingRequestCount();

logger.warn("{} health check failed due to nonresponding read: {lastChannelWrite: {}, lastChannelRead: {}, "
+ "readDelay: {}, readDelayLimit: {}, rntbdContext: {}, pendingRequestCount: {}}", channel,
+ "readDelay: {}, readDelayLimit: {}, rntbdContext: {}, pendingRequestCount: {}}", channel,
timestamps.lastChannelWriteNanoTime(), timestamps.lastChannelReadNanoTime(), readDelay,
this.readDelayLimitInNanos, rntbdContext, pendingRequestCount);

@@ -206,6 +206,122 @@ public Future<Boolean> isHealthy(final Channel channel) {
return promise;
}

/**
* Determines whether a specified channel is healthy and, if it is not, why the health check failed.
*
* @param channel A channel whose health is to be checked.
* @return A future whose result is {@link RntbdConstants.RntbdHealthCheckResults#SuccessValue} if the channel is healthy, or a message describing the reason for the health check failure otherwise.
*/
public Future<String> isHealthyWithFailureReason(final Channel channel) {

checkNotNull(channel, "expected non-null channel");

final RntbdRequestManager requestManager = channel.pipeline().get(RntbdRequestManager.class);
final Promise<String> promise = channel.eventLoop().newPromise();

if (requestManager == null) {
reportIssueUnless(logger, !channel.isActive(), channel, "active with no request manager");
return promise.setSuccess("active with no request manager");
}

final Timestamps timestamps = requestManager.snapshotTimestamps();
final long currentTime = System.nanoTime();

if (currentTime - timestamps.lastChannelReadNanoTime() < recentReadWindowInNanos) {
// because we recently received data
return promise.setSuccess(RntbdConstants.RntbdHealthCheckResults.SuccessValue);
}

// Black hole detection, part 1:
// Treat the channel as unhealthy if the gap between the last attempted write and the last successful write
// grew beyond acceptable limits, unless a write was attempted recently. This is a sign of a nonresponding write.

final long writeDelayInNanos =
timestamps.lastChannelWriteAttemptNanoTime() - timestamps.lastChannelWriteNanoTime();

final long writeHangDurationInNanos =
currentTime - timestamps.lastChannelWriteAttemptNanoTime();

if (writeDelayInNanos > this.writeDelayLimitInNanos && writeHangDurationInNanos > writeHangGracePeriodInNanos) {

final Optional<RntbdContext> rntbdContext = requestManager.rntbdContext();
final int pendingRequestCount = requestManager.pendingRequestCount();

logger.warn("{} health check failed due to nonresponding write: {lastChannelWriteAttemptNanoTime: {}, " +
"lastChannelWriteNanoTime: {}, writeDelayInNanos: {}, writeDelayLimitInNanos: {}, " +
"rntbdContext: {}, pendingRequestCount: {}}",
channel, timestamps.lastChannelWriteAttemptNanoTime(), timestamps.lastChannelWriteNanoTime(),
writeDelayInNanos, this.writeDelayLimitInNanos, rntbdContext, pendingRequestCount);

String msg = MessageFormat.format(
"{0} health check failed due to nonresponding write: (lastChannelWriteAttemptNanoTime: {1}, " +
"lastChannelWriteNanoTime: {2}, writeDelayInNanos: {3}, writeDelayLimitInNanos: {4}, " +
"rntbdContext: {5}, pendingRequestCount: {6})",
channel, timestamps.lastChannelWriteAttemptNanoTime(), timestamps.lastChannelWriteNanoTime(),
writeDelayInNanos, this.writeDelayLimitInNanos, rntbdContext, pendingRequestCount
);

return promise.setSuccess(msg);
}

// Black hole detection, part 2:
// Treat the connection as unhealthy if the gap between the last successful write and the last successful read
// grew beyond acceptable limits, unless a write succeeded recently. This is a sign of a nonresponding read.

final long readDelay = timestamps.lastChannelWriteNanoTime() - timestamps.lastChannelReadNanoTime();
final long readHangDuration = currentTime - timestamps.lastChannelWriteNanoTime();

if (readDelay > this.readDelayLimitInNanos && readHangDuration > readHangGracePeriodInNanos) {

final Optional<RntbdContext> rntbdContext = requestManager.rntbdContext();
final int pendingRequestCount = requestManager.pendingRequestCount();

logger.warn("{} health check failed due to nonresponding read: {lastChannelWrite: {}, lastChannelRead: {}, "
+ "readDelay: {}, readDelayLimit: {}, rntbdContext: {}, pendingRequestCount: {}}", channel,
timestamps.lastChannelWriteNanoTime(), timestamps.lastChannelReadNanoTime(), readDelay,
this.readDelayLimitInNanos, rntbdContext, pendingRequestCount);

String msg = MessageFormat.format(
"{0} health check failed due to nonresponding read: (lastChannelWrite: {1}, lastChannelRead: {2}, "
+ "readDelay: {3}, readDelayLimit: {4}, rntbdContext: {5}, pendingRequestCount: {6})", channel,
timestamps.lastChannelWriteNanoTime(), timestamps.lastChannelReadNanoTime(), readDelay,
this.readDelayLimitInNanos, rntbdContext, pendingRequestCount
);

return promise.setSuccess(msg);
}

if (this.idleConnectionTimeoutInNanos > 0L) {
if (currentTime - timestamps.lastChannelReadNanoTime() > this.idleConnectionTimeoutInNanos) {
String msg = MessageFormat.format(
"{0} health check failed due to idle connection timeout: (lastChannelWrite: {1}, lastChannelRead: {2}, "
+ "idleConnectionTimeout: {3}, currentTime: {4}", channel,
timestamps.lastChannelWriteNanoTime(), timestamps.lastChannelReadNanoTime(),
idleConnectionTimeoutInNanos, currentTime
);
return promise.setSuccess(msg);
}
}

channel.writeAndFlush(RntbdHealthCheckRequest.MESSAGE).addListener(completed -> {
if (completed.isSuccess()) {
promise.setSuccess(RntbdConstants.RntbdHealthCheckResults.SuccessValue);
} else {
logger.warn("{} health check request failed due to:", channel, completed.cause());

String msg = MessageFormat.format(
"{0} health check request failed due to: {1}",
channel,
completed.cause().toString()
);

promise.setSuccess(msg);
}
});

return promise;
}

@Override
public String toString() {
return RntbdObjectMapper.toString(this);
RntbdConstants.java
@@ -22,6 +22,10 @@ public final class RntbdConstants {
private RntbdConstants() {
}

public static class RntbdHealthCheckResults {
public static final String SuccessValue = "Success";
}

public enum RntbdConsistencyLevel {

Strong((byte) 0x00),
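The new `isHealthyWithFailureReason` API above completes a Netty `Promise<String>` with `RntbdHealthCheckResults.SuccessValue` (`"Success"`) when the channel is healthy and with a descriptive reason string otherwise. The self-contained sketch below mirrors that contract with plain Netty types so the consumer-side pattern is visible; it is illustrative only and does not show how the SDK wires the check into its channel pool.

```java
import io.netty.util.concurrent.DefaultEventExecutor;
import io.netty.util.concurrent.EventExecutor;
import io.netty.util.concurrent.Promise;

public final class HealthCheckReasonSketch {
    // Mirrors RntbdConstants.RntbdHealthCheckResults.SuccessValue
    private static final String SUCCESS = "Success";

    public static void main(String[] args) throws Exception {
        EventExecutor executor = new DefaultEventExecutor();
        try {
            Promise<String> healthCheck = executor.newPromise();

            // A checker would call setSuccess(SUCCESS) on the healthy path, or set a reason such as
            // the "nonresponding write" / "nonresponding read" messages built in the diff above.
            healthCheck.setSuccess("health check failed due to nonresponding read (illustrative reason)");

            String result = healthCheck.get();
            if (!SUCCESS.equals(result)) {
                System.out.println("channel is unhealthy, reason: " + result);
            }
        } finally {
            executor.shutdownGracefully();
        }
    }
}
```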