K8SSAND-1327 ⁃ EndpointState has incorrect json key #298

Closed
sseidman opened this issue Mar 17, 2022 · 5 comments · Fixed by #299

@sseidman

sseidman commented Mar 17, 2022

What happened?
EndpointState has an incorrect JSON key in its struct:

NativeTransportAddress string `json:"NATIVE_TRANSPORT_ADDRESS,omitempty"`

GetRpcAddress() was expected to resolve to a valid IP address for a node, but it returned "". It is meant to fall back to NATIVE_TRANSPORT_ADDRESS when no RPC address is returned for an endpoint, but the cass-management-api call actually returns NATIVE_ADDRESS_AND_PORT.
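
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the operator's code: `endpointStateOld`, `getRpcAddress`, and `decodeOld` are hypothetical names, the struct models only the two address fields, and the fallback logic is assumed from the description in this report. Because `encoding/json` silently ignores keys that have no matching struct tag, the NATIVE_ADDRESS_AND_PORT value is dropped and the fallback has nothing to return.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpointStateOld is a simplified, hypothetical stand-in for the struct
// described in this issue; only the two address fields are modelled.
type endpointStateOld struct {
	RpcAddress             string `json:"RPC_ADDRESS,omitempty"`
	NativeTransportAddress string `json:"NATIVE_TRANSPORT_ADDRESS,omitempty"`
}

// getRpcAddress prefers RPC_ADDRESS and otherwise falls back to
// NATIVE_TRANSPORT_ADDRESS (assumed logic, based on the issue text).
func (s endpointStateOld) getRpcAddress() string {
	if s.RpcAddress != "" {
		return s.RpcAddress
	}
	return s.NativeTransportAddress
}

// decodeOld unmarshals a single endpoint object into the simplified struct.
func decodeOld(payload string) (endpointStateOld, error) {
	var s endpointStateOld
	err := json.Unmarshal([]byte(payload), &s)
	return s, err
}

func main() {
	// Shape of a C* 4.0 endpoint entry that carries no RPC_ADDRESS.
	payload := `{"ENDPOINT_IP": "1.1.1.1", "NATIVE_ADDRESS_AND_PORT": "1.1.1.1:9042"}`
	s, err := decodeOld(payload)
	if err != nil {
		panic(err)
	}
	// The unknown key is ignored, so both fields stay empty.
	fmt.Printf("rpc address: %q\n", s.getRpcAddress()) // prints: rpc address: ""
}
```

An empty result here matches the `"replaceIP": ""` seen in the operator log further down.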

Did you expect to see something different?
Expected the value from NATIVE_ADDRESS_AND_PORT to be returned in GetRpcAddress call, not an empty string

How to reproduce it (as minimally and precisely as possible):
Execute curl http://localhost:8080/api/v0/metadata/endpoints from a pod and inspect the JSON returned: it contains NATIVE_ADDRESS_AND_PORT instead of NATIVE_TRANSPORT_ADDRESS.

Reproduce failed node replacement (how the issue was found):

  1. Spin up a Cassandra cluster that uses the image k8ssandra/cass-management-api:4.0.1-v0.1.30
  2. Manually execute the commands from the following node_replace test (https://github.com/k8ssandra/cass-operator/blob/master/tests/node_replace/node_replace_suite_test.go)
  3. The API call to start the new node will have replaceIP: "" instead of the actual replace address

Environment
AWS

  • Cass Operator version:

    "docker.io/k8ssandra/cass-operator:v1.9.0"

  • Kubernetes version information:

    Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.9-dd.1", GitCommit:"f71f8bd315fe5d6d7d44b619bd9364362fb8d9a8", GitTreeState:"clean", BuildDate:"2021-08-06T21:03:07Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

    k8ssandra deployment

  • Manifests:

    insert manifests relevant to the issue

  • Cass Operator Logs:

    controllers.CassandraDatacenter calling Management API start node - POST /api/v0/lifecycle/start {"cassandradatacenter": "cassandra-k8ssandra-seid/dc1", "requestNamespace": "cassandra-k8ssandra-seid", "requestName": "dc1", "loopID": "a15cf01d-43c1-4ae0-a41c-ab58ef7cb24c", "namespace": "cassandra-k8ssandra-seid", "datacenterName": "dc1", "clusterName": "k8ssandra-seid", "pod": "k8ssandra-seid-dc1-rack1-sts-0", "podIP": "xx.xx.xx.xx", "replaceIP": ""}

Anything else we need to know?
Further discussion/debugging in discord channel



┆Issue is synchronized with this [Jira Task](https://k8ssandra.atlassian.net/browse/K8SSAND-1327) by [Unito](https://www.unito.io)
┆friendlyId: K8SSAND-1327
┆priority: Medium
@sseidman sseidman added the bug Something isn't working label Mar 17, 2022
@sync-by-unito sync-by-unito bot changed the title EndpointState has incorrect json key K8SSAND-1327 ⁃ EndpointState has incorrect json key Mar 17, 2022
@jsanda jsanda assigned jsanda and adutra and unassigned jsanda Mar 17, 2022
@jsanda
Contributor

jsanda commented Mar 17, 2022

@adutra

The net effect of this bug is that node replacements are broken with C* 4.0.1 and presumably 4.0.3. We are pulling this into the current sprint since node replacement is a frequently used feature.

Here's a link to the start of the discussion in Discord.

We should also make sure this works with DSE. I will confirm the exact versions.

@burmanm
Contributor

burmanm commented Mar 17, 2022

NATIVE_ADDRESS_AND_PORT is a field that was introduced in 4.x (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L1015); it is not present in 3.11.

NATIVE_TRANSPORT_ADDRESS is a field in the Cassandra configuration, and IIRC the management-api returns the endpointState JSON as a combination of several inputs, not just Cassandra's state. So outright replacing the JSON key is most likely not the correct approach.

@adutra
Contributor

adutra commented Mar 17, 2022

I checked org.apache.cassandra.gms.ApplicationState for C* 3.11, 4.0 and DSE 6.8:

  • C* 3.11 has RPC_ADDRESS (type: InetAddress)
  • C* 4.0 has NATIVE_ADDRESS_AND_PORT (type: org.apache.cassandra.locator.InetAddressAndPort)
  • DSE 6.8 has NATIVE_TRANSPORT_ADDRESS (type: InetAddress)

I suggest that we support all 3, and we will likely need to strip the port part from NATIVE_ADDRESS_AND_PORT in GetRpcAddress. Typically C* 4.0 returns this:

"NATIVE_ADDRESS_AND_PORT": "10.244.1.4:9042"
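
One way the suggestion above could look in Go is sketched below. This is an illustration only and not necessarily what the eventual fix does: the type `endpointState`, the method `rpcAddress`, the helper `stripPort`, and the lookup order (RPC_ADDRESS first, then NATIVE_ADDRESS_AND_PORT, then NATIVE_TRANSPORT_ADDRESS) are all assumptions. The port is removed with the standard library's `net.SplitHostPort`, which also handles bracketed IPv6 addresses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// endpointState is a hypothetical, trimmed-down struct that carries one
// field per address key listed above.
type endpointState struct {
	RpcAddress             string `json:"RPC_ADDRESS,omitempty"`              // C* 3.11
	NativeAddressAndPort   string `json:"NATIVE_ADDRESS_AND_PORT,omitempty"`  // C* 4.0
	NativeTransportAddress string `json:"NATIVE_TRANSPORT_ADDRESS,omitempty"` // DSE 6.8
}

// stripPort returns the host part of "host:port". Input that cannot be
// split (for example a bare IP without a port) is returned unchanged.
func stripPort(addr string) string {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return addr
	}
	return host
}

// rpcAddress returns the first populated address, without a port.
// The order of preference is an assumption made for this sketch.
func (s endpointState) rpcAddress() string {
	switch {
	case s.RpcAddress != "":
		return s.RpcAddress
	case s.NativeAddressAndPort != "":
		return stripPort(s.NativeAddressAndPort)
	default:
		return s.NativeTransportAddress
	}
}

func main() {
	var s endpointState
	payload := `{"NATIVE_ADDRESS_AND_PORT": "10.244.1.4:9042"}`
	if err := json.Unmarshal([]byte(payload), &s); err != nil {
		panic(err)
	}
	fmt.Println(s.rpcAddress()) // prints: 10.244.1.4
}
```

Keeping three separate fields, rather than renaming the existing JSON key, leaves the 3.11 and DSE 6.8 payloads working unchanged.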

@adutra
Contributor

adutra commented Mar 17, 2022

@sseidman can you post the entire json payload for curl http://localhost:8080/api/v0/metadata/endpoints? On my machine I get both RPC_ADDRESS and NATIVE_ADDRESS_AND_PORT so GetRpcAddress behaves correctly.

@sseidman
Author

RPC_ADDRESS is only populated for the endpoint that makes the API call. NATIVE_ADDRESS_AND_PORT is returned for all endpoints, but wasn't being parsed into the struct.

	"entity": [{
		"DC": "dc1",
		"ENDPOINT_IP": "1.1.1.1",
		"HOST_ID": "2536db4c-dcb7-4347-84bf-bd75b7400fc1",
		"INTERNAL_ADDRESS_AND_PORT": "1.1.1.1:7000",
		"IS_ALIVE": "true",
		"LOAD": "263054.0",
		"NATIVE_ADDRESS_AND_PORT": "1.1.1.1:9042",
		"NET_VERSION": "12",
		"RACK": "rack1",
		"RELEASE_VERSION": "4.0.1",
		"RPC_READY": "true",
		"SCHEMA": "90c24d8f-7264-3b19-87e7-bcf987fdc8f0",
		"SSTABLE_VERSIONS": "big-nb",
		"STATUS_WITH_PORT": "NORMAL,-1224706216952945206",
		"TOKENS": ...
	}, {
		"DC": "dc1",
		"ENDPOINT_IP": "2.2.2.2",
		"HOST_ID": "38f51812-5aaa-4906-ba65-38b5424f90ec",
		"INTERNAL_ADDRESS_AND_PORT": "2.2.2.2:7000",
		"INTERNAL_IP": "2.2.2.2",
		"IS_ALIVE": "true",
		"LOAD": "236235.0",
		"NATIVE_ADDRESS_AND_PORT": "2.2.2.2:9042",
		"NET_VERSION": "12",
		"RACK": "rack2",
		"RELEASE_VERSION": "4.0.1",
		"RPC_ADDRESS": "2.2.2.2",
		"RPC_READY": "true",
		"SCHEMA": "90c24d8f-7264-3b19-87e7-bcf987fdc8f0",
		"SSTABLE_VERSIONS": "big-nb",
		"STATUS": "NORMAL,-2449408148972492354",
		"STATUS_WITH_PORT": "NORMAL,-2449408148972492354",
		"TOKENS": ...
	}, {
		"DC": "dc1",
		"ENDPOINT_IP": "3.3.3.3",
		"HOST_ID": "08527c15-e7b0-41aa-ac50-62c0669b8ebb",
		"INTERNAL_ADDRESS_AND_PORT": "3.3.3.3:7000",
		"IS_ALIVE": "true",
		"LOAD": "243981.0",
		"NATIVE_ADDRESS_AND_PORT": "3.3.3.3:9042",
		"NET_VERSION": "12",
		"RACK": "rack3",
		"RELEASE_VERSION": "4.0.1",
		"RPC_READY": "true",
		"SCHEMA": "90c24d8f-7264-3b19-87e7-bcf987fdc8f0",
		"SSTABLE_VERSIONS": "big-nb",
		"STATUS_WITH_PORT": "NORMAL,-1549540139895431251",
		"TOKENS": ...
	}]
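
The observation that RPC_ADDRESS appears on only one entry can be checked programmatically. The sketch below is an illustration under assumptions: it assumes the response is a JSON object with an `entity` array, as the excerpt above suggests, and it uses a trimmed copy of that payload with the elided TOKENS values left out. The names `endpointsResponse`, `endpointsMissingKey`, and `samplePayload` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// samplePayload is a trimmed copy of the payload above, wrapped in an
// object (assumed response shape); most keys are omitted for brevity.
const samplePayload = `{"entity": [
	{"ENDPOINT_IP": "1.1.1.1", "NATIVE_ADDRESS_AND_PORT": "1.1.1.1:9042"},
	{"ENDPOINT_IP": "2.2.2.2", "NATIVE_ADDRESS_AND_PORT": "2.2.2.2:9042", "RPC_ADDRESS": "2.2.2.2"},
	{"ENDPOINT_IP": "3.3.3.3", "NATIVE_ADDRESS_AND_PORT": "3.3.3.3:9042"}
]}`

// endpointsResponse decodes each endpoint generically so that unknown or
// version-specific keys are preserved.
type endpointsResponse struct {
	Entity []map[string]interface{} `json:"entity"`
}

// endpointsMissingKey returns the ENDPOINT_IP of every entry that does
// not contain the given key.
func endpointsMissingKey(payload []byte, key string) ([]string, error) {
	var resp endpointsResponse
	if err := json.Unmarshal(payload, &resp); err != nil {
		return nil, err
	}
	var missing []string
	for _, ep := range resp.Entity {
		if _, ok := ep[key]; ok {
			continue
		}
		if ip, ok := ep["ENDPOINT_IP"].(string); ok {
			missing = append(missing, ip)
		}
	}
	return missing, nil
}

func main() {
	missing, err := endpointsMissingKey([]byte(samplePayload), "RPC_ADDRESS")
	if err != nil {
		panic(err)
	}
	fmt.Println(missing) // prints: [1.1.1.1 3.3.3.3]
}
```

Running the same check with NATIVE_ADDRESS_AND_PORT as the key returns no entries for this payload, which is why that key is a workable fallback on C* 4.0.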

burmanm pushed a commit that referenced this issue Mar 18, 2022
* EndpointState has incorrect json key (fixes #298)

* Add unit test

* Change e2e test to use 4.0.3

* Fix RetrieveStatusFromNodetool
burmanm pushed a commit that referenced this issue Apr 5, 2022
* EndpointState has incorrect json key (fixes #298)

* Add unit test

* Change e2e test to use 4.0.3

* Fix RetrieveStatusFromNodetool

(cherry picked from commit b7e8c77)