ZDB status not stable with Yggdrasil #167

Closed
A-Harby opened this issue Jul 22, 2024 · 7 comments

Comments

@A-Harby

A-Harby commented Jul 22, 2024

I have a few questions. First, why would a ZDB go down? What could be the reason?
[screenshot]

{
  "version": 0,
  "twin_id": 162,
  "contract_id": 132880,
  "metadata": "{\"version\":3,\"type\":\"vm\",\"name\":\"node159meta\",\"projectName\":\"node159meta\"}",
  "description": "",
  "expiration": 0,
  "signature_requirement": {
    "requests": [
      {
        "twin_id": 162,
        "required": false,
        "weight": 1
      }
    ],
    "weight_required": 1,
    "signatures": [
      {
        "twin_id": 162,
        "signature": "6ed02e414562292e50b34f1ae325bd4597bb364bb0d4b85d47a8ef9a82bbb557d21d42ee9cd9b959c381ead802728b36659e4490eb17612d9e3e453c0cfe3b83",
        "signature_type": "sr25519"
      }
    ],
    "signature_style": ""
  },
  "workloads": [
    {
      "version": 0,
      "name": "node159meta0",
      "type": "zdb",
      "data": {
        "size": 1073741824,
        "mode": "user",
        "password": "password",
        "public": false
      },
      "metadata": "",
      "description": "",
      "result": {
        "created": 1721637882,
        "state": "ok",
        "message": "",
        "data": {
          "Namespace": "162-132880-node159meta0",
          "IPs": [
            "2a02:1802:5e:14:90cf:5dff:fe6f:3e58",
            "300:cb94:a268:f50:b062:7056:cca5:43e5",
            "4a9:556a:be87:fac0:98c4:a2d9:c7c3:6973"
          ],
          "Port": 9900
        }
      }
    }
  ]
}
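The connection details that matter when debugging this (namespace, addresses, port) are under `workloads[].result.data`. Here is a minimal Python sketch that extracts them from a deployment document like the one above. It assumes the document has been saved as plain JSON, and `zdb_endpoints` / `ZdbEndpoint` are hypothetical names, not part of any ThreeFold tool:

```python
import json
from dataclasses import dataclass


@dataclass
class ZdbEndpoint:
    name: str
    namespace: str
    ips: list[str]
    port: int


def zdb_endpoints(deployment_json: str) -> list[ZdbEndpoint]:
    """Return name, namespace, IPs and port of every zdb workload whose state is "ok"."""
    deployment = json.loads(deployment_json)
    endpoints = []
    for wl in deployment.get("workloads", []):
        result = wl.get("result", {})
        # Skip non-zdb workloads and zdbs that did not deploy successfully.
        if wl.get("type") != "zdb" or result.get("state") != "ok":
            continue
        data = result.get("data", {})
        endpoints.append(
            ZdbEndpoint(
                name=wl["name"],
                namespace=data["Namespace"],
                ips=list(data.get("IPs", [])),
                port=int(data["Port"]),
            )
        )
    return endpoints


if __name__ == "__main__":
    sample = """{"workloads": [{"name": "node159meta0", "type": "zdb",
      "result": {"state": "ok", "data": {"Namespace": "162-132880-node159meta0",
      "IPs": ["2a02:1802:5e:14:90cf:5dff:fe6f:3e58"], "Port": 9900}}}]}"""
    for ep in zdb_endpoints(sample):
        print(ep.namespace, ep.port, ep.ips)
```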

Second, why would a ZDB's status go up, then down, then up again?
[screenshot]

@A-Harby
Author

A-Harby commented Jul 22, 2024

And it happened again to another ZDB.
[screenshot]

@maxux
Collaborator

maxux commented Jul 29, 2024

Can you check the zdb logs? There can be a lot of reasons :p
If you hit a connection refused, zdb is not listening or has crashed, and we need the logs to know why. It could be running out of disk space, for example. zdb should not crash, but some edge cases may not be supported.
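zdb speaks the Redis protocol over TCP, so a first client-side check is whether a TCP connection to its port (9900 in the deployment above) is refused (typically nothing listening, e.g. zdb crashed), times out (the network path is likely broken), or succeeds. This is a minimal Python sketch of that distinction. `probe` is a hypothetical helper, and a successful connect only shows that something accepts connections on that port, not that zdb itself is healthy:

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as "open", "refused", "timeout" or "error"."""
    try:
        # create_connection resolves the host and works for both IPv4 and IPv6.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered with a TCP reset: usually nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: often a routing or overlay-network problem.
        return "timeout"
    except OSError:
        # Anything else, e.g. "network unreachable" or "no route to host".
        return "error"
```

For example, `probe("2a02:1802:5e:14:90cf:5dff:fe6f:3e58", 9900)` would test the IPv6 address from the deployment above; running it against each address of the same zdb helps separate a crashed zdb ("refused" everywhere) from a transport problem ("timeout" on only some addresses).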

@A-Harby
Author

A-Harby commented Jul 30, 2024

How can I check the ZDB logs? It would also be great to know the other ways to get logs in general.

@scottyeager

These zdbs run inside Zos. I don't know whether their logs are shipped to Loki along with the rest of the node logs. At least, I can't find any with a few searches now, and I don't recall ever seeing them.

In this case, though, I think we should look at network connectivity as the first and most likely failure point. The IP addresses starting with 300 and 301 are Yggdrasil IPs. Taking Yggdrasil out of the picture would be a good first step, since we know its performance and availability aren't consistent.
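To see which transport a given zdb address uses, it is enough to check its prefix. This is a minimal Python sketch, assuming Yggdrasil addresses fall in `200::/7` (which covers the `300:`/`301:` addresses mentioned here) and Mycelium addresses in `400::/7` (which covers the `4a9:...` address in the deployment above). Those ranges come from the Yggdrasil and Mycelium projects, not from this thread, and `transport_of` is a hypothetical name:

```python
import ipaddress

# Assumed overlay ranges: Yggdrasil uses 200::/7, Mycelium uses 400::/7.
OVERLAYS = {
    "yggdrasil": ipaddress.ip_network("200::/7"),
    "mycelium": ipaddress.ip_network("400::/7"),
}


def transport_of(address: str) -> str:
    """Return "yggdrasil", "mycelium", or "other" (e.g. a regular public IPv6 address)."""
    ip = ipaddress.ip_address(address)
    for name, net in OVERLAYS.items():
        if ip in net:
            return name
    return "other"


if __name__ == "__main__":
    for addr in [
        "2a02:1802:5e:14:90cf:5dff:fe6f:3e58",
        "300:cb94:a268:f50:b062:7056:cca5:43e5",
        "4a9:556a:be87:fac0:98c4:a2d9:c7c3:6973",
    ]:
        print(addr, "->", transport_of(addr))
```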

The rest are connecting over Mycelium. I checked the logs from a couple of the nodes (on devnet) where the dropouts happen. On node 159 I found a lot of errors about connecting to Mycelium peers, as well as failing network health checks in general. So I wonder whether the node simply doesn't have a healthy network.

My suggestions to help narrow this down would be:

  1. Also test using the IPv6 addresses (those starting with 2...) to rule out Mycelium-related issues.
  2. Try some nodes on testnet or mainnet, since devnet nodes are not necessarily the fittest hardware and are sometimes heavily loaded.

If network connectivity can reasonably be ruled out as the root cause, that's when I'd go looking for potential issues in zdb itself.

@A-Harby
Author

A-Harby commented Aug 4, 2024

I'm seeing different results after connecting over IPv6 instead of Yggdrasil or Mycelium.

[screenshot]

So maybe we should connect over IPv6 until Yggdrasil and Mycelium connectivity to the ZDB is stable.
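If clients should prefer IPv6 for now, one option is to order a zdb's addresses so that global-unicast IPv6 (`2000::/3`, which covers the `2a02:...` address in the deployment above) comes first and overlay addresses are kept only as fallbacks. This is a minimal Python sketch. `order_addresses` is a hypothetical helper, and using `2000::/3` to recognize the direct address is an assumption, not something the thread specifies:

```python
import ipaddress

# Assumption: the "direct" IPv6 address of a zdb is in the global unicast range.
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")


def order_addresses(ips: list[str]) -> list[str]:
    """Return the addresses with global-unicast IPv6 first, otherwise keeping input order."""
    # sorted() is stable, so addresses within each group keep their original order.
    return sorted(ips, key=lambda a: ipaddress.ip_address(a) not in GLOBAL_UNICAST)
```

A client would then try the addresses in this order and fall back to Yggdrasil or Mycelium only when the IPv6 connection fails.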

A-Harby changed the title from "ZDB status go down" to "ZDB status not stable with Yggdrasil" on Aug 5, 2024
@scottyeager

I think this is explained by threefoldtech/zos#2403

So it's not really a zdb issue. I think we can close this one, if that's okay with you @A-Harby.

@A-Harby
Author

A-Harby commented Aug 25, 2024

I agree it can be closed, as long as it is tracked in threefoldtech/zos#2403.

A-Harby closed this as completed on Aug 25, 2024
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants