[Bug 🐞]: Cannot deploy VM in Australia, in Europe i can #419

sony87 · 2024-02-25T19:55:02Z

What happened?

I'm in Europe and can deploy VMs on European farms without issue. If I try to deploy to any Australian farm/node, it consistently fails after 10 minutes.

What did you expect?

To be able to deploy everywhere, regardless of my location.

What browsers are you seeing the problem on?

No response

ZOS info

No response

Dashboard info

No response

weblets info

No response

Relevant log output

No response

xmonader · 2024-02-25T20:14:29Z

Can you please add the farm ID or the node ID that you tried to deploy on?

xmonader · 2024-02-25T20:24:15Z

Checked 4985 and 2594; couldn't deploy on either.

It kept showing "Waiting for deployment with contract_id: 236293 to be ready" and "Waiting for deployment with contract_id: 236290 to be ready".

sony87 · 2024-02-25T20:52:58Z

Nodes: 4349, 4350.
On farm "Mango Farm" most of the nodes do not work: 2595, 2596, 2636, etc.

sabrinasadik · 2024-02-27T09:07:13Z

The problem might be caused by latency to the hub. This in turn could cause the deployment to time out while it is fetching data from the hub (probably when copying a disk image from 0-fs to the local disk). If this is indeed the problem, it can be verified as follows:

- Check on metrics.grid.tf: you should see network usage at the time of the deployment that lasts for more than 10 minutes, which can be considered slow.
- If you verify it yourself: start a VM with a disk image that is not yet on the node you're deploying on.
- Check on metrics: you will see relatively consistent network usage.
- After a while (about 10 minutes) the deployment will time out.
- Network usage will still be the same.
- After some more time, the network usage will drop again (this means the disk image has finished downloading).
- If you now deploy the same disk image again, it should work.

I'm assuming the disk copy keeps running after the deployment times out. If that is not the case, you may have to redeploy a few times until the disk image is completely in the 0-fs cache.
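The redeploy-until-cached workaround described above can be sketched as a simple retry loop. This is only an illustration: `deploy` is a hypothetical callable standing in for whatever actually submits the deployment (it is not a real grid client API), and the attempt count and wait time are assumptions, with the wait set to roughly the 10-minute timeout mentioned in this thread.

```python
import time
from typing import Callable


def deploy_until_cached(
    deploy: Callable[[], bool],
    max_attempts: int = 5,
    wait_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call deploy() until it reports success; return the attempts used.

    `deploy` is a hypothetical placeholder that returns True when the
    deployment became ready and False when it timed out.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if deploy():
            return attempt
        # Give the image download time to progress before trying again.
        if attempt < max_attempts:
            sleep(wait_s)
    raise RuntimeError(f"deployment still failing after {max_attempts} attempts")
```

The loop only helps if the image download really does continue after the time-out, which is the assumption stated above.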

If this is indeed the case, then either zos needs a workaround, or the actual solution is to make the hub available in multiple geographic regions so that latency is consistently low (a distributed hub or some kind of CDN).

PeterNashaat · 2024-02-27T14:16:13Z

Deploying a VM on TheBatcave (farm-id 2252, node-id 4985)

A VM with the Ubuntu 22 flist, which was already downloaded on the node, works fine
- ZOS logs :

 [+] flistd: 2024-02-27T09:25:36Z info flist already in on the filesystem url=https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist

While deploying with the nixos flist, which had not been used before on that node
- ZOS logs :

2024-02-27 14:34:00 | [+] flistd: 2024-02-27T13:34:00Z info request to mount flist: {ReadOnly:true Limit:0 Storage: PersistedVolume:} name=cloud-container:c65ef166512f3d5fe7c61fc3d8dd3c89 storage= url=https://hub.grid.tf/tf-autobuilder/cloud-container-8730b6f.flist
2024-02-27 14:33:57 | [+] identityd: 2024-02-27T13:33:57Z info checking for update after milliseconds wait=4440000
2024-02-27 14:33:57 | [+] identityd: 2024-02-27T13:33:57Z info checking if update is required current=3.9.0 latest=3.9.0
2024-02-27 14:33:56 | [+] flistd: 2024-02-27T13:33:56Z info starting g8ufs daemon args=["--cache","/var/cache/modules/flistd/cache","--meta","/var/cache/modules/flistd/flist/fa05b43ad1c5362453cb70de7cea9664","--daemon","--log","/var/cache/modules/flistd/log/fa05b43ad1c5362453cb70de7cea9664.log"] storage= url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist
2024-02-27 14:33:54 | [+] flistd: 2024-02-27T13:33:54Z info request to mount flist storage= url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist
2024-02-27 14:33:54 | [+] flistd: 2024-02-27T13:33:54Z info request to mount flist: {ReadOnly:true Limit:0 Storage: PersistedVolume:} name=604-240316-thebatcavetest2 storage= url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist
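When comparing the two cases, it can help to pull out of the ZOS logs which flist URLs were reported as already present and which triggered a fresh mount request. Below is a minimal sketch that matches only the message texts quoted in this comment; the function name and the labels `cached` and `mount-requested` are made up for illustration, and treating the "already in on the filesystem" message as "fully cached" is an inference from this thread, not documented ZOS behaviour.

```python
import re
from typing import Optional, Tuple

# Message fragments copied from the flistd log lines quoted in this thread.
CACHED_MARKER = "flist already in on the filesystem"
MOUNT_MARKER = "request to mount flist"
URL_RE = re.compile(r"url=(\S+)")


def classify_flistd_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (label, flist_url) for a recognised flistd line, else None."""
    if "flistd:" not in line:
        return None  # some other daemon, e.g. identityd
    match = URL_RE.search(line)
    if match is None:
        return None
    url = match.group(1)
    if CACHED_MARKER in line:
        return ("cached", url)
    if MOUNT_MARKER in line:
        return ("mount-requested", url)
    return None  # flistd line with a different message
```

Feeding every log line through `classify_flistd_line` and ignoring the `None` results gives a quick per-URL view of which images had to be fetched from the hub.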

Node network traffic was at its peak and rising each minute, as you can see from these 2 screenshots:

On the Dashboard, it was first waiting for the VM to be ready:

Waiting for deployment with contract_id: 240316 to be ready

Then got this error.

Failed to send request to twinId 7688 with command: zos.deployment.get, payload: {"contract_id":240316} Didn't get a response after 20 seconds

Then the contracts got cancelled.
- ZOS logs:

- Network traffic still rising:

Tried deploying nixos again after the network traffic had decreased
- ZOS logs :

[+] flistd: 2024-02-27T14:05:57Z info flist already in on the filesystem url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist

Network Traffic :

VM was deployed successfully

Did a quick speed test on the vm

root@thebatcavetest:~# speedtest-cli
Retrieving speedtest.net configuration...
Testing from Aussie Broadband (159.196.171.188)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Superloop Australia Pty Ltd (Sydney) [0.09 km]: 16.961 ms
Testing download speed................................................................................
Download: 269.32 Mbit/s
Testing upload speed......................................................................................................
Upload: 23.58 Mbit/s

@sabrinasadik Confirmed: the flist download from the hub takes a long time, which causes a timeout on the Dashboard side and then cancels the contracts. The download continues, however, and deploying again works once it is done.
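The speed test above gives a rough way to check whether the node's own link could explain the timeout. The following is a back-of-the-envelope sketch only: it assumes a steady throughput and the roughly 10-minute timeout mentioned earlier in this thread, and the function names are hypothetical.

```python
def max_bytes_within(timeout_s: float, throughput_mbit_s: float) -> float:
    """Bytes transferable in timeout_s at a steady throughput in Mbit/s."""
    return throughput_mbit_s * 1_000_000 / 8 * timeout_s


def min_throughput_mbit_s(size_bytes: float, timeout_s: float) -> float:
    """Steady throughput in Mbit/s needed to move size_bytes in timeout_s."""
    return size_bytes * 8 / 1_000_000 / timeout_s


# 269.32 Mbit/s is the download speed measured on the VM above;
# 600 s is the approximate deployment timeout reported in this issue.
budget = max_bytes_within(600, 269.32)
print(f"about {budget / 1e9:.1f} GB could be fetched before the timeout")
```

At the measured speed the node could in principle pull roughly 20 GB in 10 minutes, so if the image is well below that size, the bottleneck is more likely the path to the hub than the node's local connection. Note that the speed test ran against a nearby Sydney server, so it says nothing about throughput to the hub itself.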

sony87 · 2024-02-27T15:05:48Z

So what you are saying is that I need to keep re-deploying on the same machine until it completes?

sabrinasadik · 2024-02-27T15:14:12Z

Until we have a workaround or fix the issue, yes. @xmonader let's discuss further to have a solution for this.

khaledyoussef24 added the type_bug Something isn't working label Feb 26, 2024

khaledyoussef24 added this to 3.13.x Feb 26, 2024

sabrinasadik assigned PeterNashaat Feb 27, 2024

PeterNashaat moved this to In Progress in 3.13.x Feb 27, 2024

PeterNashaat moved this from In Progress to In Verification in 3.13.x Feb 28, 2024

sabrinasadik mentioned this issue Feb 29, 2024

Distributed hub threefoldtech/home#1523

Open

ramezsaeed closed this as completed Mar 18, 2024

github-project-automation bot moved this from In Verification to Done in 3.13.x Mar 18, 2024

iwanbk mentioned this issue Aug 7, 2024

faster OS image download threefoldtech/zos#2391

Open
