This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Pod has no outbound connection on 0.9.4 + RS3 Windows #1881

Closed
yuedai opened this issue Dec 5, 2017 · 7 comments

@yuedai

yuedai commented Dec 5, 2017

Is this a request for help?:
yes


Is this an ISSUE or FEATURE REQUEST? (choose one):
Issue

What version of acs-engine?:
0.9.4

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes

What happened:
I used acs-engine 0.9.4 to provision a cluster with RS3 Windows agent nodes. I created a deployment and the pod started successfully, but all outbound calls from the pod fail.
For example, when I "kubectl exec" into a pod and run nslookup, the DNS server is completely unreachable.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):
This is my API model:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.8",
      "orchestratorVersion": "1.8.2",
      "kubernetesConfig": {
        "kubernetesImageBase": "gcrio.azureedge.net/google_containers/",
        "clusterSubnet": "10.244.0.0/16",
        "dnsServiceIP": "10.0.0.10",
        "serviceCidr": "10.0.0.0/16",
        "networkPolicy": "none",
        "nonMasqueradeCidr": "10.0.0.0/8",
        "maxPods": 110,
        "dockerBridgeSubnet": "172.17.0.1/16",
        "nodeStatusUpdateFrequency": "10s",
        "ctrlMgrNodeMonitorGracePeriod": "40s",
        "ctrlMgrPodEvictionTimeout": "5m0s",
        "ctrlMgrRouteReconciliationPeriod": "10s",
        "gchighthreshold": 85,
        "gclowthreshold": 80,
        "etcdVersion": "2.2.5",
        "etcdDiskSizeGB": "128",
        "addons": [
          {
            "name": "tiller",
            "enabled": true,
            "containers": [
              {
                "name": "tiller",
                "cpuRequests": "50m",
                "memoryRequests": "150Mi",
                "cpuLimits": "50m",
                "memoryLimits": "150Mi"
              }
            ]
          },
          {
            "name": "kubernetes-dashboard",
            "enabled": true,
            "containers": [
              {
                "name": "kubernetes-dashboard",
                "cpuRequests": "300m",
                "memoryRequests": "150Mi",
                "cpuLimits": "300m",
                "memoryLimits": "150Mi"
              }
            ]
          }
        ]
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "cci-1709",
      "vmSize": "Standard_D2_v2",
      "osDiskSizeGB": 30,
      "firstConsecutiveStaticIP": "10.240.255.5",
      "storageProfile": "ManagedDisks",
      "oauthEnabled": false,
      "preProvisionExtension": null,
      "extensions": [],
      "distro": "ubuntu"
    },
    "agentPoolProfiles": [
      {
        "name": "cci1709",
        "count": 1,
        "vmSize": "Standard_D5_v2",
        "osDiskSizeGB": 200,
        "osType": "Windows",
        "availabilityProfile": "AvailabilitySet",
        "storageProfile": "ManagedDisks",
        "distro": "ubuntu",
        "fqdn": "",
        "preProvisionExtension": null,
        "extensions": []
      },
      {
        "name": "side",
        "count": 1,
        "vmSize": "Standard_D3_v2",
        "osDiskSizeGB": 30,
        "osType": "Linux",
        "availabilityProfile": "AvailabilitySet",
        "storageProfile": "ManagedDisks",
        "distro": "ubuntu",
        "fqdn": "",
        "preProvisionExtension": null,
        "extensions": []
      }
    ],
    "linuxProfile": {
      "adminUsername": "",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "windowsProfile": {
      "adminUsername": "",
      "adminPassword": ""
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    },
    "certificateProfile": {
      "caCertificate": "",
      "caPrivateKey": "",
      "apiServerCertificate": "",
      "apiServerPrivateKey": "",
      "clientCertificate": "",
      "clientPrivateKey": "",
      "kubeConfigCertificate": "",
      "kubeConfigPrivateKey": ""
    }
  }
}
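
Since the symptom is an unreachable DNS service IP, one thing worth ruling out is a mismatch between the network fields in the API model. The following is only a rough sanity-check sketch, not part of acs-engine: the function name `check_network_settings` is made up for illustration, and the embedded model is trimmed to the three fields it reads (values copied from the model above). It checks that `dnsServiceIP` lies inside `serviceCidr` and that `serviceCidr` does not overlap `clusterSubnet`.

```python
import ipaddress
import json


def check_network_settings(api_model: dict) -> list:
    """Return a list of problems found in kubernetesConfig (hypothetical helper)."""
    cfg = api_model["properties"]["orchestratorProfile"]["kubernetesConfig"]
    service_cidr = ipaddress.ip_network(cfg["serviceCidr"])
    cluster_subnet = ipaddress.ip_network(cfg["clusterSubnet"])
    dns_ip = ipaddress.ip_address(cfg["dnsServiceIP"])

    problems = []
    # The cluster DNS service needs an address from the service range.
    if dns_ip not in service_cidr:
        problems.append(f"dnsServiceIP {dns_ip} is outside serviceCidr {service_cidr}")
    # Pod and service ranges should be disjoint.
    if service_cidr.overlaps(cluster_subnet):
        problems.append(f"serviceCidr {service_cidr} overlaps clusterSubnet {cluster_subnet}")
    return problems


# Trimmed copy of the API model: only the fields the check reads.
model = json.loads("""
{
  "properties": {
    "orchestratorProfile": {
      "kubernetesConfig": {
        "clusterSubnet": "10.244.0.0/16",
        "dnsServiceIP": "10.0.0.10",
        "serviceCidr": "10.0.0.0/16"
      }
    }
  }
}
""")

print(check_network_settings(model))  # prints []
```

For this model the check reports no problems, so the addressing itself looks consistent and the failure is more likely on the node or container side.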

Anything else we need to know:

@jackfrancis
Member

@JiangtianLi does this issue overlap with the work in #1810?

Or does it appear to be a transient issue where a cluster had no outbound access?

@JiangtianLi
Contributor

@yuedai What does Resolve-DnsName www.bing.com output in your container?

@jackfrancis This is a known intermittent issue with Windows DNS, and the fix has been released through Windows Update.

@yuedai
Author

yuedai commented Dec 6, 2017

It times out. The DNS server (10.0.0.10) is not reachable. On another 2016 cluster (deployed with acs-engine 0.8.0) it works fine.

@JiangtianLi
Contributor

@yuedai Does the output from Resolve-DnsName say "No DNS servers configured for local system"? If so, it is a local DNS configuration issue, not a case of the DNS server failing to respond. That issue has been fixed in Windows. I will update here when the fix is included in acs-engine deployments.

@yuedai
Author

yuedai commented Dec 6, 2017 via email

@JiangtianLi
Contributor

@yuedai Although the DNS server shows up in ipconfig /all inside the Windows container, a race condition can cause the configuration to fail in practice. So if you run Resolve-DnsName and see the "No DNS servers configured for local system" message, that will confirm it.
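
To make the distinction between the two failure modes concrete, here is a rough sketch of how the check could be scripted from a workstation. It is an illustration only: the helper names (`build_probe_command`, `classify`, `probe`) are made up, it assumes the container image ships `powershell`, and matching on the words "timeout" / "timed out" is a guess at how a timeout is reported. Only the "No DNS servers configured for local system" string comes from this thread.

```python
import subprocess

# Message quoted in this thread for the local DNS configuration race.
NO_DNS_CONFIGURED = "No DNS servers configured for local system"


def build_probe_command(pod: str, hostname: str = "www.bing.com") -> list:
    """Build a kubectl exec command that runs Resolve-DnsName inside the pod."""
    return [
        "kubectl", "exec", pod, "--",
        "powershell", "-Command", f"Resolve-DnsName {hostname}",
    ]


def classify(output: str) -> str:
    """Map Resolve-DnsName output to a coarse diagnosis."""
    text = output.lower()
    if NO_DNS_CONFIGURED.lower() in text:
        return "local-dns-config"   # container has no usable DNS config
    if "timeout" in text or "timed out" in text:
        return "dns-timeout"        # assumed wording; server did not answer
    return "unknown"


def probe(pod: str) -> str:
    """Run the probe in the given pod and classify combined stdout/stderr."""
    result = subprocess.run(build_probe_command(pod), capture_output=True, text=True)
    return classify(result.stdout + result.stderr)
```

Calling `probe("<pod-name>")` against an affected pod would return `"local-dns-config"` for the race condition described above and `"dns-timeout"` when the DNS server itself is not answering.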

@stale

stale bot commented Mar 9, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. Note that acs-engine is deprecated--see https://github.com/Azure/aks-engine instead.

@stale stale bot added the stale label Mar 9, 2019
@stale stale bot closed this as completed Mar 16, 2019