CI failure umbrella issue #1054
Just got:
[sur]: addressed in #1246 |
The IGW issue appeared again in https://jenkins-tectonic-installer.prod.coreos.systems/blue/organizations/jenkins/tectonic-installer/detail/PR-1074/2/pipeline/, so #1017 is still happening. |
The IGW issue has been mitigated for now by retrying the deletion (#1077). SPC engineers are working upstream to add the necessary timeout lifecycle. |
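The retry mitigation mentioned above can be sketched as a generic retry-with-backoff loop. This is a minimal Python sketch under stated assumptions, not the code from #1077: `retry` and `delete_igw` are hypothetical names, `delete_igw` stands in for whatever call actually deletes the internet gateway, and the attempt count and delays are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op` until it succeeds, doubling the delay after each failure.

    Re-raises the last exception once `attempts` tries are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Hypothetical flaky deletion: fails twice (transient), then succeeds.
calls = {"n": 0}


def delete_igw() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("hypothetical transient failure")
    return "deleted"


print(retry(delete_igw, sleep=lambda s: None))  # → deleted
```

Injecting `sleep` keeps the loop testable without real waits; in CI it would default to `time.sleep`. A retry only papers over the race, which is why the upstream timeout work is still needed.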
Seems to be the same as #894. [sur] addressed in #1246 |
[sur] addressed in #1246 |
^
[sur] addressed in #1246 |
Still seems to be a common reason for tests failing: https://jenkins-tectonic-installer.prod.coreos.systems/blue/rest/organizations/jenkins/pipelines/tectonic-installer/branches/PR-1193/runs/3/nodes/39/steps/90/log/?start=0
edit: kans [sur] addressed in #1246 |
[sur] addressed in #1245 |
[sur] filed #1455 |
[sur] filed #1456 |
[sur] addressed in #1265 |
[sur] addressed in #1283 |
Addressed the timeout from #1054 (comment) in #1283 as a stop-gap solution until @mxinden finalizes research on the testing frameworks. |
Need to split this into smaller issues. |
We see these often now:
|
The
I see you guys are launching matchbox with rkt containers. FYI, there is a docker mode now as well, if you are interested in the future. |
We hit a failed matchbox unit. The unit is killed and fails upon restarting. edit: from #1408 |
Hm, we'd want to figure out what killed the unit. The You may also try the docker mode and avoid setting up a complete rkt environment, if you update
|
Agreed that we'd like to switch to the Docker runtime to avoid the CNI cleanup workarounds (no offense). We could then drop the rkt setup in the Jenkins script. |
While the Docker setup is probably the way this project should go, note that its easier out-of-the-box experience masks the same difficulty: the containers need known IP addresses. Docker achieves this simply by assigning IPs in the order in which containers are created (rather than explicitly, as with rkt). I believe you can request specific IPs with a custom Docker bridge, but then you face the same setup difficulty you had before. Just be mindful: IPs must be known because we're using containers on a virtual bridge to simulate a bare-metal environment, and Docker happens to assign container IPs in creation order, so if you don't clean up properly, they won't be what you expect. |
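The creation-order pitfall described above can be illustrated with a toy allocator. This is a minimal Python sketch that models the comment's claim, not Docker's actual IPAM implementation: `SequentialAllocator` is hypothetical, and the subnet `172.18.0.0/24`, the reserved gateway address, and the container names are illustrative assumptions. It shows how a container left over from a previous run shifts the address the next container receives.

```python
import ipaddress


class SequentialAllocator:
    """Toy IPAM (hypothetical): hands out the lowest free host address."""

    def __init__(self, subnet: str) -> None:
        net = ipaddress.ip_network(subnet)
        # Skip the first host address, assumed reserved for the bridge gateway.
        self._hosts = list(net.hosts())[1:]
        self._leases: dict[str, ipaddress.IPv4Address] = {}

    def create(self, name: str) -> str:
        used = set(self._leases.values())
        for ip in self._hosts:
            if ip not in used:
                self._leases[name] = ip
                return str(ip)
        raise RuntimeError("subnet exhausted")

    def remove(self, name: str) -> None:
        # Cleanup frees the lease so the address can be handed out again.
        self._leases.pop(name, None)


clean = SequentialAllocator("172.18.0.0/24")
print(clean.create("node1"), clean.create("node2"))  # 172.18.0.2 172.18.0.3

dirty = SequentialAllocator("172.18.0.0/24")
dirty.create("stale-from-last-run")  # never cleaned up
print(dirty.create("node1"))  # 172.18.0.3, not the expected 172.18.0.2
```

Any setup that hard-codes the expected addresses (for example in matchbox profiles) breaks as soon as a stale container holds a lower address, which is why thorough cleanup matters as much with Docker as with rkt.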
I don't know if this is the right place to post the issue I saw yesterday (17.07.2017); if not, please let me know and I will delete the post. This issue happened on Azure:
module.vnet.azurerm_network_security_rule.worker_ingress_heapster: Still creating... (1m20s elapsed)
Error applying plan:
1 error(s) occurred:
* module.vnet.azurerm_network_security_rule.master_ingress_kubelet_secure_from_worker: 1 error(s) occurred:
* azurerm_network_security_rule.master_ingress_kubelet_secure_from_worker: network.SecurityRulesClient#CreateOrUpdate: Failure sending request: StatusCode=200 -- Original Error: Long running operation terminated with status 'Failed': Code="InternalServerError" Message="An error occurred."
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
make: *** [apply] Error 1
[sur] filed #1457 |
CI failure for AWS:
[sur] filed #1458 |
CI failure on Azure:
Error applying plan:
1 error(s) occurred:
* module.vnet.azurerm_lb_probe.console-lb (destroy): 1 error(s) occurred:
* azurerm_lb_probe.console-lb: Error Creating/Updating LoadBalancer network.LoadBalancersClient#CreateOrUpdate: Failure responding to request: StatusCode=429 -- Original Error: autorest/azure: Service returned an error. Status=429 Code="RetryableError" Message="A retryable error occurred." Details=[{"code":"ReferencedResourceNotProvisioned","message":"Cannot proceed with operation because resource /subscriptions/****/resourceGroups/tectonic-cluster-example-pr-1389-801234567890/providers/Microsoft.Network/networkInterfaces/example-pr-1389-801234567890-master-0/ipConfigurations/example-pr-1389-801234567890-MasterIPConfiguration used by resource /subscriptions/****/resourceGroups/tectonic-cluster-example-pr-1389-801234567890/providers/Microsoft.Network/loadBalancers/example-pr-1389-801234567890-api-lb is not in Succeeded state. Resource is in Deleting state and the last operation that updated/is updating the resource is DeleteNicOperation."}]
[sur]: filed #1459 |
CI failure on Azure:
Error applying plan:
2 error(s) occurred:
* module.vnet.azurerm_lb_probe.ssh-lb (destroy): 1 error(s) occurred:
* azurerm_lb_probe.ssh-lb: Error Creating/Updating LoadBalancer network.LoadBalancersClient#CreateOrUpdate: Failure responding to request: StatusCode=429 -- Original Error: autorest/azure: Service returned an error. Status=429 Code="RetryableError" Message="A retryable error occurred." Details=[{"code":"ReferencedResourceNotProvisioned","message":"Cannot proceed with operation because resource /subscriptions/****/resourceGroups/tectonic-cluster-exper-pr-1436-29012345678901/providers/Microsoft.Network/networkInterfaces/exper-pr-1436-29012345678901-master-0/ipConfigurations/exper-pr-1436-29012345678901-MasterIPConfiguration used by resource /subscriptions/****/resourceGroups/tectonic-cluster-exper-pr-1436-29012345678901/providers/Microsoft.Network/loadBalancers/exper-pr-1436-29012345678901-api-lb is not in Succeeded state. Resource is in Deleting state and the last operation that updated/is updating the resource is DeleteNicOperation."}]
* module.vnet.azurerm_lb_probe.api-lb (destroy): 1 error(s) occurred:
* azurerm_lb_probe.api-lb: Error Creating/Updating LoadBalancer network.LoadBalancersClient#CreateOrUpdate: Failure responding to request: StatusCode=429 -- Original Error: autorest/azure: Service returned an error. Status=429 Code="RetryableError" Message="A retryable error occurred." Details=[{"code":"ReferencedResourceNotProvisioned","message":"Cannot proceed with operation because resource /subscriptions/****/resourceGroups/tectonic-cluster-exper-pr-1436-29012345678901/providers/Microsoft.Network/networkInterfaces/exper-pr-1436-29012345678901-master-0/ipConfigurations/exper-pr-1436-29012345678901-MasterIPConfiguration used by resource /subscriptions/****/resourceGroups/tectonic-cluster-exper-pr-1436-29012345678901/providers/Microsoft.Network/loadBalancers/exper-pr-1436-29012345678901-api-lb is not in Succeeded state. Resource is in Deleting state and the last operation that updated/is updating the resource is DeleteNicOperation."}] [sur]: filed #1459 |
I am closing this umbrella issue in favor of dedicated issues marked as Please submit/comment on the existing issues or submit a new one using the |
The following failure modes have been seen lately:
#1052 (comment)
#1053 (comment)
#1053 (comment)
EDIT: added by Ed
Things we know we need to fix: