Heal tests are unstable on AWS #345

Closed
denis-tingaikin opened this issue Mar 27, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@denis-tingaikin
Member

denis-tingaikin commented Mar 27, 2023

Motivation

--- FAIL: TestRunHealSuite (2684.48s)
    --- PASS: TestRunHealSuite/TestDataplane_interrupt (80.01s)
    --- PASS: TestRunHealSuite/TestLocal_forwarder_death (79.59s)
    --- PASS: TestRunHealSuite/TestLocal_forwarder_remote_forwarder (89.67s)
    --- PASS: TestRunHealSuite/TestLocal_nse_death (78.29s)
    --- PASS: TestRunHealSuite/TestLocal_nsm_system_restart (123.97s)
    --- PASS: TestRunHealSuite/TestLocal_nsmgr_local_forwarder_memif (67.22s)
    --- PASS: TestRunHealSuite/TestLocal_nsmgr_local_nse_memif (69.16s)
    --- PASS: TestRunHealSuite/TestLocal_nsmgr_remote_nsmgr (78.89s)
    --- PASS: TestRunHealSuite/TestLocal_nsmgr_restart (80.52s)
    --- PASS: TestRunHealSuite/TestRegistry_local_endpoint (84.37s)
    --- FAIL: TestRunHealSuite/TestRegistry_remote_forwarder (151.69s)
    --- PASS: TestRunHealSuite/TestRegistry_remote_nsmgr (83.09s)
    --- PASS: TestRunHealSuite/TestRegistry_restart (85.86s)
    --- FAIL: TestRunHealSuite/TestRemote_forwarder_death (142.61s)
    --- FAIL: TestRunHealSuite/TestRemote_forwarder_death_ip (140.86s)
    --- PASS: TestRunHealSuite/TestRemote_nse_death (86.40s)
    --- FAIL: TestRunHealSuite/TestRemote_nse_death_ip (136.29s)
    --- PASS: TestRunHealSuite/TestRemote_nsm_system_restart_memif_ip (98.10s)
    --- PASS: TestRunHealSuite/TestRemote_nsmgr_death (96.04s)
    --- PASS: TestRunHealSuite/TestRemote_nsmgr_remote_endpoint (76.31s)
    --- PASS: TestRunHealSuite/TestRemote_nsmgr_restart (77.88s)
    --- PASS: TestRunHealSuite/TestRemote_nsmgr_restart_ip (80.05s)
    --- PASS: TestRunHealSuite/TestSpire_agent_restart (61.91s)
    --- PASS: TestRunHealSuite/TestSpire_server_agent_restart (65.54s)
    --- PASS: TestRunHealSuite/TestSpire_server_restart (54.99s)
    --- PASS: TestRunHealSuite/TestSpire_upgrade (97.26s)
    --- PASS: TestRunHealSuite/TestVl3_nscs_death (113.04s)
    --- PASS: TestRunHealSuite/TestVl3_nse_death (74.36s)

Logs

aws-logs-559-ipv4.zip

Build

https://github.com/networkservicemesh/integration-k8s-aws/actions/runs/4481325103/jobs/7877863815

@denis-tingaikin denis-tingaikin added the bug Something isn't working label Mar 27, 2023
@glazychev-art glazychev-art moved this to In Progress in Release v1.9.0 Mar 30, 2023
@NikitaSkrynnik
Contributor

NikitaSkrynnik commented Mar 30, 2023

Problem

In all failed tests, the clients tried to connect to nonexistent Network Services. For example, TestRegistry_remote_forwarder tried to connect to the Network Service local_nsmgr_restart. Apparently, this happens when a client in an earlier test fails to close its connection and the connection stays cached in the monitor service. When the next test starts, its client receives the leftover connection from the previous test and fails to connect, because that Network Service doesn't exist in the current test. A client can pick up an unclosed connection from a previous test because all connections have the same id, alpine-0.
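
To make the failure mode concrete, here is a minimal sketch of a cache keyed only by connection id. It is not the actual monitor service code: `MonitorCache`, `Connection`, and the method names are hypothetical stand-ins, used only to show how a fixed id lets a stale entry shadow a new request.

```go
package main

// Connection is a simplified, hypothetical stand-in for a connection record.
type Connection struct {
	ID             string
	NetworkService string
}

// MonitorCache is a hypothetical cache keyed only by connection id.
type MonitorCache struct {
	byID map[string]Connection
}

// NewMonitorCache returns an empty cache.
func NewMonitorCache() *MonitorCache {
	return &MonitorCache{byID: make(map[string]Connection)}
}

// Request returns the cached connection for id if one exists.
// Otherwise it stores and returns a new connection to networkService.
func (c *MonitorCache) Request(id, networkService string) Connection {
	if conn, ok := c.byID[id]; ok {
		// A leftover entry wins over the service the caller asked for.
		return conn
	}
	conn := Connection{ID: id, NetworkService: networkService}
	c.byID[id] = conn
	return conn
}

// Close removes the connection with the given id from the cache.
func (c *MonitorCache) Close(id string) {
	delete(c.byID, id)
}
```

If the first test requests alpine-0 for local_nsmgr_restart and never calls Close, a later request for alpine-0 gets the old local_nsmgr_restart connection back, whatever service the second test asked for.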

Solution

We can construct the connection id from the pod's name and the pod's UID, which is stored in the pod's metadata. This id will be the same for cmd-nsc-init and cmd-nsc because they run in the same pod.

For internal clients we can just generate a random id for each connection, because an internal client has no cmd-nsc-init.
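
A rough sketch of the two id schemes described above, assuming the pod's name and UID are already available to the process (for example, exposed through environment variables populated from `metadata.name` and `metadata.uid`). The function names and the `name-uid` format are illustrative assumptions, not the existing cmd-nsc code.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// PodConnectionID builds a connection id from the pod's name and UID.
// Both containers of one pod see the same values, so cmd-nsc-init and
// cmd-nsc would derive the same id. The "name-uid" format is an assumption.
func PodConnectionID(podName, podUID string) string {
	return fmt.Sprintf("%s-%s", podName, podUID)
}

// RandomConnectionID returns a random 128-bit id, hex encoded.
// This is the variant sketched for internal clients, which have no
// init container to share an id with.
func RandomConnectionID() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}
```

Because a pod's UID changes whenever the pod is recreated, two tests that both deploy a pod named alpine would no longer share an id.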

@edwarnicke, what do you think about this solution?

@NikitaSkrynnik
Contributor

Another Solution

We can generate a UID for connection IDs in the webhook and inject it as an environment variable into both cmd-nsc-init and cmd-nsc.
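
As a sketch of this variant: the webhook generates one id per pod and appends it to the environment of every client container it injects. `Container` and `EnvVar` below are simplified stand-ins for the Kubernetes types, and the variable name `NSM_CONNECTION_ID` is hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// EnvVar and Container are simplified stand-ins for the Kubernetes
// container spec types, so that the sketch has no external dependencies.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

// connectionIDEnv is a hypothetical variable name chosen for illustration.
const connectionIDEnv = "NSM_CONNECTION_ID"

// InjectConnectionID generates one random id and appends it as an
// environment variable to every given container, so that all of them
// (e.g. cmd-nsc-init and cmd-nsc) see the same value.
func InjectConnectionID(containers ...*Container) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	uid := hex.EncodeToString(buf)
	for _, c := range containers {
		c.Env = append(c.Env, EnvVar{Name: connectionIDEnv, Value: uid})
	}
	return uid, nil
}
```

Compared with deriving the id from pod metadata, this keeps the id format entirely under the webhook's control, at the cost of only working for pods that pass through the webhook.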

Projects
Status: Done

3 participants