TAPA Target advertisement not updated during NSM connection issue #498

Open
zolug opened this issue Feb 5, 2024 · 0 comments
Assignees
Labels
component/TAPA kind/bug Something isn't working kind/enhancement New feature or request

Comments

@zolug
Collaborator

zolug commented Feb 5, 2024

Describe the bug
TAPA has a service that announces application Targets with open Stream(s) towards the nVIP cluster. The LB component consumes this information to load-balance traffic across the available application Targets. Reliable load balancing requires data connectivity, which is supplied by NSM.

Currently, once the initial NSM connection request successfully connects the TAPA to a Proxy, announcing the application Target is considered safe from a data connectivity point of view. However, the NSM connection between the TAPA and a Meridio Proxy might later experience problems that lead to traffic disturbance, including outage.

Such problems include, for example, a restart or upgrade of NSM infrastructure components, or of the Meridio Proxy serving the application's TAPA sidecar. (Other infrastructure issues that are not related to POD availability also belong here.)

It should be investigated if the current behaviour could be improved.

Improvement idea:
NSM offers a monitoring feature that reports NSM connection state changes. This monitoring "tool" could be used to update consumers of Target announcements, such as the LB, so that Targets with possible connectivity problems are excluded from load balancing.
Also, if the NSM connection between TAPA and Proxy used datapath monitoring as part of NSM heal, other infrastructure-related issues causing datapath connectivity problems could also be learnt through NSM connection monitoring. (That is because NSM heal would first close the non-working connection as part of the heal procedure, which would trigger a monitoring event.)
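
A minimal sketch of the idea, to make the discussion concrete. It does not use the real NSM or Meridio APIs: `EventType`, `ConnectionEvent` and `Announcer` are hypothetical stand-ins, and the assumption is only that a monitoring stream delivers an "update" event while a connection is usable and a "delete" event when it is closed (for example by heal).

```go
package main

import "fmt"

// EventType is a hypothetical stand-in for the kind of event a connection
// monitoring stream might deliver. It is NOT the NSM API type.
type EventType int

const (
	EventUpdate EventType = iota // connection established or refreshed
	EventDelete                  // connection closed (e.g. by heal)
)

// ConnectionEvent is a hypothetical, simplified monitoring event.
type ConnectionEvent struct {
	ConnectionID string
	Type         EventType
}

// Announcer tracks whether the Target behind each connection should
// currently be announced. Hypothetical helper, for illustration only.
type Announcer struct {
	announced map[string]bool
}

// NewAnnouncer returns an Announcer with no announced Targets.
func NewAnnouncer() *Announcer {
	return &Announcer{announced: map[string]bool{}}
}

// Handle updates the announcement state from one event and reports
// whether the state changed, so consumers are only notified on change.
func (a *Announcer) Handle(ev ConnectionEvent) bool {
	want := ev.Type == EventUpdate
	if a.announced[ev.ConnectionID] == want {
		return false
	}
	a.announced[ev.ConnectionID] = want
	return true
}

// Watch consumes events until the channel is closed and prints each
// state change; a real implementation would update the NSP instead.
func (a *Announcer) Watch(events <-chan ConnectionEvent) {
	for ev := range events {
		if a.Handle(ev) {
			fmt.Printf("connection %s: announce=%v\n",
				ev.ConnectionID, a.announced[ev.ConnectionID])
		}
	}
}

func main() {
	events := make(chan ConnectionEvent, 3)
	events <- ConnectionEvent{ConnectionID: "conn-1", Type: EventUpdate}
	events <- ConnectionEvent{ConnectionID: "conn-1", Type: EventDelete}
	events <- ConnectionEvent{ConnectionID: "conn-1", Type: EventUpdate}
	close(events)
	NewAnnouncer().Watch(events)
}
```

Reporting only state changes keeps a flapping connection from flooding consumers with redundant updates. Whether a closed connection should withdraw the Target immediately or after a grace period is left open here.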

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a working Trench (with Conduit, Attractor, Stream etc.). Deploy target-example that opens the Stream.
  2. Delete the vpp-forwarder POD located on the same worker as a target-example POD. The application Target belonging to the affected target-example POD will remain available in NSP, and thus in the LBs.

Expected behavior
Targets with non-working TAPA->Proxy NSM connections should not be announced, so that LBs can exclude them from the pool of working Targets.
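
On the consumer side, the expected behaviour could look roughly like the sketch below. The `Target` type, its `Connected` flag and the `usableTargets` helper are hypothetical and only illustrate the filtering step; they are not taken from the Meridio code base.

```go
package main

import "fmt"

// Target is a hypothetical, simplified view of an announced Target.
// Connected is an assumed flag reflecting the TAPA->Proxy connection state.
type Target struct {
	Name      string
	Connected bool
}

// usableTargets returns only the Targets whose connection is reported
// as working, preserving their original order.
func usableTargets(all []Target) []Target {
	usable := make([]Target, 0, len(all))
	for _, t := range all {
		if t.Connected {
			usable = append(usable, t)
		}
	}
	return usable
}

func main() {
	targets := []Target{
		{Name: "target-a", Connected: true},
		{Name: "target-b", Connected: false}, // e.g. forwarder POD deleted
		{Name: "target-c", Connected: true},
	}
	for _, t := range usableTargets(targets) {
		fmt.Println(t.Name)
	}
}
```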

Context

  • Network Service Mesh: v1.12.0
  • Meridio: v1.0.16
    ...

Logs
Add logs here.

@zolug zolug added kind/bug Something isn't working kind/enhancement New feature or request component/TAPA labels Feb 5, 2024
@zolug zolug added this to Meridio Feb 5, 2024
@zolug zolug self-assigned this Feb 5, 2024
@LionelJouin LionelJouin moved this to 🏗 In progress in Meridio May 24, 2024
@zolug zolug moved this from 🏗 In progress to 📋 To Do in Meridio Jun 27, 2024
Projects
Status: 📋 To Do