TAPA Target advertisement not updated during NSM connection issue #498

zolug · 2024-02-05T10:36:39Z

Describe the bug
TAPA has a service to announce application Targets with opened Stream(s) towards the nVIP cluster. The information is consumed by the LB component to provide a load balancing functionality towards available application Targets. Reliable operation of the load balancing feature requires data connectivity, which is supplied by NSM.

Currently, once the initial NSM connection request successfully connects the TAPA to a Proxy, announcing the application Target is considered safe from a data connectivity point of view. However, the NSM connection between the TAPA and a Meridio Proxy might later experience problems that lead to traffic disturbance, including outage.

These problems include, for example, a restart or upgrade of NSM infrastructure components, or of the Meridio Proxy serving the application's TAPA sidecar. (Other infrastructure issues not related to POD availability also belong here.)

It should be investigated whether the current behaviour can be improved.

Improvement idea:
NSM offers a monitoring feature that reports NSM connection state changes. This monitoring "tool" could be used to update consumers of Target announcements, such as the LB, so that Targets with possible connectivity problems are excluded from load balancing.
Also, if the NSM connection between TAPA and Proxy used datapath monitoring as part of NSM heal, other infrastructure-related issues causing datapath connectivity problems could also be learnt through NSM connection monitoring. (That's because NSM heal would first close the non-working connection as part of the heal procedure, which would trigger a monitoring event.)
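
The idea above could look roughly like the following. This is only a minimal sketch under stated assumptions: `ConnEvent`, `ConnState`, `Announcer` and `HandleEvents` are hypothetical names made up for illustration and are not the NSM or Meridio APIs. It only shows the intended mapping from connection state changes to Target announcement updates.

```go
package main

import "fmt"

// ConnState is a simplified, hypothetical view of an NSM connection state.
type ConnState int

const (
	ConnUp ConnState = iota
	ConnDown
)

// ConnEvent is a hypothetical stand-in for an event delivered by
// NSM connection monitoring (not the real NSM event type).
type ConnEvent struct {
	ConnectionID string
	State        ConnState
}

// Announcer is a hypothetical abstraction of the TAPA service that
// announces or withdraws application Targets.
type Announcer interface {
	Announce(target string)
	Withdraw(target string)
}

// recordingAnnouncer keeps the announced Targets in memory for the demo.
type recordingAnnouncer struct {
	announced map[string]bool
}

func (r *recordingAnnouncer) Announce(target string) { r.announced[target] = true }
func (r *recordingAnnouncer) Withdraw(target string) { delete(r.announced, target) }

// HandleEvents consumes connection events until the channel is closed.
// Targets that depend on a connection are announced when it is up and
// withdrawn when it goes down (for example when NSM heal closes it).
func HandleEvents(events <-chan ConnEvent, targetsByConn map[string][]string, a Announcer) {
	for ev := range events {
		for _, target := range targetsByConn[ev.ConnectionID] {
			if ev.State == ConnUp {
				a.Announce(target)
			} else {
				a.Withdraw(target)
			}
		}
	}
}

func main() {
	a := &recordingAnnouncer{announced: map[string]bool{}}
	targetsByConn := map[string][]string{"conn-1": {"target-a"}}

	events := make(chan ConnEvent, 2)
	events <- ConnEvent{ConnectionID: "conn-1", State: ConnUp}
	events <- ConnEvent{ConnectionID: "conn-1", State: ConnDown}
	close(events)

	HandleEvents(events, targetsByConn, a)
	fmt.Println("announced targets:", len(a.announced))
}
```

In a real implementation the events would come from NSM connection monitoring, and the announcement update would go to the NSP rather than to an in-memory map.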

To Reproduce
Steps to reproduce the behavior:

Deploy a working Trench (with Conduit, Attractor, Stream etc.). Deploy target-example that opens the Stream.
Delete the vpp-forwarder POD located on the same worker as a target-example POD. The application Target belonging to the affected target-example POD will remain available in NSP and thus in LBs.

Expected behavior
Targets with non-working TAPA->Proxy NSM connections should not be announced, so that LBs can exclude them from the pool of working Targets.
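
On the consumer side, the expected behavior amounts to filtering the Target pool by connectivity. The snippet below is a hedged sketch only: the `Target` type, its `Connected` field and `UsableTargets` are hypothetical and do not reflect how the LB or NSP actually model Targets.

```go
package main

import "fmt"

// Target is a hypothetical, simplified Target record as a consumer
// such as the LB might see it.
type Target struct {
	ID        string
	Connected bool // assumed flag: TAPA->Proxy NSM connection is working
}

// UsableTargets returns only the Targets whose connection is working,
// preserving the input order.
func UsableTargets(all []Target) []Target {
	usable := make([]Target, 0, len(all))
	for _, t := range all {
		if t.Connected {
			usable = append(usable, t)
		}
	}
	return usable
}

func main() {
	pool := []Target{
		{ID: "target-a", Connected: true},
		{ID: "target-b", Connected: false}, // e.g. vpp-forwarder was deleted on its worker
	}
	for _, t := range UsableTargets(pool) {
		fmt.Println("load balancing to", t.ID)
	}
}
```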

Context

Network Service Mesh: v1.12.0
Meridio: v1.0.16
...

Logs
Add logs here.

zolug added the kind/bug, kind/enhancement and component/TAPA labels Feb 5, 2024

zolug added this to Meridio Feb 5, 2024

zolug self-assigned this Feb 5, 2024

LionelJouin moved this to 🏗 In progress in Meridio May 24, 2024

zolug moved this from 🏗 In progress to 📋 To Do in Meridio Jun 27, 2024
