Events producer #954

Merged on Nov 11, 2022 (153 commits)

renukamanavalan
Contributor

@renukamanavalan renukamanavalan commented Mar 1, 2022

SONiC streams events occurring on a SONiC switch (e.g. BGP flap, process not running) in real time via the gNMI subscription model.
The events are defined using a versioned YANG schema.
For more details, please refer to the doc.

| Repo | PR title |
|---|---|
| sonic-buildimage | Event libswsscommon deps |
| sonic-buildimage | Event libswsscommon deps |
| sonic-buildimage | Streaming structured events implementation |
| sonic-buildimage | Fix PR build failure |
| sonic-buildimage | Add events to host and create rsyslog_plugin deb pkg |
| sonic-buildimage | Add Structured Events w/ YANG Models |
| sonic-buildimage | Add Yang Models for structured events |
| sonic-buildimage | Add YANG model and unit tests for additional structured events |
| sonic-buildimage | Publish additional events |
| sonic-buildimage | Add rsyslog plugin regex for select operation failure |
| sonic-buildimage | Add YANG model for alpm parity error |
| sonic-swss | Publish identified events via structured-events channel |
| sonic-gnmi | Telemetry support for streaming events |
| sonic-gnmi | Events client: Ensure all go routines exit upon client disconnect. |
| sonic-gnmi | Structured events: Publish as JsonIetfVal instead of StringVal |
| sonic-gnmi | gnmi_cli - Tool update |
| sonic-gnmi | Streaming events URL support "not to use cache" |
| sonic-utilities | Event Counters CLI |
| sonic-swss-common | APIs to support streaming structured events |
| sonic-swss-common | Events: APIs to set/get global options |

@renukamanavalan renukamanavalan self-assigned this Mar 1, 2022
@venkatmahalingam
Collaborator

venkatmahalingam commented Jun 17, 2022

I was thinking about 2 approaches in the meeting today. @renukamanavalan @bandaru-viswanath, @dgsudharsan @praveen-li we can discuss the pros and cons in the next meeting.

1) Introduce an event-specific container that references the list used for the config-DB table.
For example, in the port table YANG file, we can do the following:

https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-port.yang

    container PORT {    // 1:1 representation of the config-DB schema
        description "PORT part of config_db.json";

        list PORT_LIST {
            key "name";
            leaf name {
                type string {
                    length 1..128;
                }
            }
            ..............
        }
    }
    container PORT_STATE {    // This container can be used for events; if the PORT_STATE table name is not good, we can use a better name.
        config false;
        list PORT_LIST {
            key "name";
            leaf name {
                type leafref {
                    path "../../../PORT/PORT_LIST/name";
                }
            }
            leaf oper_status { .. }
        }
    }
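
To illustrate what the leafref in approach 1 provides, here is a minimal Python sketch of the check a YANG validator would make on instance data: every key in PORT_STATE/PORT_LIST must match a name in PORT/PORT_LIST. The dict layout, the function name `dangling_state_entries`, and the sample port names are assumptions made for illustration; they are not taken from sonic-yang-models.

```python
def dangling_state_entries(instance):
    """Return PORT_STATE names that do not match any PORT/PORT_LIST name.

    This mimics the constraint expressed by the leafref
    "../../../PORT/PORT_LIST/name" on hypothetical instance data.
    """
    port_names = {port["name"] for port in instance["PORT"]["PORT_LIST"]}
    return [
        state["name"]
        for state in instance["PORT_STATE"]["PORT_LIST"]
        if state["name"] not in port_names
    ]


# Hypothetical instance data: Ethernet8 has state but no config entry.
instance = {
    "PORT": {"PORT_LIST": [{"name": "Ethernet0"}, {"name": "Ethernet4"}]},
    "PORT_STATE": {
        "PORT_LIST": [
            {"name": "Ethernet0", "oper_status": "up"},
            {"name": "Ethernet8", "oper_status": "down"},
        ]
    },
}

print(dangling_state_entries(instance))  # ['Ethernet8']
```

The point of the sketch is that a state entry is only valid relative to the config it refers to, which is why this approach needs access to both.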

2) Enhance the container already defined for the config-DB schema (e.g. PORT) to include state information as well (e.g. port_state_info).

    grouping port_state_info {
        // read-only attributes
        leaf oper_status { .. }
    }
    container PORT {
        description "PORT part of config_db.json";

        list PORT_LIST {
            key "name";
            leaf name {
                type string {
                    length 1..128;
                }
            }
            ..............
            uses port_state_info;
        }
    }

@renukamanavalan
Contributor Author

renukamanavalan commented Jun 21, 2022

Thanks Venkat!
Here is a 3rd option.

    module sonic-events-swss {
        …
        container if-state {
            leaf ifname {
                type leafref {
                    path "/port:sonic-port/port:PORT/port:PORT_LIST/port:name";
                }
                description "Interface name";
            }
            …
        }
    }

The reasons behind this option are as follows.

  1. The consumers of these events do not run inside the switch; they are external and have no access to the config.
  2. The state reported by these events is not saved inside the switch and hence is never accessed from within the switch.
  3. As events come from switches running multiple different versions, we require that the YANG events modules not break backward compatibility. The config schema has no such restriction and does not need one.
  4. The multiple different events reported on an object may not fit logically in the config/state YANG model of the corresponding object.
    For example, PFC-storm and dhcp-relay events on an i/f do not fit in the YANG model of the interface object.
  5. The consumers look for events across all switches for correlation and reporting, and the count of switches is often a few tens of thousands. Also, the config on those switches could potentially be using multiple different YANG versions.
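
As a rough illustration of how an external consumer could use the absolute leafref path from the third option to identify the referenced object, the Python sketch below splits such a path into (prefix, node) pairs. The function name `split_schema_path` is hypothetical, and the sketch only handles plain prefixed paths without predicates.

```python
def split_schema_path(path):
    """Split an absolute YANG schema path into (prefix, node) pairs.

    Segments without a prefix are returned with prefix None.
    Predicates such as [name='x'] are not handled in this sketch.
    """
    segments = []
    for part in path.strip("/").split("/"):
        prefix, sep, name = part.partition(":")
        if not sep:
            # No "prefix:" present, so the whole segment is the node name.
            prefix, name = None, part
        segments.append((prefix, name))
    return segments


path = "/port:sonic-port/port:PORT/port:PORT_LIST/port:name"
segments = split_schema_path(path)
print([name for _, name in segments])
# ['sonic-port', 'PORT', 'PORT_LIST', 'name']
```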

@renukamanavalan
Contributor Author

@venkatmahalingam, can we conclude on this discussion above?

@venkatmahalingam
Collaborator

A gNMI client can subscribe to events in streaming mode, with an optional filter on the event source. The command and output below show subscribing to all events and receiving a BGP event.

gnmic --target events --path "/events/" --mode STREAM --stream-mode ON_CHANGE

The instance data would indicate YANG module path & revision that is required for validation.

Output:

    {
      "sonic-events-bgp:bgp-state": {
        "timestamp": "2022-08-17T02:39:21.286611Z",
        "ip": "100.126.188.90",
        "status": "down"
      }
    }
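
A receiving client could unpack an event like the one above roughly as follows. This is a minimal Python sketch, assuming each update carries exactly one top-level key of the form `<yang-module>:<event-container>` and a `timestamp` in the format shown; the function name `parse_event` and the returned dict layout are hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_event(payload):
    """Split one event payload into module, event name, timestamp and params."""
    doc = json.loads(payload)
    if len(doc) != 1:
        raise ValueError("expected exactly one top-level event key")
    tag, params = next(iter(doc.items()))
    module, _, event = tag.partition(":")
    # Timestamp format assumed from the sample output: UTC with microseconds.
    timestamp = datetime.strptime(
        params["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    fields = {k: v for k, v in params.items() if k != "timestamp"}
    return {"module": module, "event": event, "timestamp": timestamp, "params": fields}


SAMPLE = """
{
  "sonic-events-bgp:bgp-state": {
    "timestamp": "2022-08-17T02:39:21.286611Z",
    "ip": "100.126.188.90",
    "status": "down"
  }
}
"""

event = parse_event(SAMPLE)
print(event["module"], event["event"], event["params"]["status"])
```

The module name recovered from the key is what tells the consumer which YANG module to validate the instance data against.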

Instead of the generic events subscription above, if we follow either approach 1 or 2, it will be cleaner to subscribe to a particular table's state information, right? In the future, we can enhance the subscription to cover config changes as well.

I agree that if it's just an event, we don't need to store it in the DB and we can use a separate event YANG. But what we are using here is state information (i.e. interface admin/oper status, BGP session status, etc.), and we should also support the get operation on that state information, right?

@renukamanavalan
Contributor Author

  1. An explicit decision was made to support only one gNMI client; please refer to the section "Event exporting".
    At a rate of 10K events/second, with offline cache support and missed-count tracking, supporting multiple clients can be expensive.
    On the other hand, looking at the consumer requirements/design model, a client receiving 10K events/second should be lightweight, and end consumers often look at events across thousands of switches. Hence the receiving client, which is 1:1 with a SONiC switch, is expected to dump the events into external storage, and the event consumers query/watch that storage for updates. Often the consumers need to collect info from multiple connected switches to make a call.

Hence a gNMI client can only subscribe to /events, which covers all events.

The "state" is not the only event reported on an i/f (e.g. PFC-Storm). Providing the absolute path as a leafref helps the consumer identify the object where needed.
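
The receiver/consumer split described in this comment can be sketched as below. This is an illustrative Python sketch only: `EventStore` is an in-memory stand-in for the external storage, `Receiver` stands for the single per-switch gNMI client, and the switch names, the second IP address, and the `if-state` parameters are made-up sample values. None of these names come from the SONiC code base.

```python
from collections import defaultdict


class EventStore:
    """In-memory stand-in for the external storage (hypothetical)."""

    def __init__(self):
        self._events = []

    def append(self, switch, tag, params):
        self._events.append({"switch": switch, "tag": tag, "params": params})

    def query(self, tag):
        """Group all stored events with the given tag by originating switch."""
        by_switch = defaultdict(list)
        for event in self._events:
            if event["tag"] == tag:
                by_switch[event["switch"]].append(event["params"])
        return dict(by_switch)


class Receiver:
    """One receiver per switch; it only forwards events to storage."""

    def __init__(self, switch, store):
        self.switch = switch
        self.store = store

    def on_event(self, tag, params):
        # Kept deliberately light: no filtering or correlation here.
        self.store.append(self.switch, tag, params)


store = EventStore()
rx1 = Receiver("switch-1", store)
rx2 = Receiver("switch-2", store)
rx1.on_event("sonic-events-bgp:bgp-state", {"ip": "100.126.188.90", "status": "down"})
rx2.on_event("sonic-events-bgp:bgp-state", {"ip": "100.126.188.91", "status": "down"})
rx2.on_event("sonic-events-swss:if-state", {"ifname": "Ethernet0", "status": "down"})

# A consumer correlates BGP events across switches by querying the store.
bgp_events = store.query("sonic-events-bgp:bgp-state")
print(sorted(bgp_events))  # ['switch-1', 'switch-2']
```

Keeping filtering and correlation on the consumer side of the storage is what lets the per-switch receiver stay lightweight at high event rates.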

@venkatmahalingam
Collaborator

venkatmahalingam commented Jul 1, 2022

I understand that state is not the only event being reported. I think we can report non-state information (which absolutely doesn't make sense to keep in the DB) in the event YANG model.

What's the design to handle both cases: 1) interface oper. status changes (stored in the STATE_DB) and 2) PFC storm (not in the DB)?

For the first case, enhance the existing YANG to include state information as well; for the second case, introduce new event-specific YANG modules with leafrefs to existing tables where applicable.

@renukamanavalan @bandaru-viswanath, @dgsudharsan any concerns on the above design?

@renukamanavalan
Contributor Author

We persist neither of them in the DB.

@bandaru-viswanath

Looks good to me @venkatmahalingam

@zhangyanzhao
Collaborator

@renukamanavalan can you please add the code PRs by referring to #806? Thanks

@renukamanavalan renukamanavalan merged commit 42a9067 into sonic-net:master Nov 11, 2022
@praveen-li
Member

praveen-li commented Nov 14, 2022 via email
