Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event/alarm framework HLD update. Integrate with event producer framework. #1409

Merged
merged 5 commits into from
Nov 8, 2023

Conversation

bhaveshdell
Copy link
Contributor

@bhaveshdell bhaveshdell commented Jun 30, 2023

The event/alarm framework HLD was reviewed and committed earlier.
With the availability of event producer framework, the design is updated to provide DB/persistent support for event history.
Also gnmi/rest openconfig interface for get operations and gnmi subscription.

Code PRs
sonic-net/sonic-swss-common#852
sonic-net/sonic-buildimage#17949

@bhaveshdell bhaveshdell marked this pull request as ready for review July 13, 2023 20:51
@bhaveshdell
Copy link
Contributor Author

@zhangyanzhao Please assign reviewers.

@bhaveshdell
Copy link
Contributor Author

@kwangsuk, Please review.

@bhaveshdell
Copy link
Contributor Author

@renukamanavalan Please review update to the existing event framework HLD.
It is about integrating with the producer framework, to provide DB support.

@bhaveshdell bhaveshdell changed the title HLD Update. Integrate with event producer framework. Event/alarm framework HLD update. Integrate with event producer framework. Aug 22, 2023
} else {
LOG_EVENT(TEMPERATURE_EXCEEDED, sensor_name_p, CLEAR_ALARM, "Temperature for the sensor %s is %d degrees ", sensor_name_p, current_temp);
}
```
#### 3.1.1.2 Development Process

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this is introducing differences in the way that an application/component needs to implement their events and alarms.
If this is the case, we should clearly outline what these changes are, ideally prepare an example, and raise this in the Community during one of the Tuesday calls.
If you agree, we can work with @zhangyanzhao to schedule something.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

@jeff-yin
Copy link

jeff-yin commented Sep 7, 2023

The event/alarm framework HLD was reviewed and committed earlier. With the availability of event producer framework, the design is updated to provide DB/persistent support for event history. Also gnmi/rest openconfig interface for get operations and gnmi subscription.

To add some more background, the original event/alarm framework HLD was raised as #1064 and then approved and merged.
That HLD also had code PRs attached, but the code was never reviewed and merged.

#954 for Event Producer was merged and its code PRs were merged, so the events/alarm framework HLD needed to be updated, hence this PR.

@adyeung
Copy link
Collaborator

adyeung commented Sep 28, 2023

@rajendra-dendukuri pls help review the HLD

@@ -175,14 +167,11 @@ This modified file can then be uploaded to the device to /etc/evprofile/.
Operator can select any of these custom event profiles to change default properties of events.
Copy link

@kwangsuk kwangsuk Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Event profiling to change properties lie severity etc. would be managed by controllers as per their OAM policy. Hence, can we remove the support to change the profile settings?

@zhangyanzhao
Copy link
Collaborator

Reviewed in community on 9/21/2023

@zhangyanzhao
Copy link
Collaborator

@jeff-yin jeff-yin self-requested a review October 17, 2023 23:27
@jeff-yin
Copy link

jeff-yin commented Oct 18, 2023

@zhangyanzhao @renukamanavalan can we merge this HLD now?

@@ -102,7 +94,7 @@ Such a change has an important metric called *severity* to indicate how critical
The set of alarms and their severities are an indication to health of various applications of the system and System LED can be deduced from alarms.
An acknowledged alarm means that operator is aware of the condition so, acknowledged alarm will be taken out of consideration.

Both events and alarms get recorded in a new DB called EVENT DB in a new redis instance.
Both events and alarms get recorded in redis DB.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which redisDB? could you update if its some existing redisDB or a new one?

Also, as of now, when applications (processes/daemons) publish their event(s)/alarm(s), they do NOT get stored in any redis (or persistent DB) - right? In that case, are they stored by eventD framework/container in memory? no EVENT DB as of now?

| 5.2.4 | Openconfig interface to acknowledge an alarm. | |
| 6 | CLI commands | |
| 6.1 | show alarm [ detail \| summary \| severity \| timestamp <from> <to> \| recent <5min\|1hr\|1day> \| sequence-number <from> <to> \| all] | |
| 6.2 | show event [ detail \| summary \| severity \| timestamp <from> <to> \| recent <5min\|1hr\|1day> \| sequence-number <from> <to>] | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 202305 present codebase, do not see support for these CLIs (i.e. show alarm, show event).
[1] In their absence, how does one check eventD's cache-services contents? as to what events, alarms are received and stored in eventD's cache-services?
Is it there as part of some changeset not merged yet? or not there at all?

#1064 has some unmerged PRs. Does this support reside there?
If so, what's the plan to commit, merge these pending PRs to sonic mainstream repositories?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The present codebase does not have CLI support. It supports event publish via telemetry streaming.
The implementation for doc/event-alarm-framework/event-alarm-framework.md, is not merged.
#1064 unmerged PRs have the support for CLIs.
The current plan is to first get this PR merged. @shyam77git Please review and approve this PR.
Then rework changes in #1064 PRs to support these design updates and submit to community for coming release.

@zhangyanzhao
Copy link
Collaborator

merge this with reviewers approval

@zhangyanzhao zhangyanzhao merged commit 6f86d7a into sonic-net:master Nov 8, 2023
@zhangyanzhao
Copy link
Collaborator

@bhaveshdell can you please help to update the code PRs for this HLD by referring to #806 ? So that we can track this feature completeness clearly. Thanks.

@stephenxs
Copy link
Collaborator

Hi @bhaveshdell
Looks like the corresponding PRs are still tracked in a closed PR #761 which is confusing.
Could you please track the code PRs in this HLD which is latest?
Thanks

@venkatmahalingam
Copy link
Collaborator

Hi @bhaveshdell Looks like the corresponding PRs are still tracked in a closed PR #761 which is confusing. Could you please track the code PRs in this HLD which is latest? Thanks

@stephenxs Yes, we'll add code PRs to this HLD.

@shyam77git
Copy link
Contributor

Hi @bhaveshdell Looks like the corresponding PRs are still tracked in a closed PR #761 which is confusing. Could you please track the code PRs in this HLD which is latest? Thanks

@stephenxs Yes, we'll add code PRs to this HLD.

Thanks @venkatmahalingam .
That would help reviewers as to what all code PRs to be reviewed in context of this HLD.
Also, would it there be some new code PRs related to updates in this HLD?
@bhaveshdell - it would be good if you can provide a full view here w.r.t code PRs (existing, new/upcoming etc.)

@shyam77git
Copy link
Contributor

@bhaveshdell
Based on last comment, could you please update the latest and further plan on this?
Please add code PRs list to this HLD for review

@zhangyanzhao
Copy link
Collaborator

@zhangyanzhao
Copy link
Collaborator

The event/alarm framework HLD was reviewed and committed earlier. With the availability of event producer framework, the design is updated to provide DB/persistent support for event history. Also gnmi/rest openconfig interface for get operations and gnmi subscription. Code PRs copied from

sonic-net/sonic-buildimage#7813 sonic-net/sonic-swss-common#490 sonic-net/sonic-mgmt-framework#85 sonic-net/sonic-mgmt-common#48

@venkatmahalingam @bhaveshdell can you please check if above code PRs are accurate or others should be added? I copied them from #761. Code PRs should be included in HLD per community agreement. Thanks.

qiluo-msft pushed a commit to sonic-net/sonic-swss-common that referenced this pull request May 11, 2024
Why I did it
This PR contains code changes for providing extension to the Event Framework as specified in the sonic-net/SONiC#1409

How I did it
Followed design specified in sonic-net/SONiC#1409.
Add new tables for events and alarms.
@zhangyanzhao
Copy link
Collaborator

one code PR is still open, move to backlog

arfeigin pushed a commit to arfeigin/sonic-swss-common that referenced this pull request Jun 27, 2024
Why I did it
This PR contains code changes for providing extension to the Event Framework as specified in the sonic-net/SONiC#1409

How I did it
Followed design specified in sonic-net/SONiC#1409.
Add new tables for events and alarms.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: MovedToBacklog
Development

Successfully merging this pull request may close these issues.

9 participants