-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HLD] SONiC GNMI Server Interface Design #996
Conversation
Signed-off-by: Gang Lv ganglv@microsoft.com
@qiluo-msft would you please review this PR? |
@anand-kumar-subramanian @kannankvs @mikelazar @kerry-meyer |
|
||
# 1 Feature Overview | ||
|
||
SONiC should provide a gNMI/gNOI server interface where the client can communicate with SONiC (server) through the interface using their respective gRPC libraries based on ConfigDB/ApplDB/StateDB/CountersDB schema or SONiC Yang schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you are opening up gNMI to write directly to APPDB (any DB) without any application - this may result in data consistency issues? for security concerns, someone bypasses some checks and directly operates the DB. what is the plan to mitigate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gNMI can only update table in ApplDB with Yang models.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madhupalu The requirement comes from DASH project. The gNMI server will only write to limited tables in ApplDB, and these tables must have Yang model defined. In this use case, gNMI play the same role as a RestAPI server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
# 1 Feature Overview | ||
|
||
SONiC should provide a gNMI/gNOI server interface where the client can communicate with SONiC (server) through the interface using their respective gRPC libraries based on ConfigDB/ApplDB/StateDB/CountersDB schema or SONiC Yang schema. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will gNMI expose the capability info like which table can be changed, which one can not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's no Yang models for a table, this table can't be changed.
And we can also follow RFC to use "config false" tag, please refer to link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good with it.
|
||
### 1.2.2 Container | ||
|
||
All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you are introducing new container - Why introduce a new container instead of reusing telemetry? Telemetry container has been used for config purposes by some applications today, can telemetry container be renamed and continue to be used? If a new container is required, what is the backward compatible/migration plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did not have a consensus to ahead with the two containers approach one for config and another one for get and subscription in the community call 6/28, please revisit the design and explain why cant we combine all the gNMI operations in one container (today it is telemetry container but can be renamed as gNMI container), this helps to avoid code duplication.
@bhagatyj mentioned in the call today that there is a need for telemetry container restart for new paths addition, if you could explain why cant we do this on-the-fly without container restart, would be helpful to understand the rationale better.
@prvattem Please add if I have missed to capture anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the above comments, we should keep a single gnmi server in SONiC, maintaining two gnmi servers in the same NOS unnecessarily divide the code and future contributions. If there are enhancement or capability that needs to be addressed, it should be directed towards the current gnmi server. The gnmi server in telemetry container was introduced 2018, there are active PR contributions in github
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gnmi container (Telemetry) already exists today. Is there a reason why the existing gnmi server does not fit / cannot be extended for the intended functions ?
Introducing duplicate containers in the architecture comes with a lot of overhead the community will pay in terms of maintenance.
- The developer community will be divided on which one to contribute to. Does the developer need to ensure both work ? This is an overhead on multiple contributions.
- If it's not mandated that both should work, there will be wide divergence on what features will be supported through which container
- There will be code duplication. Why should the same NOS have two GNMI servers?
If there is something deficient in the existing container kindly list it clearly, and articulate why the existing container cannot be enhanced (if required) to meet the requirements.
All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container. | ||
|
||
# 2 Error Handling | ||
supervisord will autorestart gnmi service if it exits after it has successfully started up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the design handles do config apply errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, SONiC GNMI server is on par with SONiC CLI, it will update ConfigDB.
For incremental configuration, GNMI will invoke GCU (generic config updater), and GCU can take care of manual service restart and verify services have absorbed change correctly. And then GNMI can report any error detected by GCU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There needs to be an application to handle SAI errors for the configs from controller via gNMI server, what's the plan handles this use-case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, gNMI can't handle SAI errors.
GCU has implemented a framework to take care of manual service restart and verify services have absorbed change correctly. So gNMI server with GCU has better error handling than SONiC CLI.
### 1.2.2 Container | ||
|
||
All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this feature controlled via build time flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly suggest you not embed .png files into specs, but instead use editable SVG files, which are compact; act as source; and are directly viewable in a browser or GitHub. PNGs are not editable, so it's not possible to maintain drawings down the road unless you store a separate source file. PNG files are larger as well as appear
blurry. We've switched over to using .svg almost exclusively in the https://github.com/Azure/DASH project. Check out https://www.diagrams.net/. BTW you can export Visio as SVG to do a one-time conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this feature controlled via build time flags?
Not now, any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add a build time flag similar to other features ex: sonic_mgmt_framework
Hi Venkat.
Thank you for capturing the minutes. Please see inline..
On Tue, Jun 28, 2022 at 11:44 PM Venkatesan Mahalingam < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In doc/mgmt/gnmi/SONiC_GNMI_Server_Interface_Design.md
<#996 (comment)>:
> +For full configuration, there’re 2 possible solutions:
+* gNMI server needs to send response at first, and then invoke 'config reload'.
+
+<img src="images/full_rpc.png" alt="full-gnmi" width="500px"/>
+
+* Use gNMI request and gNOI request together to implement full configuration update.
+
+<img src="images/full_rpc_gnoi.png" alt="full-gnmi-gnoi" width="500px"/>
+
+The full configuration request will be overwritten by subsequent full configuration request or incremental configuration request.
+
+<img src="images/mixed requests.png" alt="overwritten-config" width="500px"/>
+
+### 1.2.2 Container
+
+All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container.
We did not have an consensus to ahead with the two containers approach one
for config and another one for get and subscription in the community call
6/28, please revisit the design and explain why cant we combine all the
gNMI operations in one container (today it is telemetry container but can
be renamed as gNMI container), this helps to avoid code duplication.
@bhagatyj <https://github.com/bhagatyj> mentioned in the call today that
there is a need for telemetry container restart for new paths addition, if
you could explain why cant we do this on-the-fly without container restart,
would be helpful to understand the rationale better.
I mentioned that I can relate with a need for a separate telemetry
container for new paths addition. If there are other methods to perform new
paths addition without disrupting the config/gNOI operations, that works
too. One disadvantage of separating the containers is that was mentioned in
the call was duplication of multiple files. While this could be partially
addressed by overlay functionality, this duplication is a concern that
needs to be thought about. In short, we still need to weigh the pros/cons
of telemetry container separation.
… @prvattem <https://github.com/prvattem> Please add if I have missed to
capture anything.
—
Reply to this email directly, view it on GitHub
<#996 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA22RORF35ZZI7SZET2ECLTVRPWD7ANCNFSM5VHSEYRQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
||
# 1 Feature Overview | ||
|
||
SONiC should provide a gNMI/gNOI server interface where the client can communicate with SONiC (server) through the interface using their respective gRPC libraries based on ConfigDB/ApplDB/StateDB/CountersDB schema or SONiC Yang schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain the need of a requirement to write directly to APPL_DB / STATE_DB. This is against SONIC configuration model where any configuration is written into CONFIG_DB only. Such a bypassing is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was discussed in the meeting. There are some usecases such as route add/deletes (for e.g vnet routes) which shall be programmed by an external controller. These routes are not intended to be persistent across reboots. It is essentially extending the existing RestAPI model to gNMI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prsunny We need to selectively allow (e.g VNET routes, DASH objects in APPL_DB) to control via REST/gNMI server, exposing DB objects without any restriction via north bound interfaces would be a big security concern. I like to see some design for this in this HLD or in a separate HLD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prsunny I don't think we can make infrastructure changes to allow specific need for a select-isolated deployment. Tomorrow someone might comeup with a need to directly write to ASIC_DB for effecting shutdown of all port-channels quickly. I don't think these things should be allowed. I don't agree that 'some external entity' wants to write to internal database tables of SONIC' is a requirements that needs to be fulfilled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Venkat that it shall be only selectively allowed. @bandaru-viswanath, i tend to disagree that this is infrastructure change. For your comment, if there is an external orchagent, may be it is a valid scenario of writing directly to ASIC_DB. But currently that's not the case. As mentioned, this is not a new design as APP_DB means application DB and depending on where the application sits (like BGP etc), it can act as the producer. Why should we restrict that application must be localized. Also this is not a persitent config. So config_db is out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Venkat that it shall be only selectively allowed
Please suggest how it can be selectively enabled, without hardcoding specific details in the container. The only place this can be done is in the GNMI container, and if we do so, we will be hardcoding some specific usecase/deployment requirements into UI.
i tend to disagree that this is infrastructure change.
IMO, it is an infra item to allow access to a DB otherwise considered below user level.
For your comment, if there is an external orchagent, may be it is a valid scenario of writing directly to ASIC_DB. But currently that's not the case. As mentioned, this is not a new design as APP_DB means application DB and depending on where the application sits (like BGP etc), it can act as the producer. Why should we restrict that application must be localized.
I am not arguing whether it is technically feasible. I see and know that it is quite possible the way you described. I wouldn't also be surprised if some non-community SONIC adaptations already do so. I am arguing that it is not the definition of SONiC in terms of how the databases are positioned in the architecture. So that sense, at least to me, its a deviation from the SONiC philosophy and warrants a bigger discussion in community.
Would community accept a PR to enhance this design to write something to ASIC_DB because it is their use case ?
Also this is not a persitent config. So config_db is out of scope.
Its should not be that it has to be APPL_DB because config-db is out of scope. That, at least to me, is not the right argument. It has to be APPL_DB, because that the right place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please suggest how it can be selectively enabled, without hardcoding specific details in the container. The only place this can be done is in the GNMI container, and if we do so, we will be hardcoding some specific usecase/deployment requirements into UI.
As suggested above, this can be done using Yang.
Yes, APPL_DB is the right place. Im just saying config_db is out of scope for programming something that is not intended to be persistent. I'm not sure on how positioning of APPL database in architecture warrants that only config_db can be modified for write operations. My point is APPL_DB is for application data (irrespective of whether it is local or remote). From an architecture point, it is intended to be written by various different applications (E.g PINS write to APP_DB directly, SDN controller programs Vnet routes directly to APP_DB directly via RestAPI). Its not a new design. Agree with you on the position of ASIC_DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a simple clarification: there is no requirement to write directly to STATE_DB, in this HLD, the gNMI server will read from StateDB, like a telemetry server.
* Set and get RPCs must be supported. Customer will use get RPC to retrieve configurations including FRR and VNET route, and use set RPC to apply new configurations. | ||
* SetRequest message must support to incrementally update configuration and fully update configurations. | ||
* Data models can be SONiC Yang models or CONFIG_DB schema. | ||
* Ability to configure/read different DBs - ApplDB, ConfigDB, StateDB, CountersDB etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the comment above, writing directly to APPL_DB/STATE_DB/COUNTERS_DB is against the general norm for SONIC management model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should rewrite this line to
- Ability to read different DBs - ApplDB, ConfigDB, StateDB, CountersDB etc.
- Ability to write different DBs - ApplDB, ConfigDB
Understanding that you still have concern on writing ApplDB. We can keep it in other discussion threads.
|
||
### 1.2.2 Container | ||
|
||
All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for a new GNMI server. There is already one GNMI container which runs the server. Please enhance the existing container instead of creating a new one.
| ---- | ---- | | ||
| replace {<br> path {<br>  elem {name: "CONFIG_DB"} elem {name: "DEVICE_NEIGHBOR"} elem {name: "Ethernet96"}<br> }<br>}<br>replace {<br> path {<br>  elem {name: "CONFIG_DB"} elem {name: "DEVICE_NEIGHBOR"} elem {name: "Ethernet8"} elem {name: "port"}<br> }<br> val { json_ietf_val: "eth1" }<br>} | [<br> {<br>  "op": "remove", "path": "/DEVICE_NEIGHBOR/Ethernet96"}, <br>  { "op": "add", "path": "/DEVICE_NEIGHBOR/Ethernet8/port", "value": "eth1"}<br>] | | ||
|
||
#### 1.2.1.5 Full Configurations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hows the simultaneous configuration from CLI and GNMI protected/serialized ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree. Simultaneous configuration from CLI and GNMI should be protected.
We have plan to update CLI implementation, and it's not covered in this design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, please enhance the HLD to add appropriate protection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bandaru-viswanath this is a great question! Currently the HLD of gNMI server does not protect against confliction with CLI, similar to one CLI does not protect against confliction with another CLI today.
In the future vision, there should be a centralized service inside SONiC responsible for all kinds of configuration. The requests could come from CLI/gNMI/RestAPI or SONiC applications. And the request should be serialized if there is confliction. The gNMI server or a sub service under gNMI server may be a candidate for such centralized service.
### 1.1.1 Phase 1 Requirements | ||
* Set and get RPCs must be supported. Customer will use get RPC to retrieve configurations including FRR and VNET route, and use set RPC to apply new configurations. | ||
* SetRequest message must support to incrementally update configuration and fully update configurations. | ||
* Data models can be SONiC Yang models or CONFIG_DB schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current SONIC-Yang models are flat and are not meant for Northbound configurations. How are the dependencies resolved in flat-models ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain a little bit more on "SONIC-Yang models are flat and are not meant for Northbound configurations"? What is "flat"?
This gNMI server depends on "Generic Config Updater" design, which could figure out the dependencies from SONiC Yang models. ref: https://github.com/sonic-net/SONiC/blob/master/doc/config-generic-update-rollback/SONiC_Generic_Config_Update_and_Rollback_Design.md#stage-2-json-patch-ordering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qiluo-msft Please see the hierarchical yang models here. https://www.openconfig.net/projects/models/ . It shows how data is defined in a hierarchy and also cross referenced. They are useful as NB yang models because of the hierarchy and referencing.
SONIC yang models are flat, and are really meant only for validating config data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation!
For your original question "How are the dependencies resolved in flat-models ?" SONiC Yang models use must
to express the dependencies, for example https://github.com/Azure/sonic-buildimage/blob/a4b9838231b50018193d7ac2927067f0a03b8582/src/sonic-yang-models/yang-models/sonic-vlan.yang#L118. And it could be used for configuration.
@madhupalu @venkatmahalingam @adyeung @bandaru-viswanath @bhagatyj SONiC telemetry is using prefix target to identify target database, and we can use the same method to support backward compatibility.
Now we have two gNMI server, one for read and one for configure. When SONiC telemetry and SONiC gNMI are stable in the future, we can use this prefix target solution to merge them to a single container. |
2. User could use SONiC CLI command "config apply-patch" to incrementally update configurations systematically and validate target configuration using SONiC Yang models. With generic_config_updater, all operations in a 'config apply-patch' command are processed in a single transaction that will either succeed or fail as a whole. | ||
- "config apply-patch" supports ConfigDB, and Yang models. | ||
- The green arrow indicates this process, and dotted arrow indicates Yang validation. | ||
3. SONiC CLI commands can use "config reload" to fully update configurations and validate target configuration using SONiC Yang models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ganglyu can this PR be merged? |
we are still waiting for community review |
* In the authentication of client certificate, after the certificate chain is validated by TLS, it will further check if the common name of the end-entity certificate is in the trusted common name list of the server config. | ||
|
||
#### 1.2.1.9 ACL | ||
GNMI server will not support OpenConfig Yang models, so ACL configuration will use CONFIG_DB schema and SONiC Yang Models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GNMI server already supports OpenConfig YANG models via Management framework. This statement is not right and needs to be removed.
* Data models can be SONiC Yang models or CONFIG_DB schema. | ||
* Ability to read different DBs - ApplDB, ConfigDB, StateDB, CountersDB etc. | ||
* Ability to write different DBs - ApplDB, ConfigDB. | ||
* Configurations must be verified using YANG models even it is in CONFIG_DB schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we planning to achieve this? Are we planning to integrate with the python validator(In which case validation of the bulking configurations will become time-consuming due to loading and unloading of pythong files)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For CONFIG_DB, we can accept time-consuming configurations. Validation is more important to us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation can be done by other means(go lang based validator that will not be time consuming) why integrate with python validator knowing that it is going to be time-consuming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use generic config updater to support incremental configuration update.
For now, generic config updater is implemented with Python and it's time consuming, and we can improve generic config updater in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anand-kumar-subramanian Could you provide some sample of "bulking configurations will become time-consuming" in your mind. If the "bulking configurations" is in one SetRequest, the "loading and unloading of python files" will happen only once and it will not be a significant performance issue. If "bulking configurations" is in multiple SetRequest, it is better to ask client to combine them into one.
1d66dc2
@adyeung @bandaru-viswanath @bhagatyj Could you review again? We believe already resolved all your comments. |
/easycla |
// For none-DB data | ||
OTHERS = 100; | ||
// For mixed schema | ||
MIXED_SCHEMA = 101; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qiluo-msft can you please double check if this feature has been included in 202211 branch? Thanks. |
Yes, this feature has been included in 202211 branch. |
Signed-off-by: Gang Lv ganglv@microsoft.com
PR: