Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended port operational status with various error and fault status #2060

Merged
merged 1 commit into from
Sep 24, 2024

Conversation

prgeor
Copy link
Contributor

@prgeor prgeor commented Aug 3, 2024

Port status change notification now adds additional field to indicate various port error/fault status that caused the port operational status to go down. A capability attribute is added to query the Switch's capability that can be used to know if the new fault/error status are supported in the port status change notification.

sai_port_error_status_t bitmap can indicate more than one port error conditions by setting the corresponding error bit indicating why the port went down. Ideally, when the port operational status goes UP, the bitmap value should be ZERO i.e no errors.

Adding the error status as part of the port link status change notification allows better correlation of port operational status with various port error/fault conditions events that may have caused the port down event.

@prgeor prgeor force-pushed the mac-fault branch 2 times, most recently from 83ff60a to 2f120bb Compare August 3, 2024 06:15
inc/saiport.h Outdated
* @type bool
* @flags READ_ONLY
*/
SAI_PORT_ATTR_MAC_LOCAL_FAULT_STATUS,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it better to add this as PORT_ATTR_FAULT_STATUS and then define enum such as mac_local_faluse, remote fault, and in the future could be other types of fault.

@prgeor prgeor changed the title Add port attribute to get MAC local and remote fault status Extended port operational status with various error and fault status Aug 6, 2024
@JaiOCP
Copy link
Contributor

JaiOCP commented Aug 12, 2024

@prgeor @lguohan
We discussed this topic offline with other vendors. There is a issue with having a separate port error notification as the callback can reach in any order along with port oper status. Besides this there is also the issue of when the error state is cleared.

#2048

Ideally we should have the port errors as part of the existing oper_status_t port notification, is the consensus.
Please look at the private branch (PR is not yet opened)

typedef enum _sai_port_oper_status_t
{
/** Unknown */
SAI_PORT_OPER_STATUS_UNKNOWN,

/** Up */
SAI_PORT_OPER_STATUS_UP,

/** Down */
SAI_PORT_OPER_STATUS_DOWN,

/** Test Running */
SAI_PORT_OPER_STATUS_TESTING,

/** Not Present */
SAI_PORT_OPER_STATUS_NOT_PRESENT,

/** Remote-fault observed on the port */
SAI_PORT_OPER_STATUS_REMOTE_FAULT,

/** Local-fault for example loss of signal and more */
SAI_PORT_OPER_STATUS_LOCAL_FAULT,

/** Pre-emphasis set failed when configuration is done*/
SAI_PORT_OPER_STATUS_PREEMPHASIS_SET_FAILED,

/** FEC set failed when configuration is done */
SAI_PORT_OPER_STATUS_FEC_SET_FAILED,

/** Speed set failed when configuration is done */
SAI_PORT_OPER_STATUS_SPEED_SET_FAILED,

/** Interface type set failed when configuration is done */
SAI_PORT_OPER_STATUS_IF_TYPE_SET_FAILED,

/** Media type set failed when configuration is done */
SAI_PORT_OPER_STATUS_MEDIA_TYPE_SET_FAILED,

....

@prgeor
Copy link
Contributor Author

prgeor commented Aug 19, 2024

@prgeor @lguohan We discussed this topic offline with other vendors. There is a issue with having a separate port error notification as the callback can reach in any order along with port oper status. Besides this there is also the issue of when the error state is cleared.

#2048

Ideally we should have the port errors as part of the existing oper_status_t port notification, is the consensus. Please look at the private branch (PR is not yet opened)

typedef enum _sai_port_oper_status_t { /** Unknown */ SAI_PORT_OPER_STATUS_UNKNOWN,

/** Up */
SAI_PORT_OPER_STATUS_UP,

/** Down */
SAI_PORT_OPER_STATUS_DOWN,

/** Test Running */
SAI_PORT_OPER_STATUS_TESTING,

/** Not Present */
SAI_PORT_OPER_STATUS_NOT_PRESENT,

/** Remote-fault observed on the port */
SAI_PORT_OPER_STATUS_REMOTE_FAULT,

/** Local-fault for example loss of signal and more */
SAI_PORT_OPER_STATUS_LOCAL_FAULT,

/** Pre-emphasis set failed when configuration is done*/
SAI_PORT_OPER_STATUS_PREEMPHASIS_SET_FAILED,

/** FEC set failed when configuration is done */
SAI_PORT_OPER_STATUS_FEC_SET_FAILED,

/** Speed set failed when configuration is done */
SAI_PORT_OPER_STATUS_SPEED_SET_FAILED,

/** Interface type set failed when configuration is done */
SAI_PORT_OPER_STATUS_IF_TYPE_SET_FAILED,

/** Media type set failed when configuration is done */
SAI_PORT_OPER_STATUS_MEDIA_TYPE_SET_FAILED,

....

@JaiOCP we decided to not use existing notification instead created new notification which can be used in future by adding new faults (apart from local/remote) faults.

@JaiOCP
Copy link
Contributor

JaiOCP commented Aug 19, 2024 via email

inc/saiport.h Outdated Show resolved Hide resolved
inc/saiport.h Outdated Show resolved Hide resolved
inc/saiport.h Outdated
Comment on lines 80 to 127
typedef enum _sai_port_error_status_t
{
/** No errors */
SAI_PORT_ERROR_STATUS_CLEAR = 0,

SAI_PORT_ERROR_STATUS_MAC_LOCAL_FAULT = 1,

SAI_PORT_ERROR_STATUS_MAC_REMOTE_FAULT = 2,

SAI_PORT_ERROR_STATUS_FEC_SYNC_LOSS = 4,

SAI_PORT_ERROR_STATUS_FEC_LOSS_ALIGNMENT_MARKER = 8
} sai_port_error_status_t;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an enum sai_port_err_status_t that has similar status'. Can we reuse it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshitgulati18 thanks for pointing out. sai_port_err_status_list_t is a list of various port error which needs to be traversed to find out which ERROR is set. As the list grows it becomes less efficient whereas in this PR the proposal is to use 64-bit bitmap of errors which is more efficient. I guess 64 bit is sufficient to cover port related errors...

inc/saiport.h Show resolved Hide resolved
inc/saiport.h Outdated

SAI_PORT_ERROR_AUTONEG_FAILED=0x8,

SAI_PORT_ERROR_LINK_TRAINING_FAILED=0x10,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplication, appears twice

SAI_PORT_ERROR_STATUS_FEC_SYNC_LOSS = 4,

SAI_PORT_ERROR_STATUS_FEC_LOSS_ALIGNMENT_MARKER = 8
} sai_port_error_status_t;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add SAI_PORT_ERROR_STATUS_HIGH_SER and SAI_PORT_ERROR_STATUS_HIGH_BER

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inc/saiport.h Show resolved Hide resolved
Copy link
Contributor

@mikeberesford mikeberesford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

/**
* @brief Attribute bitmap data for #SAI_PORT_ATTR_ERROR_STATUS
*/
typedef enum _sai_port_error_status_t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are any other values from sai_port_err_status_t needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikeberesford may I know which one you are interested to be added here? I am not sure if I can duplicate the attributes here but I can try...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally mostly interested in local/remote fault, just want to make sure the others are not useful or are covered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to Local/Remote fault, SAI_PORT_ERR_STATUS_CRC_RATE would also be good to include here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshitgulati18 added now

@prgeor
Copy link
Contributor Author

prgeor commented Sep 16, 2024

@mikeberesford may I know what is blocking?

@mikeberesford
Copy link
Contributor

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

@prgeor
Copy link
Contributor Author

prgeor commented Sep 19, 2024

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

@mikeberesford can you check now?

@mikeberesford
Copy link
Contributor

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

@mikeberesford can you check now?

Please add @deprecated true to SAI_PORT_ATTR_ERR_STATUS_LIST as well?

@prgeor
Copy link
Contributor Author

prgeor commented Sep 19, 2024

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

@mikeberesford can you check now?

Please add @deprecated true to SAI_PORT_ATTR_ERR_STATUS_LIST as well?

@mikeberesford added

@rlhui rlhui added the reviewed PR is discussed in SAI Meeting label Sep 19, 2024
@mikeberesford
Copy link
Contributor

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?

@mikeberesford can you check now?

Please add @deprecated true to SAI_PORT_ATTR_ERR_STATUS_LIST as well?

@mikeberesford added

Still not done? I don't appear to be able to leave a comment on the line, but I still see:

    /**
     * @brief Port Down Error Status
     *
     * @type sai_port_err_status_list_t
     * @flags READ_ONLY
     */
    SAI_PORT_ATTR_ERR_STATUS_LIST,

@@ -1935,6 +1988,7 @@ typedef enum _sai_port_attr_t
*
* @type sai_port_err_status_list_t
* @flags READ_ONLY
* @deprecated true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikeberesford now added here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Contributor Author

@prgeor prgeor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JaiOCP can you review?

@@ -1935,6 +1988,7 @@ typedef enum _sai_port_attr_t
*
* @type sai_port_err_status_list_t
* @flags READ_ONLY
* @deprecated true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@zhangyanzhao
Copy link

@kcudnik can you please check if this PR can be merged? This has been approved by multiple reviewers. Thanks.

Copy link
Contributor

@JaiOCP JaiOCP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a backward compatible change if NOS and SAI are not using the same header files. We need to capture this in release notes.

Signed-off-by: Prince George <prgeor@microsoft.com>
@rlhui rlhui merged commit 0003491 into opencomputeproject:master Sep 24, 2024
3 checks passed
siqbal1986 pushed a commit to siqbal1986/SAI that referenced this pull request Sep 30, 2024
…teproject#2060)

Port status change notification now adds additional field to indicate various port error/fault status that caused the port operational status to go down. A capability attribute is added to query the Switch's capability that can be used to know if the new fault/error status are supported in the port status change notification.

sai_port_error_status_t bitmap can indicate more than one port error conditions by setting the corresponding error bit indicating why the port went down. Ideally, when the port operational status goes UP, the bitmap value should be ZERO i.e no errors.

Adding the error status as part of the port link status change notification allows better correlation of port operational status with various port error/fault conditions events that may have caused the port down event.

Signed-off-by: Prince George <prgeor@microsoft.com>
Signed-off-by: siqbal1986 <shahzad.iqbal@microsoft.com>
@@ -89,6 +140,8 @@ typedef struct _sai_port_oper_status_notification_t
/** Port operational status */
sai_port_oper_status_t port_state;

/** Bitmap of various port error or fault status */
sai_port_error_status_t port_error_status;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is breaking change not backward compatible, could cause some issues on sonic when deserialize

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of adding this extra field we should make this RO attribute per port, and if user would be interested in port status, then it could query that attribute in notification using SAI api, then everything would be backward compatible, we can still move this around

* @brief Attribute data for #SAI_PORT_ATTR_ERROR_STATUS
* Note enum values must be powers of 2 to be used as Bit mask to query multiple errors
*
* @flags free
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be changed to "strict" then it will force validator to chack if all enum are power of 2 to catch user mistakes

typedef enum _sai_port_error_status_t
{
/** No errors */
SAI_PORT_ERROR_STATUS_CLEAR = 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor @lguohan I added exception for that, but i would vote that this flag should be removed, since its not real flag, usually flags are used like this:

 if (x & FLAG) == FLAG)

checks if flag is set, in this case if you use CLEAR value then this condidtion will be always true, and CLEAR is not part of actual flags that can be set, it seems like intention is to do this:

if (x == CLEAR) ...

but i think that is a bad design, more natural is first code snipper or eventually it could be done like this:

if (x) ... // some flags are set

do we need a live discussion on this or i can just remove this entry ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kcudnik sai_port_error_status_t being used as enum or flag is up to the implementation. I can use enum value like below:-

I can use the enum value like so

if (port_error_status) {
// Found Errors
}
else {
// No Errors
}
`

In another implementation I can look for only interested errors where flag could be useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the same as i wrote in my comment, but there eis no need for SAI_PORT_ERROR_STATUS_CLEAR enum value zero, im proposing to remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kcudnik why do you say there is no need for SAI_PORT_ERROR_STATUS_CLEAR ? I am planning to use it when there is port UP event and gave the example above .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because that enum is not an actual flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kcudnik as I mentioned I am planning to use sai_port_error_status_t as both as an enum value and also also as a flag. I don't want to check each and every flag rather at time I am interested only in few flags in which case I know the enum value and in some case I may be interested only to check one flag. It not just about CLEAR flag.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLEAR is not a flag is definition of absence of any flags,, read my first comment in this thread

@kcudnik
Copy link
Collaborator

kcudnik commented Oct 12, 2024

@lguohan @JaiOCP @rlhui @prgeor since we agreed to propose new notification extended struct count/attr_list which i prepared here: #2087

i propose to override/rewrite history without ever changing sai_port_oper_status_notifiation_t struct in the first place to be backward compatible all the time, does this sound good ?

erohsik pushed a commit to erohsik/SAI that referenced this pull request Nov 7, 2024
…teproject#2060)

Port status change notification now adds additional field to indicate various port error/fault status that caused the port operational status to go down. A capability attribute is added to query the Switch's capability that can be used to know if the new fault/error status are supported in the port status change notification.

sai_port_error_status_t bitmap can indicate more than one port error conditions by setting the corresponding error bit indicating why the port went down. Ideally, when the port operational status goes UP, the bitmap value should be ZERO i.e no errors.

Adding the error status as part of the port link status change notification allows better correlation of port operational status with various port error/fault conditions events that may have caused the port down event.

Signed-off-by: Prince George <prgeor@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reviewed PR is discussed in SAI Meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants