-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios #3394
Conversation
…mmature records scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
… scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
Quality Gate passedIssues Measures |
Test Results 20 files - 3 274 suites - 35 41m 56s ⏱️ - 16m 8s For more details on these failures, see this check. Results for commit be09935. ± Comparison against base commit e779b0f. This pull request removes 3 tests.
♻️ This comment has been updated with latest results. |
503 is Service Unavaialable. How will we distinguish the normal 503 and the inmature logs? Related to this, whats the current alert logic? What will happen with 502s for example? |
Thanks @Ferparishuertas for the comment.
The intent here is to distinguish errors arising from immature records from other internal errors within the Relay module. Up until this PR, any internal error caused by the Relay module has been consistently mapped to a 500 Internal Server Error. This change introduces a 503 status code specifically for immature records scenarios, making it the first such instance where a dependent service issue is explicitly identified with a 503.
While the Mirror Node likely has the capability to distinguish errors and send different codes, it is important to note that they are actively addressing the root cause of the replica synchronization issue in their database. Their long-term solution aims to prevent such scenarios altogether. However, as a downstream service, the Relay module is responsible for standardizing error handling, and this PR assigns the 503 code to these cases in alignment with HTTP semantics.
After some exploration and research, I find that the 503 status code is quite more appropriate for these scenarios. While a 502 Bad Gateway indicates that the server received an invalidd response from an upstream server, typically refers to scenarios where the response is fundamentally invalid or cannot be processed—such as malformed headers, incomplete payloads, or protocol-level errors. In contrast, immature records, while incomplete, are still HTTP-valid responses that conform to protocol standards but lack the expected data due to transient synchronization delays. As a result, a 503 Service Unavailable status code is believed to be better aligns with this condition, signaling a temporary issue with the dependent Mirror Node service. Notably, when the request is retried after some time (off the flow), the records are fully complete, with no nullable or empty fields, further reinforcing the appropriateness of the 503 designation for these cases. Let me know your thoughts; I’m open to discussing this further.
Could you clarify if you are suggesting introducing a new code for 502 scenarios or proposing something else?
The current alerting mechanism calculates the proportion of non-acceptable (TBD on this name) HTTP response codes (status codes not equal to 200, 400, or 501) for specific RPC relay methods relative to the total number of response codes for those methods and namespaces within a specified time range. If the failed requests exceed 5% of total requests within this range, an alert is triggered and sent to the alerting channel.
Since 502 is not currently included in the specific list of acceptable codes (200, 400, or 501), it would be categorized as a failed request. If the cumulative percentage of all failed requests, including 502s, crosses the 5% threshold, an alert would be issued accordingly. |
Thanks for all the clarifications. That said, I still have the same
question. Let me formalize it in a more precise manner
- Are we going to distinguish 503 errors at mirror node level at metrics
level from immature logs where we map it to 503?
- If mirror node gives a 502, which error code are we showing
My overall question is how to distinguish our errors, vs dependant service
errors
…On Thu, Jan 16, 2025 at 7:05 PM Logan Nguyen ***@***.***> wrote:
503 is Service Unavaialable. How will we distinguish the normal 503 and
the inmature logs? Can mirror node distinguish the error and send
differente codes? on the other hand yesterday we mentiones 502s, on mirror
node. I suggest to add an error code too
Related to this, whats the current alert logic? What will happen with 502s
for example?
Thanks @Ferparishuertas <https://github.com/Ferparishuertas> for the
comment.
*503 is Service Unavailable. How will we distinguish the normal 503 and
the immature logs?*
The intent here is to distinguish errors arising from immature records
from other internal errors within the Relay module. Up until this PR, any
internal error caused by the Relay module has been consistently mapped to a
500 Internal Server Error. This change introduces a 503 status code
specifically for immature records scenarios, making it the first such
instance where a dependent service issue is explicitly identified with a
503.
*Can the Mirror Node distinguish the error and send different codes?*
While the Mirror Node likely has the capability to distinguish errors and
send different codes, it is important to note that they are actively
addressing the root cause of the replica synchronization issue in their
database. Their long-term solution aims to prevent such scenarios
altogether. However, as a downstream service, the Relay module is
responsible for standardizing error handling, and this PR assigns the 503
code to these cases in alignment with HTTP semantics.
*On the other hand, yesterday we mentioned 502s on the Mirror Node.*
After some exploration and research, I find that the 503 status code is
quite more appropriate for these scenarios. While a 502 Bad Gateway
indicates that the server received an invalidd response from an upstream
server, typically refers to scenarios where the response is fundamentally
invalid or cannot be processed—such as malformed headers, incomplete
payloads, or protocol-level errors. In contrast, immature records, while
incomplete, are still HTTP-valid responses that conform to protocol
standards but lack the expected data due to transient synchronization
delays. As a result, a 503 Service Unavailable status code is believed to
be better aligns with this condition, signaling a temporary issue with the
dependent Mirror Node service. Notably, when the request is retried after
some time (off the flow), the records are fully complete, with no nullable
or empty fields, further reinforcing the appropriateness of the 503
designation for these cases. Let me know your thoughts; I’m open to
discussing this further.
*I suggest adding an error code too.*
Could you clarify if you are suggesting introducing a new code for 502
scenarios or proposing something else?
*Related to this, what’s the current alert logic?*
The current alerting mechanism calculates the proportion of non-acceptable
(TBD on this name) HTTP response codes (status codes not equal to 200, 400,
or 501) for specific RPC relay methods relative to the total number of
response codes for those methods and namespaces within a specified time
range. If the failed requests exceed 5% of total requests within this
range, an alert is triggered and sent to the alerting channel.
*What will happen with 502s, for example?*
Since 502 is not currently included in the specific list of acceptable
codes (200, 400, or 501), it would be categorized as a failed request. If
the cumulative percentage of all failed requests, including 502s, crosses
the 5% threshold, an alert would be issued accordingly.
—
Reply to this email directly, view it on GitHub
<#3394 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKXFDQMMCLSEWGUXNVY2JL2K7YIBAVCNFSM6AAAAABVIHXQ62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKOJWGM3TSMRVG4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Hey @Ferparishuertas thanks for the quick response.
Ahhh so currently, any failure from upstream servers (e.g. MN or CN) is mapped to a 500 Internal Server Error by the Relay module. It’s important to note that immature records are not technically errors; they represent valid responses returned by the Mirror Node, albeit incomplete due to synchronization delays. In the Relay module, immature records are explicitly treated as "errors" for the purpose of standardization and are mapped to a 503 Service Unavailable status code. Thus:
As mentioned above, this will be 500. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left one nit
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3394 +/- ##
==========================================
+ Coverage 84.87% 84.97% +0.09%
==========================================
Files 69 69
Lines 4742 4738 -4
Branches 1067 1066 -1
==========================================
+ Hits 4025 4026 +1
+ Misses 400 398 -2
+ Partials 317 314 -3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
…mmature records scenarios (#3394) * feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com> * test: fixed unit tests for the new DEPENDENT_SERVICE_IMMATURE_RECORDS scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com> --------- Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
…defined error for immature records scenarios (#3394) (#3415) feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios (#3394) * feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios * test: fixed unit tests for the new DEPENDENT_SERVICE_IMMATURE_RECORDS scenarios --------- Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
Description:
This PR introduces a new predefined error, DEPENDENT_SERVICE_IMMATURE_RECORDS, with the JSON-RPC code
-32015
, mapped to the HTTP status code503
. This improves error handling for cases where the dependent Mirror Node service returns immature records.-32015
: Chosen as it fits within the JSON-RPC application-defined server error range (-32000 to -32099) and avoids conflicts with existing error codes.HTTP 503
: Represents a temporary unavailability of a dependent service, signaling clients to retry.Related issue(s):
Fixes #3392
Notes for reviewer:
Checklist