feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios #3394

quiet-node · 2025-01-15T22:13:18Z

Description:
This PR introduces a new predefined error, DEPENDENT_SERVICE_IMMATURE_RECORDS, with the JSON-RPC code -32015, mapped to the HTTP status code 503. This improves error handling for cases where the dependent Mirror Node service returns immature records.

-32015: Chosen as it fits within the JSON-RPC application-defined server error range (-32000 to -32099) and avoids conflicts with existing error codes.
HTTP 503: Represents a temporary unavailability of a dependent service, signaling clients to retry.

Related issue(s):

Fixes #3392

Notes for reviewer:

Checklist

Documented (Code comments, README, etc.)
Tested (unit, integration, etc.)

…mmature records scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

… scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

sonarqubecloud · 2025-01-15T22:14:06Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2025-01-15T22:18:43Z

Test Results

20 files - 3 274 suites - 35 41m 56s ⏱️ - 16m 8s
613 tests - 3 608 ✅ + 10 4 💤 ±0 1 ❌ - 13
778 runs - 225 768 ✅ - 212 8 💤 +1 2 ❌ - 14

For more details on these failures, see this check.

Results for commit be09935. ± Comparison against base commit e779b0f.

This pull request removes 3 tests.

"after all" hook in "@web-socket-batch-1 eth_getBalance" ‑ RPC Server Acceptance Tests Acceptance tests @web-socket-batch-1 eth_getBalance "after all" hook in "@web-socket-batch-1 eth_getBalance"
"after all" hook in "RPC Server Acceptance Tests" ‑ RPC Server Acceptance Tests "after all" hook in "RPC Server Acceptance Tests"
"before each" hook for "should execute "eth_getStorageAt" request to get current state changes" ‑ RPC Server Acceptance Tests Acceptance tests @api-batch-2 RPC Server Acceptance Tests "before each" hook for "should execute "eth_getStorageAt" request to get current state changes"

♻️ This comment has been updated with latest results.

Ferparishuertas · 2025-01-16T09:54:23Z

503 is Service Unavaialable. How will we distinguish the normal 503 and the inmature logs?
Can mirror node distinguish the error and send differente codes?
on the other hand yesterday we mentiones 502s, on mirror node. I suggest to add an error code too

Related to this, whats the current alert logic? What will happen with 502s for example?

quiet-node · 2025-01-16T18:05:29Z

503 is Service Unavaialable. How will we distinguish the normal 503 and the inmature logs? Can mirror node distinguish the error and send differente codes? on the other hand yesterday we mentiones 502s, on mirror node. I suggest to add an error code too

Related to this, whats the current alert logic? What will happen with 502s for example?

Thanks @Ferparishuertas for the comment.

503 is Service Unavailable. How will we distinguish the normal 503 and the immature logs?

The intent here is to distinguish errors arising from immature records from other internal errors within the Relay module. Up until this PR, any internal error caused by the Relay module has been consistently mapped to a 500 Internal Server Error. This change introduces a 503 status code specifically for immature records scenarios, making it the first such instance where a dependent service issue is explicitly identified with a 503.

Can the Mirror Node distinguish the error and send different codes?

While the Mirror Node likely has the capability to distinguish errors and send different codes, it is important to note that they are actively addressing the root cause of the replica synchronization issue in their database. Their long-term solution aims to prevent such scenarios altogether. However, as a downstream service, the Relay module is responsible for standardizing error handling, and this PR assigns the 503 code to these cases in alignment with HTTP semantics.

On the other hand, yesterday we mentioned 502s on the Mirror Node.

After some exploration and research, I find that the 503 status code is quite more appropriate for these scenarios. While a 502 Bad Gateway indicates that the server received an invalidd response from an upstream server, typically refers to scenarios where the response is fundamentally invalid or cannot be processed—such as malformed headers, incomplete payloads, or protocol-level errors. In contrast, immature records, while incomplete, are still HTTP-valid responses that conform to protocol standards but lack the expected data due to transient synchronization delays. As a result, a 503 Service Unavailable status code is believed to be better aligns with this condition, signaling a temporary issue with the dependent Mirror Node service. Notably, when the request is retried after some time (off the flow), the records are fully complete, with no nullable or empty fields, further reinforcing the appropriateness of the 503 designation for these cases. Let me know your thoughts; I’m open to discussing this further.

I suggest adding an error code too.

Could you clarify if you are suggesting introducing a new code for 502 scenarios or proposing something else?

Related to this, what’s the current alert logic?

The current alerting mechanism calculates the proportion of non-acceptable (TBD on this name) HTTP response codes (status codes not equal to 200, 400, or 501) for specific RPC relay methods relative to the total number of response codes for those methods and namespaces within a specified time range. If the failed requests exceed 5% of total requests within this range, an alert is triggered and sent to the alerting channel.

What will happen with 502s, for example?

Since 502 is not currently included in the specific list of acceptable codes (200, 400, or 501), it would be categorized as a failed request. If the cumulative percentage of all failed requests, including 502s, crosses the 5% threshold, an alert would be issued accordingly.

Ferparishuertas · 2025-01-16T18:13:35Z

Thanks for all the clarifications. That said, I still have the same question. Let me formalize it in a more precise manner - Are we going to distinguish 503 errors at mirror node level at metrics level from immature logs where we map it to 503? - If mirror node gives a 502, which error code are we showing My overall question is how to distinguish our errors, vs dependant service errors

…

On Thu, Jan 16, 2025 at 7:05 PM Logan Nguyen ***@***.***> wrote: 503 is Service Unavaialable. How will we distinguish the normal 503 and the inmature logs? Can mirror node distinguish the error and send differente codes? on the other hand yesterday we mentiones 502s, on mirror node. I suggest to add an error code too Related to this, whats the current alert logic? What will happen with 502s for example? Thanks @Ferparishuertas <https://github.com/Ferparishuertas> for the comment. *503 is Service Unavailable. How will we distinguish the normal 503 and the immature logs?* The intent here is to distinguish errors arising from immature records from other internal errors within the Relay module. Up until this PR, any internal error caused by the Relay module has been consistently mapped to a 500 Internal Server Error. This change introduces a 503 status code specifically for immature records scenarios, making it the first such instance where a dependent service issue is explicitly identified with a 503. *Can the Mirror Node distinguish the error and send different codes?* While the Mirror Node likely has the capability to distinguish errors and send different codes, it is important to note that they are actively addressing the root cause of the replica synchronization issue in their database. Their long-term solution aims to prevent such scenarios altogether. However, as a downstream service, the Relay module is responsible for standardizing error handling, and this PR assigns the 503 code to these cases in alignment with HTTP semantics. *On the other hand, yesterday we mentioned 502s on the Mirror Node.* After some exploration and research, I find that the 503 status code is quite more appropriate for these scenarios. While a 502 Bad Gateway indicates that the server received an invalidd response from an upstream server, typically refers to scenarios where the response is fundamentally invalid or cannot be processed—such as malformed headers, incomplete payloads, or protocol-level errors. In contrast, immature records, while incomplete, are still HTTP-valid responses that conform to protocol standards but lack the expected data due to transient synchronization delays. As a result, a 503 Service Unavailable status code is believed to be better aligns with this condition, signaling a temporary issue with the dependent Mirror Node service. Notably, when the request is retried after some time (off the flow), the records are fully complete, with no nullable or empty fields, further reinforcing the appropriateness of the 503 designation for these cases. Let me know your thoughts; I’m open to discussing this further. *I suggest adding an error code too.* Could you clarify if you are suggesting introducing a new code for 502 scenarios or proposing something else? *Related to this, what’s the current alert logic?* The current alerting mechanism calculates the proportion of non-acceptable (TBD on this name) HTTP response codes (status codes not equal to 200, 400, or 501) for specific RPC relay methods relative to the total number of response codes for those methods and namespaces within a specified time range. If the failed requests exceed 5% of total requests within this range, an alert is triggered and sent to the alerting channel. *What will happen with 502s, for example?* Since 502 is not currently included in the specific list of acceptable codes (200, 400, or 501), it would be categorized as a failed request. If the cumulative percentage of all failed requests, including 502s, crosses the 5% threshold, an alert would be issued accordingly. — Reply to this email directly, view it on GitHub <#3394 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAKXFDQMMCLSEWGUXNVY2JL2K7YIBAVCNFSM6AAAAABVIHXQ62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKOJWGM3TSMRVG4> . You are receiving this because you were mentioned.Message ID: ***@***.***>

quiet-node · 2025-01-16T18:48:45Z

Hey @Ferparishuertas thanks for the quick response.

Are we going to distinguish 503 errors at mirror node level at metrics level from immature logs where we map it to 503?

Ahhh so currently, any failure from upstream servers (e.g. MN or CN) is mapped to a 500 Internal Server Error by the Relay module. It’s important to note that immature records are not technically errors; they represent valid responses returned by the Mirror Node, albeit incomplete due to synchronization delays.

In the Relay module, immature records are explicitly treated as "errors" for the purpose of standardization and are mapped to a 503 Service Unavailable status code. Thus:

A 503 error directly from the Mirror Node is mapped to a 500 error by the Relay.
Immature records from the Mirror Node are mapped to a 503 error by the Relay.

If mirror node gives a 502, which error code are we showing

As mentioned above, this will be 500.

Ferparishuertas

LGTM

acuarica

left one nit

packages/relay/tests/lib/mirrorNodeClient.spec.ts

codecov · 2025-01-22T16:07:47Z

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.97%. Comparing base (e779b0f) to head (be09935).
Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
packages/relay/src/lib/clients/mirrorNodeClient.ts	75.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3394      +/-   ##
==========================================
+ Coverage   84.87%   84.97%   +0.09%     
==========================================
  Files          69       69              
  Lines        4742     4738       -4     
  Branches     1067     1066       -1     
==========================================
+ Hits         4025     4026       +1     
+ Misses        400      398       -2     
+ Partials      317      314       -3

Flag	Coverage Δ
config-service	`98.14% <ø> (ø)`
relay	`79.56% <77.77%> (-0.01%)`	⬇️
server	`83.30% <100.00%> (+0.02%)`	⬆️
ws-server	`36.66% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
packages/relay/src/lib/errors/JsonRpcError.ts	`74.50% <ø> (ø)`
packages/relay/src/lib/eth.ts	`86.32% <ø> (+0.31%)`	⬆️
.../lib/services/ethService/ethCommonService/index.ts	`91.30% <100.00%> (+1.09%)`	⬆️
...ver/src/koaJsonRpc/lib/HttpStatusCodeAndMessage.ts	`100.00% <100.00%> (ø)`
packages/relay/src/lib/clients/mirrorNodeClient.ts	`88.31% <75.00%> (-0.44%)`	⬇️

... and 1 file with indirect coverage changes

…mmature records scenarios (#3394) * feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com> * test: fixed unit tests for the new DEPENDENT_SERVICE_IMMATURE_RECORDS scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com> --------- Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

…defined error for immature records scenarios (#3394) (#3415) feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios (#3394) * feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios * test: fixed unit tests for the new DEPENDENT_SERVICE_IMMATURE_RECORDS scenarios --------- Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

quiet-node added 2 commits January 15, 2025 15:57

feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for i…

8e34948

…mmature records scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

test: fixed unit tests for the new DEPENDENT_SERVICE_IMMATURE_RECORDS…

be09935

… scenarios Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

quiet-node added the internal For changes that affect the project's internal workings but not its outward-facing functionality. label Jan 15, 2025

quiet-node added this to the 0.64.1 milestone Jan 15, 2025

quiet-node self-assigned this Jan 15, 2025

quiet-node requested review from Nana-EC and a team as code owners January 15, 2025 22:13

quiet-node linked an issue Jan 15, 2025 that may be closed by this pull request

[Failed-Request-Alert-Tuning] Assign Appropriate HTTP Status Code for Immature Records error #3392

Closed

Ferparishuertas approved these changes Jan 17, 2025

View reviewed changes

acuarica approved these changes Jan 21, 2025

View reviewed changes

packages/relay/tests/lib/mirrorNodeClient.spec.ts Show resolved Hide resolved

quiet-node merged commit 0c6d059 into main Jan 22, 2025
50 of 52 checks passed

quiet-node deleted the 3392-failed-request-alert-tuning-assign-appropriate-http-status-code-for-immature-records-error branch January 22, 2025 16:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios #3394

feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios #3394

quiet-node commented Jan 15, 2025 •

edited

Loading

sonarqubecloud bot commented Jan 15, 2025

github-actions bot commented Jan 15, 2025 •

edited

Loading

Ferparishuertas commented Jan 16, 2025 •

edited

Loading

quiet-node commented Jan 16, 2025

Ferparishuertas commented Jan 16, 2025 via email

quiet-node commented Jan 16, 2025 •

edited

Loading

Ferparishuertas left a comment

acuarica left a comment

codecov bot commented Jan 22, 2025

feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios #3394

feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios #3394

Conversation

quiet-node commented Jan 15, 2025 • edited Loading

sonarqubecloud bot commented Jan 15, 2025

Quality Gate passed

github-actions bot commented Jan 15, 2025 • edited Loading

Test Results

Ferparishuertas commented Jan 16, 2025 • edited Loading

quiet-node commented Jan 16, 2025

Ferparishuertas commented Jan 16, 2025 via email

quiet-node commented Jan 16, 2025 • edited Loading

Ferparishuertas left a comment

Choose a reason for hiding this comment

acuarica left a comment

Choose a reason for hiding this comment

codecov bot commented Jan 22, 2025

Codecov Report

quiet-node commented Jan 15, 2025 •

edited

Loading

github-actions bot commented Jan 15, 2025 •

edited

Loading

Ferparishuertas commented Jan 16, 2025 •

edited

Loading

quiet-node commented Jan 16, 2025 •

edited

Loading