Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios #3394

Conversation

quiet-node
Copy link
Member

@quiet-node quiet-node commented Jan 15, 2025

Description:
This PR introduces a new predefined error, DEPENDENT_SERVICE_IMMATURE_RECORDS, with the JSON-RPC code -32015, mapped to the HTTP status code 503. This improves error handling for cases where the dependent Mirror Node service returns immature records.

  • -32015: Chosen as it fits within the JSON-RPC application-defined server error range (-32000 to -32099) and avoids conflicts with existing error codes.
  • HTTP 503: Represents a temporary unavailability of a dependent service, signaling clients to retry.

Related issue(s):

Fixes #3392

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

…mmature records scenarios

Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
… scenarios

Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
@quiet-node quiet-node added the internal For changes that affect the project's internal workings but not its outward-facing functionality. label Jan 15, 2025
@quiet-node quiet-node added this to the 0.64.1 milestone Jan 15, 2025
@quiet-node quiet-node self-assigned this Jan 15, 2025
@quiet-node quiet-node requested review from Nana-EC and a team as code owners January 15, 2025 22:13
Copy link

github-actions bot commented Jan 15, 2025

Test Results

 20 files   -   3  274 suites   - 35   41m 56s ⏱️ - 16m 8s
613 tests  -   3  608 ✅ + 10  4 💤 ±0  1 ❌  - 13 
778 runs   - 225  768 ✅  - 212  8 💤 +1  2 ❌  - 14 

For more details on these failures, see this check.

Results for commit be09935. ± Comparison against base commit e779b0f.

This pull request removes 3 tests.
"after all" hook in "@web-socket-batch-1 eth_getBalance" ‑ RPC Server Acceptance Tests Acceptance tests @web-socket-batch-1 eth_getBalance "after all" hook in "@web-socket-batch-1 eth_getBalance"
"after all" hook in "RPC Server Acceptance Tests" ‑ RPC Server Acceptance Tests "after all" hook in "RPC Server Acceptance Tests"
"before each" hook for "should execute "eth_getStorageAt" request to get current state changes" ‑ RPC Server Acceptance Tests Acceptance tests @api-batch-2 RPC Server Acceptance Tests "before each" hook for "should execute "eth_getStorageAt" request to get current state changes"

♻️ This comment has been updated with latest results.

@Ferparishuertas
Copy link

Ferparishuertas commented Jan 16, 2025

503 is Service Unavaialable. How will we distinguish the normal 503 and the inmature logs?
Can mirror node distinguish the error and send differente codes?
on the other hand yesterday we mentiones 502s, on mirror node. I suggest to add an error code too

Related to this, whats the current alert logic? What will happen with 502s for example?

@quiet-node
Copy link
Member Author

503 is Service Unavaialable. How will we distinguish the normal 503 and the inmature logs? Can mirror node distinguish the error and send differente codes? on the other hand yesterday we mentiones 502s, on mirror node. I suggest to add an error code too

Related to this, whats the current alert logic? What will happen with 502s for example?

Thanks @Ferparishuertas for the comment.

503 is Service Unavailable. How will we distinguish the normal 503 and the immature logs?

The intent here is to distinguish errors arising from immature records from other internal errors within the Relay module. Up until this PR, any internal error caused by the Relay module has been consistently mapped to a 500 Internal Server Error. This change introduces a 503 status code specifically for immature records scenarios, making it the first such instance where a dependent service issue is explicitly identified with a 503.

Can the Mirror Node distinguish the error and send different codes?

While the Mirror Node likely has the capability to distinguish errors and send different codes, it is important to note that they are actively addressing the root cause of the replica synchronization issue in their database. Their long-term solution aims to prevent such scenarios altogether. However, as a downstream service, the Relay module is responsible for standardizing error handling, and this PR assigns the 503 code to these cases in alignment with HTTP semantics.

On the other hand, yesterday we mentioned 502s on the Mirror Node.

After some exploration and research, I find that the 503 status code is quite more appropriate for these scenarios. While a 502 Bad Gateway indicates that the server received an invalidd response from an upstream server, typically refers to scenarios where the response is fundamentally invalid or cannot be processed—such as malformed headers, incomplete payloads, or protocol-level errors. In contrast, immature records, while incomplete, are still HTTP-valid responses that conform to protocol standards but lack the expected data due to transient synchronization delays. As a result, a 503 Service Unavailable status code is believed to be better aligns with this condition, signaling a temporary issue with the dependent Mirror Node service. Notably, when the request is retried after some time (off the flow), the records are fully complete, with no nullable or empty fields, further reinforcing the appropriateness of the 503 designation for these cases. Let me know your thoughts; I’m open to discussing this further.

I suggest adding an error code too.

Could you clarify if you are suggesting introducing a new code for 502 scenarios or proposing something else?

Related to this, what’s the current alert logic?

The current alerting mechanism calculates the proportion of non-acceptable (TBD on this name) HTTP response codes (status codes not equal to 200, 400, or 501) for specific RPC relay methods relative to the total number of response codes for those methods and namespaces within a specified time range. If the failed requests exceed 5% of total requests within this range, an alert is triggered and sent to the alerting channel.

What will happen with 502s, for example?

Since 502 is not currently included in the specific list of acceptable codes (200, 400, or 501), it would be categorized as a failed request. If the cumulative percentage of all failed requests, including 502s, crosses the 5% threshold, an alert would be issued accordingly.

@Ferparishuertas
Copy link

Ferparishuertas commented Jan 16, 2025 via email

@quiet-node
Copy link
Member Author

quiet-node commented Jan 16, 2025

Hey @Ferparishuertas thanks for the quick response.

Are we going to distinguish 503 errors at mirror node level at metrics level from immature logs where we map it to 503?

Ahhh so currently, any failure from upstream servers (e.g. MN or CN) is mapped to a 500 Internal Server Error by the Relay module. It’s important to note that immature records are not technically errors; they represent valid responses returned by the Mirror Node, albeit incomplete due to synchronization delays.

In the Relay module, immature records are explicitly treated as "errors" for the purpose of standardization and are mapped to a 503 Service Unavailable status code. Thus:

  • A 503 error directly from the Mirror Node is mapped to a 500 error by the Relay.
  • Immature records from the Mirror Node are mapped to a 503 error by the Relay.

If mirror node gives a 502, which error code are we showing

As mentioned above, this will be 500.

Copy link

@Ferparishuertas Ferparishuertas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@acuarica acuarica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one nit

@quiet-node quiet-node merged commit 0c6d059 into main Jan 22, 2025
50 of 52 checks passed
@quiet-node quiet-node deleted the 3392-failed-request-alert-tuning-assign-appropriate-http-status-code-for-immature-records-error branch January 22, 2025 16:07
Copy link

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.97%. Comparing base (e779b0f) to head (be09935).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
packages/relay/src/lib/clients/mirrorNodeClient.ts 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3394      +/-   ##
==========================================
+ Coverage   84.87%   84.97%   +0.09%     
==========================================
  Files          69       69              
  Lines        4742     4738       -4     
  Branches     1067     1066       -1     
==========================================
+ Hits         4025     4026       +1     
+ Misses        400      398       -2     
+ Partials      317      314       -3     
Flag Coverage Δ
config-service 98.14% <ø> (ø)
relay 79.56% <77.77%> (-0.01%) ⬇️
server 83.30% <100.00%> (+0.02%) ⬆️
ws-server 36.66% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
packages/relay/src/lib/errors/JsonRpcError.ts 74.50% <ø> (ø)
packages/relay/src/lib/eth.ts 86.32% <ø> (+0.31%) ⬆️
.../lib/services/ethService/ethCommonService/index.ts 91.30% <100.00%> (+1.09%) ⬆️
...ver/src/koaJsonRpc/lib/HttpStatusCodeAndMessage.ts 100.00% <100.00%> (ø)
packages/relay/src/lib/clients/mirrorNodeClient.ts 88.31% <75.00%> (-0.44%) ⬇️

... and 1 file with indirect coverage changes

quiet-node added a commit that referenced this pull request Jan 22, 2025
…mmature records scenarios (#3394)

* feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios

Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

* test: fixed unit tests for the new DEPENDENT_SERVICE_IMMATURE_RECORDS scenarios

Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

---------

Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
quiet-node added a commit that referenced this pull request Jan 23, 2025
…defined error for immature records scenarios (#3394) (#3415)

feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios (#3394)

* feat: added DEPENDENT_SERVICE_IMMATURE_RECORDS predefined error for immature records scenarios



* test: fixed unit tests for the new DEPENDENT_SERVICE_IMMATURE_RECORDS scenarios



---------

Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal For changes that affect the project's internal workings but not its outward-facing functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Failed-Request-Alert-Tuning] Assign Appropriate HTTP Status Code for Immature Records error
3 participants