Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: enhanced reliability of eth RPC methods with null checks and retry mechanisms #3349

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

quiet-node
Copy link
Member

@quiet-node quiet-node commented Dec 20, 2024

Description:
Numerous requests for eth_getBlockByNumber and eth_getBlockByHash have been reported as failing with errors such as Cannot read properties of null and Received an invalid integer type: null.

Additionally, eth_getTransactionReceipt is also reported to be failing with Error invoking RPC: Invalid parameter: hashOrNumber.

These issues stem from regressions in the getBlock() method, which lacks adequate null checks in certain areas. This pull request addresses the problem by implementing proper null checks and enhancing fallback mechanisms and error handling.

Partially, the results of contracts and log responses from the Mirror Node occasionally contain undefined fields, such as transaction_index, block_hash, block_number, and log_index. This issue may arise because the Mirror Node database does not have sufficient time to save all the information before it is fetched. This PR adds a retry mechanism to the fetching methods that checks for these cases, waits for a specified duration, and then retries the fetch operation.

Related issue(s):

Fixes #3345
Fixes #3351

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

Copy link

github-actions bot commented Dec 20, 2024

Test Results

 19 files   -   4  261 suites   - 88   33m 6s ⏱️ - 16m 6s
613 tests  -   4  608 ✅ +  5  4 💤 ±0  1 ❌  -  9 
715 runs   - 194  710 ✅  - 179  4 💤  - 3  1 ❌  - 12 

For more details on these failures, see this check.

Results for commit 8687322. ± Comparison against base commit 13d09e9.

This pull request removes 5 and adds 1 tests. Note that renamed tests count towards both.
"before all" hook for "emits an approval event" ‑ RPC Server Acceptance Tests Acceptance tests @erc20 Acceptance Tests HTS token should behave like erc20 transfer from when the token owner is not the zero address when the recipient is not the zero address when the spender has enough allowance "before all" hook for "emits an approval event"
"before all" hook for "reverts" ‑ RPC Server Acceptance Tests Acceptance tests @erc20 Acceptance Tests HTS token should behave like erc20 transfer from when the token owner is not the zero address when the recipient is not the zero address when the spender does not have enough allowance when the token owner has enough balance "before all" hook for "reverts"
"before all" hook in "Debug API Test Suite" ‑ RPC Server Acceptance Tests Acceptance tests @api-batch-3 RPC Server Acceptance Tests Debug API Test Suite "before all" hook in "Debug API Test Suite"
"before each" hook for "reverts" ‑ RPC Server Acceptance Tests Acceptance tests @erc20 Acceptance Tests HTS token should behave like erc20 transfer from when the token owner is not the zero address when the recipient is not the zero address when the spender does not have enough allowance "before each" hook for "reverts"
"before each" hook for "should execute "eth_getStorageAt" request to get current state changes" ‑ RPC Server Acceptance Tests Acceptance tests @api-batch-2 RPC Server Acceptance Tests "before each" hook for "should execute "eth_getStorageAt" request to get current state changes"
"after all" hook in "@web-socket-batch-1 eth_getBlockByNumber" ‑ RPC Server Acceptance Tests Acceptance tests @web-socket-batch-1 eth_getBlockByNumber "after all" hook in "@web-socket-batch-1 eth_getBlockByNumber"

♻️ This comment has been updated with latest results.

@quiet-node quiet-node modified the milestones: 0.62.0, 0.63.0 Dec 20, 2024
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from 675eb4a to d265305 Compare December 20, 2024 23:07
Copy link
Collaborator

@Nana-EC Nana-EC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.
Some considerations

packages/relay/src/lib/clients/mirrorNodeClient.ts Outdated Show resolved Hide resolved
@@ -334,7 +335,7 @@ export class CommonService implements ICommonService {
new Log({
address: log.address,
blockHash: toHash32(log.block_hash),
blockNumber: numberTo0x(log.block_number),
blockNumber: nullableNumberTo0x(log.block_number),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why would a block number be null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is one of the strange cases where block_number, transaction_index, or blockHash are falsy, leading to 500 errors. The retry mechanism should address the issue. Should I revert this change and throw a more informative error message even if those fields remain falsy after the retry? Hmm maybe that would be better so the clients can resubmit the requests themselves other than having null in the response. Wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, if a retry is ensured to resolve the issue then maybe we can manage one iteration in the short term.
We should open up an issue in the mirror node repo to investigate this issue so this can be resolved.

packages/relay/src/receiptsRootUtils.ts Show resolved Hide resolved
@@ -620,7 +620,10 @@ export class MirrorNodeClient {
requestDetails,
);

await this.cacheService.set(cachedLabel, block, MirrorNodeClient.GET_BLOCK_ENDPOINT, requestDetails);
if (block) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if I understand correctly, but the errors described in the issue were because of caching null blocks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue I noticed and an attempt securing the code. While it may or may not be directly related to the problem, we should ensure the block is not null before storing it in the cache. Otherwise, we risk repeatedly retrieving null from the cache, which likely has a TTL.

packages/relay/src/receiptsRootUtils.ts Show resolved Hide resolved
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from 936f1e9 to b959fc7 Compare December 21, 2024 02:20
packages/relay/src/lib/clients/mirrorNodeClient.ts Outdated Show resolved Hide resolved
packages/relay/src/lib/clients/mirrorNodeClient.ts Outdated Show resolved Hide resolved
packages/relay/src/lib/eth.ts Outdated Show resolved Hide resolved
@@ -2531,11 +2544,12 @@ export class EthImpl implements Eth {
if (blockResponse == null) return null;
const timestampRange = blockResponse.timestamp;
const timestampRangeParams = [`gte:${timestampRange.from}`, `lte:${timestampRange.to}`];
const contractResults = await this.mirrorNodeClient.getContractResults(
const contractResults = await this.mirrorNodeClient.getContractResultWithRetry(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we just being cautious or does this case actually need to have retry support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contractResults here was a contributing factor to the 500 errors. In the rare cases mentioned, contractResults contains null fields. When passed to buildReceiptRootHashes, the function does not handle these null values correctly, resulting in 500 errors. Therefore, this scenario requires a retry if such a rare case occurs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these cases occur only on the mainnet and at the time of the transaction, I plan to release these updates that include additional debugging information for improved analysis. If the issues persist, it will be easier to pinpoint the root causes and escalate them to the mainnet team more effectively.

@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from 19d9838 to e1c8fd0 Compare December 21, 2024 23:14
@quiet-node quiet-node marked this pull request as draft December 21, 2024 23:14
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch 7 times, most recently from 3b96fc4 to aad493c Compare December 22, 2024 20:12
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch 2 times, most recently from a5a0e1f to 78fa644 Compare December 22, 2024 21:07
…Contract MN methods

Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch 7 times, most recently from bac3c74 to d7f5829 Compare December 23, 2024 15:22
Comment on lines +727 to +731
this.logger.warn(
e,
`${requestDetails.formattedRequestId} Transaction failed while executing transaction via the SDK: transactionId=${transaction.transactionId}, callerName=${callerName}, txConstructorName=${txConstructorName}`,
);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for Rereviewers:
This is an effort to enhance debugging.

Comment on lines +1675 to +1676
await relay.pollForValidTransactionReceipt(transactionHash);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for Rereviewers:
This is an effort to stablize the flaky test.

Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>

Revert "fix: reverted licenses"

This reverts commit d3c860a.

Reapply "fix: reverted licenses"

This reverts commit 50a9acd5031490f3f805042a70bfdae7c589146d.
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from d6756f0 to 8687322 Compare December 23, 2024 18:19
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
14.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@quiet-node quiet-node changed the title fix: added proper null checks for getBlock() method fix: enhanced reliability of eth RPC methods with null checks and retry mechanisms Dec 23, 2024
Copy link

codecov bot commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 82.35294% with 9 lines in your changes missing coverage. Please review.

Project coverage is 84.99%. Comparing base (13d09e9) to head (8687322).

Files with missing lines Patch % Lines
packages/relay/src/lib/clients/mirrorNodeClient.ts 79.16% 2 Missing and 3 partials ⚠️
.../lib/services/ethService/ethCommonService/index.ts 76.92% 2 Missing and 1 partial ⚠️
packages/relay/src/receiptsRootUtils.ts 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3349      +/-   ##
==========================================
- Coverage   85.04%   84.99%   -0.06%     
==========================================
  Files          65       69       +4     
  Lines        4508     4717     +209     
  Branches     1023     1061      +38     
==========================================
+ Hits         3834     4009     +175     
- Misses        330      394      +64     
+ Partials      344      314      -30     
Flag Coverage Δ
config-service 98.14% <ø> (ø)
relay 79.53% <76.47%> (-0.20%) ⬇️
server 83.28% <ø> (?)
ws-server 36.66% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
packages/relay/src/lib/clients/sdkClient.ts 69.52% <100.00%> (+1.85%) ⬆️
packages/relay/src/lib/eth.ts 86.14% <100.00%> (ø)
...kages/relay/src/lib/services/debugService/index.ts 85.71% <100.00%> (ø)
packages/relay/src/receiptsRootUtils.ts 96.07% <87.50%> (-1.80%) ⬇️
.../lib/services/ethService/ethCommonService/index.ts 91.30% <76.92%> (-2.59%) ⬇️
packages/relay/src/lib/clients/mirrorNodeClient.ts 89.05% <79.16%> (-0.70%) ⬇️

... and 13 files with indirect coverage changes

@quiet-node quiet-node modified the milestones: 0.63.0, 0.63.1 Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants