Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11150. Recon Overview page crashes due to failed API Calls #6944

Merged
merged 9 commits into from
Jul 19, 2024

Conversation

devabhishekpal
Copy link
Contributor

@devabhishekpal devabhishekpal commented Jul 15, 2024

What changes were proposed in this pull request?

HDDS-11150. Shift Axios.all to Promise.allSettled and add fallback for error in failed requests

Please describe your PR in detail:

  • Currently Axios.all() uses Promise.all() in the background, but this causes all further promises to reject if one request is rejected
  • So we have migrated to Promise.allSettled() which allows to wait for all requests to complete instead of immediately rejecting
  • Along with this new fallbacks have been added to allow data to be shown even if there is an error while fetching the data
  • Scope of fallback scenario
    • When the API is unreachable, say because of an API timeout or the backend is not able to respond
    • When the network might go down and the call to the backend fails
    • If the API returns any error (like 500, or 404 etc.) where the data is not present

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11150

How was this patch tested?

Patch was tested manually

Screenshot Description
Screenshot 2024-07-15 at 21 56 03 Fallback when clusterState request fails
Screenshot 2024-07-18 at 12 35 26 Fallback when open keys summary fails
Screenshot 2024-07-18 at 12 35 06 Fallback when deletePending keys summary fails
Screenshot 2024-07-18 at 12 36 01 Fallback when File Size Distribution API fails
Screenshot 2024-07-18 at 12 35 46 Fallback when Container Size Distribution API fails

@devabhishekpal
Copy link
Contributor Author

@ArafatKhan2198 @devmadhuu @smitajoshi12 Could you please take a look?
Also please do suggest if you have any alternatives to the ?? that I am displaying when the clusterState endpoint fails

@devabhishekpal devabhishekpal changed the title HDDS-11150. Shift Axios.all to Promise.allSettled and add fallback for error in failed requests HDDS-11150. Recon Overview page crashes due to failed API Calls Jul 15, 2024
@devmadhuu
Copy link
Contributor

@devabhishekpal Can you pls define clear scope and clear boundaries this PR will cover ?

@dombizita
Copy link
Contributor

@ArafatKhan2198 @devmadhuu @smitajoshi12 Could you please take a look? Also please do suggest if you have any alternatives to the ?? that I am displaying when the clusterState endpoint fails

I'd suggest to go with N/A, as that describes maybe better that we were not able to get the necessary data to display. Let me know what you think of this!

@devabhishekpal
Copy link
Contributor Author

Message displayed when we are hovering over the data in case of fallback.
Screenshot 2024-07-15 at 22 00 55

@smitajoshi12
Copy link
Contributor

smitajoshi12 commented Jul 16, 2024

@devabhishekpal
Can we do same fall back for Open keys Summary and Pending Deleted Keys Summary for NA instead of 0.Not able to differentiate between 0 because of API Response 0 keys or it is because of fallback.

Screenshot 2024-07-16 at 5 31 50 PM

@smitajoshi12
Copy link
Contributor

@devabhishekpal

Can we handle null check before reduce
image

@smitajoshi12
Copy link
Contributor

smitajoshi12 commented Jul 16, 2024

@devabhishekpal

Same Issue with Count Response. please check API failing test case with wrong api name and async await.

image

@devabhishekpal
Copy link
Contributor Author

Hi @smitajoshi12 , thanks for your review.
Addressed the issues you pointed out in the latest commit

@smitajoshi12
Copy link
Contributor

smitajoshi12 commented Jul 17, 2024

@devabhishekpal

This condition should be in or
If we fail one api other api is not working for Insights
resp.status === 'rejected' || resp.reason.toString().includes('CanceledError')

@devabhishekpal
Copy link
Contributor Author

devabhishekpal commented Jul 17, 2024

Hi @smitajoshi12, regarding:

@devabhishekpal

This condition should be in or If we fail one api other api is not working for Insights resp.status === 'rejected' || resp.reason.toString().includes('CanceledError')

Yes, but the reject will only happen when we completely shift to a different navigation/page.
At that point having the response for one of the endpoints is not required/not useful as the user will have already shifted to a different page

Shifting between FileSize view tab and the ContainerSize tab will not trigger the reject - so the data will get loaded properly.

@devabhishekpal
Copy link
Contributor Author

Extending on the above, if we use || it means the error would be thrown in either if it gets rejected, or if it is cancelled.
But this is not correct, as there might be scenarios where the API might reject the Promise, but the error is not cancellation.

In that scenario we do not want to throw the error, but fallback to the default values.
We want to throw the error only when the request is rejected as the call was cancelled.

@smitajoshi12
Copy link
Contributor

@ArafatKhan2198 @devmadhuu

Can you review as this pr needs to get merged in OZONE-719.3.x as we have already completed all recent changes cherry picks for ECharts and other upstream Jiras.

@smitajoshi12
Copy link
Contributor

Extending on the above, if we use || it means the error would be thrown in either if it gets rejected, or if it is cancelled. But this is not correct, as there might be scenarios where the API might reject the Promise, but the error is not cancellation.

In that scenario we do not want to throw the error, but fallback to the default values. We want to throw the error only when the request is rejected as the call was cancelled.

If one api fails because Internal server error or any other issue response is blank but UI looks this way It should catch error or throw error

image

@devabhishekpal
Copy link
Contributor Author

Thanks @smitajoshi12 for pointing this out.
Addressed this in the latest PR.

In case the request is cancelled, we go to catch.
Else we initialize with the default values and display a popup to notify the user something is going wrong.

Can you take another look?
Thanks

@smitajoshi12
Copy link
Contributor

It looks good to me

Copy link
Contributor

@ArafatKhan2198 ArafatKhan2198 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @devabhishekpal after discussing in our offline sync the changes LGTM!

Copy link
Contributor

@smitajoshi12 smitajoshi12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes looks good to me

@ArafatKhan2198
Copy link
Contributor

Thanks for the patch @devabhishekpal and thanks for the review @smitajoshi12

@ArafatKhan2198 ArafatKhan2198 merged commit 0c924bc into apache:master Jul 19, 2024
34 checks passed
@devabhishekpal devabhishekpal deleted the HDDS-11150 branch July 19, 2024 06:44
errose28 added a commit to errose28/ozone that referenced this pull request Jul 30, 2024
…-delete

* HDDS-10239-container-reconciliation: (184 commits)
  HDDS-10373. Implement framework for capturing Merkle Tree Metrics. (apache#6864)
  HDDS-11188. Initial setup for new UI layout and enable users to switch to new UI (apache#6953)
  HDDS-11120. Rich rebalancing status info (apache#6911)
  HDDS-11187. Fix Event Handling in Recon OMDBUpdatesHandler to Prevent ClassCastException. (apache#6950)
  HDDS-11213. Bump commons-daemon to 1.4.0 (apache#6971)
  HDDS-11212. Bump commons-net to 3.11.1 (apache#6973)
  HDDS-11211. Bump assertj-core to 3.26.3 (apache#6972)
  HDDS-11210. Bump log4j2 to 2.23.1 (apache#6970)
  HDDS-11150. Recon Overview page crashes due to failed API Calls (apache#6944)
  HDDS-11183. Keys from DeletedTable and DeletedDirTable of AOS should be deleted on batch operation while creating a snapshot (apache#6946)
  HDDS-11198. Fix Typescript configs for Recon (apache#6961)
  HDDS-11180. Simplify HttpServer2#inferMimeType return statement (apache#6963)
  HDDS-11194. OM missing audit log for upgrade (apache#6958)
  HDDS-10389. Implement a search feature for users to locate open keys within the Open Keys Insights section. (apache#6231)
  HDDS-10561. Dashboard for delete key metrics (apache#6948)
  HDDS-11192. Increase SPNEGO URL test coverage (apache#6956)
  HDDS-11179. DBConfigFromFile#readFromFile result of toIOException not thrown (apache#6957)
  HDDS-11186. First container log missing from bundle (apache#6952)
  HDDS-10844. Clarify snapshot create error message. (apache#6955)
  HDDS-11166. Switch to Rocky Linux-based ozone-runner (apache#6942)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants