Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Origin UI RateGraph data loss/issues #497

Merged
merged 8 commits into from
Jan 24, 2024

Conversation

haoming29
Copy link
Contributor

@haoming29 haoming29 commented Dec 13, 2023

Fix #487

It turned out that issue #487 is expected as we set the default data resolution to 3h, and at the server fresh start (no history metric data), Prometheus doesn't have any data at now() - 3h. However, this should be adjust so that the default resolution can always give some result to the user at the server start.

This PR fixes anther issue where the "reset" button sets time differently than the default value, causing inconsistent result (although in the later commits it was consolidated to update time accordingly, see details in next point)

This PR also fixes a bug where the RateGraph visualizes data from t= now() to t= now() + range (by default 3h) which means it "generates" fake data from now to 3 hours in the future and we don't want that. Instead, now the data will be from t=now() - range to t=now(), which is the history data from last <range> until now.

This PR finally addressed the issue where we use xrootd_server_bytes{direction="rx"} metric for both rx and tx data. Now the two data series will give different values (as reported by Prometheus)

@haoming29 haoming29 added this to the v7.4.0 milestone Dec 13, 2023
@haoming29 haoming29 added the bug Something isn't working label Dec 13, 2023
@haoming29 haoming29 requested a review from CannonLock December 13, 2023 21:44
@bbockelm
Copy link
Collaborator

@CannonLock - can you take a look at this?

Copy link
Contributor

@CannonLock CannonLock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for your input on one thing before I approve.

Comment on lines 42 to 45
rate={TimeDuration.fromString("3h")}
duration={TimeDuration.fromString("7d")}
resolution={TimeDuration.fromString("3h")}
metric={['xrootd_server_bytes{direction="rx"}', 'xrootd_server_bytes{direction="rx"}']}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep these defaults. I realize they aren't the best when you first enter but for a origin that is actually in use somewhere I think they are more helpful then just todays numbers.

If you disagree, we should at least kick up the resolution a little bit, maybe to "20m"? 1440 data points on the graph is going to get busy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should think about users who run origin for days. How about I add a note to the graph to tell the user if they just started the origin and they can't see incoming data, they should modify the resolution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I can't run origin UI with latest main and I got the white screen with client side error that leads to Objects are not valid as a React child (found: object with keys {Type, Value}). If you meant to render a collection of children, use an array instead. I've rolled back the values to default but I won't be able to add the note without seeing the UI. We can talk about more in detail tomorrow.

@haoming29
Copy link
Contributor Author

haoming29 commented Jan 9, 2024

Given the limited time we have for 7.4, I will back port this in 7.4.x, so no worries if we need to get a release out soon @joereuss12

@haoming29
Copy link
Contributor Author

@CannonLock I added some instructions in the "no data" error message for how to adjust settings to get data out for fresh servers and I tested everything looks good.

@haoming29 haoming29 requested a review from CannonLock January 10, 2024 17:20
@CannonLock
Copy link
Contributor

@haoming29 When I run the origin I am getting {"error":"Correct authorization required to access Prometheus query engine APIs"}

@haoming29
Copy link
Contributor Author

@haoming29 When I run the origin I am getting {"error":"Correct authorization required to access Prometheus query engine APIs"}

Did you login when you hit this? Where did this come from?

Copy link
Contributor

@CannonLock CannonLock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug I saw was related to the director being down? LGTM.

@haoming29 haoming29 merged commit f99c328 into PelicanPlatform:main Jan 24, 2024
9 checks passed
@haoming29 haoming29 deleted the fix-origin-metric-loss branch January 24, 2024 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prometheus metric xrootd_server_bytes not updated in Origin
3 participants