-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(dashboard): automated analysis dashboard card #618
feat(dashboard): automated analysis dashboard card #618
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking super cool. I have some minor feedback so far about the implementation itself.
I wonder what to do, or what to show to the user, in the case where there are no recordings to source data from. Should the card just show an empty content? Maybe it should explain why it is empty and provide a link to the Recordings view to prompt the user to start a recording? What if the target has no running active recordings but it does have a recent archive? What if this is a Cryostat Agent target that we eventually have implemented, and so perhaps we cannot even start active recordings or maybe even can't list active recordings, but it may have new archives that appear over time as it pushes them? Would it make sense to have a "source" that can be selected for the card to switch between taking snapshots vs using the latest archive? |
Some good points. My first impression is that if there are no source recordings on the target, there should be an explanation why we couldn't perform automated analysis, and then maybe have a prompt or button that allows Cryostat to start a profiling recording directly from the card itself without having to move, and then automatically, perform a snapshot, parse the report etc. (also maybe having a second option in the prompt to create the recording manually if the user wants to do so). Maybe also, there can be a customizable setting in the Settings, where the use can turn on and off whether recordings should be automatically started on targets for automated analysis, (during startup or when rendering the dashboard?), so that the user doesn't have to manually create a recording every time.
First impression is that if an active recording isn't currently started, then maybe the Card will fallback to check if there are any archived recordings listed. If there are, then having the card acknowledge that it is using an Archived Recording to generate a report on, and also having a timestamp or "Seconds since stale" increasing timer number along with the analysis may be useful for the user to check how stale the analysis is? I'm not sure about how the source idea would go; I don't if it may be confusing to people but if you think people may want that fine-grained control over their analysis, then it can definitely be worked in. |
:-) I thought about that, too. It might make sense to point the user to creating an Automated Rule if there are no source recordings for the card to draw from, rather than having them create a recording manually. I think Automated Rules fit better with the story about continuous monitoring, which is also what the Dashboard should be about, whereas a custom recording created manually is more of a deepdive profiling task.
I think your idea makes a lot of sense, actually. Given that this is the Dashboard and meant to just give a quick overview of things, the "staleness" on the card seems like a really good compromise. The user can decide what to do from there or interpret that information according to the extra context they might know about the target or that data. |
Adding on to that, it may make sense to cache reports within the web-client and also maybe use the most recently cached report for a target in-case maybe a report generation or snapshot fails upon re-rendering. Then in that case, the staleness should also probably be used. |
d2541e5
to
96d9e41
Compare
4e4b903
to
6c819b0
Compare
Ready for review, except for tests, which I will do after the OK. There are issues with how snapshots seem to be flakey to work sometimes, and this results in snapshot recordings that are created under the target that don't seem to be useful. Also, another problem that happens is that eventually, if a recording has gone on a while and a snapshot is taken and sent to the /reports sidecar, there will be a ReportGeneration error because the Request Entity is supposedly too large. I assume these both are backend issues. |
Those do sound like backend issues.
Any more details on the flakiness or how the snapshots aren't useful?
This could just be a report sidecar resource constraints configuration issue, or it could be an issue with the configuration of the active recording on the target being observed. What recording settings were used for that? Maybe it needs a slimmer event template, or a |
This PR/issue depends on:
|
Weird. JFR doesn't use recording names as unique identifiers, they have incrementing numeric IDs that are used instead. The unique name thing is a Cryostat application restriction and we hide the numeric IDs. When you ask the target JVM to take a snapshot it could very well create snapshots with the same repeated name but distinct IDs. I don't know why it would just be creating one called But, do we really need to take snapshots anymore for this usecase? If there is already a recording running, or if one is started by clicking that text/button on the card, then that recording can be used for the report generation request. Taking a snapshot in this case would only help if there was some extra work done to limit the age of data within the snapshot for example, but we don't really have a good story for support that yet anyway. For now I think it would make sense to simply check for a recording with a specific name, like |
Makes sense, probably don't need snapshots if an 'automated-analysis' recording acts as a profiling recording. |
Anyway, the card itself looks great and generally behaves as expected other than the snapshotting behaviour. I really like the filters, especially by score severity. One question/note, would it make sense to "collapse" NA scores into the |
Yeah, I was thinking about that, thanks for reminding me. I think I will go with just defaultly not showing N/A scores. |
Mmm... I'm wondering one thing. If we just take report generation repeatedly from a recording, let's say from a fixed recording called 'automated-analysis' with a max size of something like 1MB, is the recording report representative of the current state of the target and previous events? Because if there is a max-size, if I recall correctly, the recording will chop off the previous recording timeline data to fit the max-size constraint? |
That's true, the maxSize/maxAge settings will both cause older events to be dropped from the recording, and therefore they won't be used when calculating the automated analysis scores. So if something noteworthy happened to your application half an hour ago but your recording only preserves the last 5 minutes of data, you would never be able to learn about that event that happened half an hour ago. This is why I think we need to expand the card with some configuration, eventually, that lets the user choose which event template to use and also values for maxSize/maxAge. We don't know how much traffic the target gets and how many events will be emitted, and we don't know how large or how many reports sidecar generators the user will provision, so there are some things that are unknown to us but hopefully the user can figure out and configure. |
Side-benefit of using the active recording (by name or label) directly rather than taking snapshots is that it means this card will actually serve reports backed by Cryostat's active report cache. Using snapshots this way actually circumvents the cache entirely even though the data may be unlikely to have changed, or changed significantly. @maxcao13 on your backend support PR, could you touch up the active report cache so that the expiry and refresh times are configured by environment variable? I think in theory we would want to tie the lifecycle of the cache entries to the configured maxAge for the recording, but that's a level of integration that we just don't have right now and would take a lot of refactoring to get to. Once that's done then it's another set of configuration options for the Operator to add to the Cryostat CR, and I guess something else for the Helm chart to support as well. |
86e9d9e
to
a69af48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing looks good to me, skimming the code looks good too. @tthvo seems to have already reviewed the implementation quite thoroughly.
Test image available:
|
Depends on https://github.com/cryostatio/cryostat/pull/1243
Fixes #608