feat(dashboard): automated analysis dashboard card #618

maxcao13 · 2022-11-09T01:07:41Z

Depends on https://github.com/cryostatio/cryostat/pull/1243
Fixes #608

andrewazores

This is looking super cool. I have some minor feedback so far about the implementation itself.

src/app/Dashboard/AutomatedAnalysisCard.tsx

src/app/Dashboard/ClickableAutomatedAnalysisLabel.tsx

andrewazores · 2022-11-09T14:27:16Z

I wonder what to do, or what to show to the user, in the case where there are no recordings to source data from. Should the card just show an empty content? Maybe it should explain why it is empty and provide a link to the Recordings view to prompt the user to start a recording?

What if the target has no running active recordings but it does have a recent archive? What if this is a Cryostat Agent target that we eventually have implemented, and so perhaps we cannot even start active recordings or maybe even can't list active recordings, but it may have new archives that appear over time as it pushes them?

Would it make sense to have a "source" that can be selected for the card to switch between taking snapshots vs using the latest archive?

maxcao13 · 2022-11-09T18:44:34Z

> I wonder what to do, or what to show to the user, in the case where there are no recordings to source data from. Should the card just show an empty content? Maybe it should explain why it is empty and provide a link to the Recordings view to prompt the user to start a recording?

Some good points. My first impression is that if there are no source recordings on the target, the card should explain why we couldn't perform automated analysis, and then maybe offer a prompt or button that lets Cryostat start a profiling recording directly from the card, without the user having to navigate away, and then automatically take a snapshot, parse the report, etc. (The prompt could also have a second option to create the recording manually if the user wants to do so.) Maybe there could also be a setting in Settings where the user can turn on or off whether recordings should be started automatically on targets for automated analysis (during startup, or when rendering the dashboard?), so that the user doesn't have to create a recording manually every time.
EDIT: I just realized that I am describing an Automated Rule.

> What if the target has no running active recordings but it does have a recent archive? What if this is a Cryostat Agent target that we eventually have implemented, and so perhaps we cannot even start active recordings or maybe even can't list active recordings, but it may have new archives that appear over time as it pushes them?

> Would it make sense to have a "source" that can be selected for the card to switch between taking snapshots vs using the latest archive?

My first impression is that if no active recording is currently running, the card could fall back to checking whether any archived recordings are listed. If there are, the card could acknowledge that it is generating the report from an archived recording, and show a timestamp or an increasing "seconds since stale" timer alongside the analysis, so the user can check how stale the analysis is.

I'm not sure how the source idea would go. I don't know if it would be confusing to people, but if you think people may want that fine-grained control over their analysis, then it can definitely be worked in.
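The fallback and staleness idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `RecordingRef`, `pickReportSource`, `stalenessSeconds`, and the `lastUpdatedMs` field are hypothetical names, not existing web-client types.

```typescript
// Hypothetical shape for illustration; the real web-client types may differ.
interface RecordingRef {
  name: string;
  kind: 'active' | 'archived';
  // Epoch milliseconds at which this data was last known to be current (assumed field).
  lastUpdatedMs: number;
}

// Prefer an active recording; otherwise fall back to the most recent archive.
function pickReportSource(active: RecordingRef[], archived: RecordingRef[]): RecordingRef | undefined {
  if (active.length > 0) {
    return active[0];
  }
  if (archived.length === 0) {
    return undefined; // nothing to analyze, so the card would show its empty state
  }
  return archived.reduce((newest, r) => (r.lastUpdatedMs > newest.lastUpdatedMs ? r : newest));
}

// Whole seconds elapsed since the source was last updated, never negative.
function stalenessSeconds(source: RecordingRef, nowMs: number): number {
  return Math.max(0, Math.floor((nowMs - source.lastUpdatedMs) / 1000));
}
```

The card could re-evaluate `stalenessSeconds` on a timer to drive the increasing counter, and use `kind` to tell the user whether the report came from an archive.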

andrewazores · 2022-11-09T19:00:29Z

> EDIT: I just realized that I am describing an Automated Rule.

:-)

I thought about that, too. It might make sense to point the user to creating an Automated Rule if there are no source recordings for the card to draw from, rather than having them create a recording manually. I think Automated Rules fit better with the story about continuous monitoring, which is also what the Dashboard should be about, whereas a custom recording created manually is more of a deep-dive profiling task.

> My first impression is that if no active recording is currently running, the card could fall back to checking whether any archived recordings are listed. If there are, the card could acknowledge that it is generating the report from an archived recording, and show a timestamp or an increasing "seconds since stale" timer alongside the analysis, so the user can check how stale the analysis is.

> I'm not sure how the source idea would go. I don't know if it would be confusing to people, but if you think people may want that fine-grained control over their analysis, then it can definitely be worked in.

I think your idea makes a lot of sense, actually. Given that this is the Dashboard and meant to just give a quick overview of things, the "staleness" on the card seems like a really good compromise. The user can decide what to do from there or interpret that information according to the extra context they might know about the target or that data.

maxcao13 · 2022-11-09T19:48:54Z

Adding on to that, it may make sense to cache reports within the web-client, and to fall back to the most recently cached report for a target in case a report generation or snapshot fails upon re-rendering. In that case, the staleness should probably be shown as well.
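A minimal sketch of that caching idea, assuming a simple in-memory map keyed by a target identifier. `ReportCache`, `GenerationOutcome`, and `DisplayedReport` are hypothetical names used for illustration, not existing web-client code.

```typescript
// Result of one report-generation attempt (hypothetical type).
type GenerationOutcome<T> = { ok: true; report: T } | { ok: false; error: Error };

// What the card would render, including enough information to show staleness.
interface DisplayedReport<T> {
  report: T;
  generatedAtMs: number;
  fromCache: boolean;
}

class ReportCache<T> {
  private readonly entries = new Map<string, { report: T; generatedAtMs: number }>();

  // Store a fresh report on success; on failure, fall back to the last cached report, if any.
  resolve(targetId: string, outcome: GenerationOutcome<T>, nowMs: number): DisplayedReport<T> | undefined {
    if (outcome.ok) {
      this.entries.set(targetId, { report: outcome.report, generatedAtMs: nowMs });
      return { report: outcome.report, generatedAtMs: nowMs, fromCache: false };
    }
    const cached = this.entries.get(targetId);
    return cached === undefined ? undefined : { ...cached, fromCache: true };
  }
}
```

When `fromCache` is true, the card could display the staleness computed from `generatedAtMs` instead of presenting the report as current.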

maxcao13 · 2022-11-15T23:37:01Z

Ready for review, except for tests, which I will do after the OK.

There are issues with snapshots being flaky: sometimes this results in snapshot recordings being created under the target that don't seem to be useful. Another problem is that eventually, if a recording has been running for a while and a snapshot is taken and sent to the /reports sidecar, there will be a ReportGeneration error because the Request Entity is supposedly too large. I assume both of these are backend issues.

andrewazores · 2022-11-17T02:55:43Z

Those do sound like backend issues.

> There are issues with snapshots being flaky: sometimes this results in snapshot recordings being created under the target that don't seem to be useful.

Any more details on the flakiness or how the snapshots aren't useful?

> Another problem is that eventually, if a recording has been running for a while and a snapshot is taken and sent to the /reports sidecar, there will be a ReportGeneration error because the Request Entity is supposedly too large.

This could just be a report sidecar resource constraints configuration issue, or it could be an issue with the configuration of the active recording on the target being observed. What recording settings were used for that? Maybe it needs a slimmer event template, or a maxAge/maxSize.

github-actions · 2022-11-17T17:41:52Z

This PR/issue depends on:

cryostatio/cryostat#1243
By Dependent Issues (🤖). Happy coding!

maxcao13 · 2022-11-17T17:50:40Z

I'm not sure what triggers it, but here is an example I produced by repeatedly taking snapshots until it happened:

As you can see the recording is called "Snapshot", and if this occurs more than once, there will be multiple recordings called "Snapshot" seemingly existing at once within the Target.

As for the request entity problem, I generate snapshots using this recording as a basis:

  name: 'automated-analysis',
  events: 'template=Continuous,type=TARGET',
  duration: undefined,
  archiveOnStop: false,
  options: {
    toDisk: true,
    maxAge: 0,
    maxSize: 0,
  },
  metadata: {
    labels: {
      createdFrom: 'automatedAnalysis',
    },
  },

A maxSize constraint makes the most sense here.

andrewazores · 2022-11-17T18:26:34Z

> As you can see the recording is called "Snapshot", and if this occurs more than once, there will be multiple recordings called "Snapshot" seemingly existing at once within the Target.

Weird.

JFR doesn't use recording names as unique identifiers; recordings have incrementing numeric IDs that are used instead. The unique-name requirement is a Cryostat application restriction, and we hide the numeric IDs. When you ask the target JVM to take a snapshot it could very well create snapshots with the same repeated name but distinct IDs. I don't know why it would just be creating one called Snapshot instead of snapshot-n, though.

But do we really need to take snapshots anymore for this use case? If there is already a recording running, or if one is started by clicking that text/button on the card, then that recording can be used for the report generation request. Taking a snapshot in this case would only help if there was some extra work done to limit the age of data within the snapshot, for example, but we don't really have a good story for supporting that yet anyway. For now I think it would make sense to simply check for a recording with a specific name, like automated-analysis (or even use GraphQL to query by that label you've added instead of looking for a name), and simply try to generate reports from that. If the user clicked that text/button to start that recording from the card, then apply settings with a maxAge and the Continuous event template.
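As a rough illustration of looking up the recording by label first and by name second, here is a client-side filter over an already-fetched list. The `ActiveRecording` shape and `findAnalysisRecording` are assumptions made for this sketch; it does not show the actual GraphQL query or API types.

```typescript
// Hypothetical shape of an already-fetched active recording.
interface ActiveRecording {
  name: string;
  metadata: { labels: Record<string, string> };
}

// Prefer the label added when the card starts the recording; fall back to the well-known name.
function findAnalysisRecording(recordings: ActiveRecording[]): ActiveRecording | undefined {
  return (
    recordings.find((r) => r.metadata.labels['createdFrom'] === 'automatedAnalysis') ??
    recordings.find((r) => r.name === 'automated-analysis')
  );
}
```

Matching on the label means a recording still qualifies even if it was given a different name, which is the main appeal of querying by label rather than by name.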

maxcao13 · 2022-11-17T18:29:13Z

> But do we really need to take snapshots anymore for this use case? If there is already a recording running, or if one is started by clicking that text/button on the card, then that recording can be used for the report generation request. Taking a snapshot in this case would only help if there was some extra work done to limit the age of data within the snapshot, for example, but we don't really have a good story for supporting that yet anyway. For now I think it would make sense to simply check for a recording with a specific name, like automated-analysis (or even use GraphQL to query by that label you've added instead of looking for a name), and simply try to generate reports from that. If the user clicked that text/button to start that recording from the card, then apply settings with a maxAge and the Continuous event template.

Makes sense, probably don't need snapshots if an 'automated-analysis' recording acts as a profiling recording.

andrewazores · 2022-11-17T18:29:33Z

Anyway, the card itself looks great and generally behaves as expected other than the snapshotting behaviour. I really like the filters, especially by score severity.

One question/note, would it make sense to "collapse" NA scores into the n more labels? Or maybe have a default filter applied to not display those?

maxcao13 · 2022-11-17T18:31:10Z

> Anyway, the card itself looks great and generally behaves as expected other than the snapshotting behaviour. I really like the filters, especially by score severity.

> One question/note, would it make sense to "collapse" NA scores into the n more labels? Or maybe have a default filter applied to not display those?

Yeah, I was thinking about that, thanks for reminding me. I think I will go with just not showing N/A scores by default.
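Hiding N/A scores by default could look roughly like this. The sketch assumes, purely for illustration, that an N/A result is represented by a sentinel score of -1; `RuleEvaluation`, `NA_SCORE`, and `visibleEvaluations` are hypothetical names.

```typescript
// Hypothetical evaluation shape; the sentinel value for N/A is an assumption of this sketch.
interface RuleEvaluation {
  ruleName: string;
  score: number;
}

const NA_SCORE = -1;

// Hide N/A evaluations unless the user explicitly opts in to seeing them.
function visibleEvaluations(evaluations: RuleEvaluation[], showNA: boolean = false): RuleEvaluation[] {
  return showNA ? evaluations : evaluations.filter((e) => e.score !== NA_SCORE);
}
```

Keeping `showNA` as an explicit flag leaves room for a filter toggle on the card that brings the N/A labels back.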

src/app/CreateRecording/CreateRecording.tsx

src/app/Shared/Redux/ReduxStore.tsx

src/app/Shared/Redux/AutomatedAnalysisFilterReducer.tsx

maxcao13 · 2022-11-17T19:48:20Z

> Makes sense, probably don't need snapshots if an 'automated-analysis' recording acts as a profiling recording.

Mmm... I'm wondering one thing. If we just generate reports repeatedly from a recording, say a fixed recording called 'automated-analysis' with a max size of something like 1MB, is the report representative of both the current state of the target and previous events? Because if there is a max size, if I recall correctly, the recording will drop older timeline data to fit the constraint?

andrewazores · 2022-11-17T19:59:28Z

That's true, the maxSize/maxAge settings will both cause older events to be dropped from the recording, and therefore they won't be used when calculating the automated analysis scores. So if something noteworthy happened to your application half an hour ago but your recording only preserves the last 5 minutes of data, you would never be able to learn about that event that happened half an hour ago.

This is why I think we need to expand the card with some configuration, eventually, that lets the user choose which event template to use and also values for maxSize/maxAge. We don't know how much traffic the target gets and how many events will be emitted, and we don't know how large or how many reports sidecar generators the user will provision, so there are some things that are unknown to us but hopefully the user can figure out and configure.
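The effect of maxAge on what a report can see can be illustrated with a small model. This is not JFR's actual retention logic, only a sketch of the half-hour example above; `TimedEvent` and `retainedEvents` are hypothetical names, and treating a non-positive maxAge as "no limit" is an assumption of the sketch.

```typescript
interface TimedEvent {
  label: string;
  timestampMs: number;
}

// Keep only events inside the maxAge window; this sketch treats maxAgeMs <= 0 as no limit.
function retainedEvents(events: TimedEvent[], nowMs: number, maxAgeMs: number): TimedEvent[] {
  if (maxAgeMs <= 0) {
    return events;
  }
  return events.filter((e) => nowMs - e.timestampMs <= maxAgeMs);
}

const MINUTE_MS = 60000;
const nowMs = 60 * MINUTE_MS;
const sampleEvents: TimedEvent[] = [
  { label: 'noteworthy event', timestampMs: nowMs - 30 * MINUTE_MS },
  { label: 'recent event', timestampMs: nowMs - 2 * MINUTE_MS },
];

// With a 5 minute window, only the recent event is still available to the analysis.
const kept = retainedEvents(sampleEvents, nowMs, 5 * MINUTE_MS);
```

A larger window keeps more history for the scores but also produces larger recordings for the report generator to handle, which is the trade-off the user would be configuring.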

andrewazores · 2022-11-17T20:03:04Z

Side-benefit of using the active recording (by name or label) directly rather than taking snapshots is that it means this card will actually serve reports backed by Cryostat's active report cache. Using snapshots this way actually circumvents the cache entirely even though the data may be unlikely to have changed, or changed significantly.

@maxcao13 on your backend support PR, could you touch up the active report cache so that the expiry and refresh times are configured by environment variable? I think in theory we would want to tie the lifecycle of the cache entries to the configured maxAge for the recording, but that's a level of integration that we just don't have right now and would take a lot of refactoring to get to. Once that's done then it's another set of configuration options for the Operator to add to the Cryostat CR, and I guess something else for the Helm chart to support as well.

andrewazores

Testing looks good to me, skimming the code looks good too. @tthvo seems to have already reviewed the implementation quite thoroughly.

github-actions · 2023-01-10T16:31:43Z

Test image available:

CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat-web:pr-618-4f650e4789bf88a5ff4d3a3808eba021322f1cc8 sh smoketest.sh

maxcao13 added the feat New feature or request label Nov 9, 2022

maxcao13 requested review from andrewazores and tthvo November 9, 2022 01:07

andrewazores reviewed Nov 9, 2022

maxcao13 marked this pull request as draft November 9, 2022 18:31

maxcao13 force-pushed the automated-analysis-dashboard branch from d2541e5 to 96d9e41 November 10, 2022 01:38

maxcao13 mentioned this pull request Nov 10, 2022

feat(dashboard): add backend support to automated analysis card embeded in dashboard on -web cryostatio/cryostat-legacy#1243

Merged

maxcao13 force-pushed the automated-analysis-dashboard branch 2 times, most recently from 4e4b903 to 6c819b0 November 15, 2022 20:12

maxcao13 marked this pull request as ready for review November 15, 2022 23:36

github-actions bot added the dependent label Nov 17, 2022

tthvo reviewed Nov 17, 2022

src/app/CreateRecording/CreateRecording.tsx (outdated, resolved)

tthvo reviewed Nov 17, 2022

src/app/Shared/Redux/ReduxStore.tsx (outdated, resolved)

tthvo reviewed Nov 17, 2022

src/app/Shared/Redux/AutomatedAnalysisFilterReducer.tsx (outdated, resolved)

tthvo reviewed Nov 17, 2022

src/app/Shared/Redux/AutomatedAnalysisFilterReducer.tsx (outdated, resolved)

maxcao13 added 21 commits December 8, 2022 13:59

remove comment

713df88

use currentScore and not setState value

9701138

assert sliderStep type and change score input size to 5em

68f047a

add type for tuple

76a0970

remove useless prop

27b8152

separate set state with array sorting

596475d

refactor context target observable subscription handling with generat…

91ee8d3

…ing reports

add auth retry error handling

9639808

better error handling

57eb4cf

use own css rules for truncating text

8f28d36

fix selectTemplateSelectorForm conflict, change auth handling

5426894

moved timer to utils, fixed loading

1bb5fb1

css tricks

06c26b8

fix everything

8074ce9

format:applied

2b7cede

fix caching, fix other various things

6ff2c4c

some tests, just filter tests and clickable label test left

3b34dd2

helper text for templates, some loading state bug with config drawer

db4cefe

fix popper mounting bug

64e0ecc

fixed everything including tests

ed014d1

rebase conflicts, remove imports

a69af48

maxcao13 force-pushed the automated-analysis-dashboard branch from 86e9d9e to a69af48 December 8, 2022 19:01

yarn format:apply

4f650e4

andrewazores approved these changes Dec 8, 2022

andrewazores mentioned this pull request Dec 8, 2022

[Story] Dashboard contents/layout should be configurable #727

Closed

7 tasks

andrewazores merged commit f8ae71e into cryostatio:main Dec 8, 2022

maxcao13 deleted the automated-analysis-dashboard branch December 8, 2022 19:32

andrewazores added the needs-documentation label Jan 10, 2023

mergify bot added the safe-to-test label Jan 10, 2023
