Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic recording rules #356

Closed
wants to merge 73 commits into from

Conversation

andrewazores
Copy link
Member

@andrewazores andrewazores commented Dec 10, 2020

Based on top of #348

#355

Requires cryostatio/cryostat-core#53

This PR has become massive, so from this point I will only add tests or bugfixes. The API needs further refinement, but I think this should either be done as follow-up PRs or as PRs against this one to incorporate the additional changes without further inflating this PR on its own. The review overhead and complexity is already relatively enormous.

I have pushed an image for testing at quay.io/andrewazores/container-jfr:2.0.0-autorules - ie CONTAINER_JFR_IMAGE=quay.io/andrewazores/container-jfr:2.0.0-autorules sh smoketest.sh.

Several API handlers are added:

  1. GET /api/v2/rules - get the list of currently defined rules
  2. GET /api/v2/rules/:ruleName - get the definition of a single rule by name
  3. POST /api/v2/rules - add a new rule definition, either by multipart form or JSON body string format
  4. DELETE /api/v2/rules/:ruleName - delete a rule by name
  5. POST /api/v2/targets/:targetId/credentials - add stored (in-memory by default, on disk implemented but currently can't be enabled intentionally) credentials to apply when processing auto rules
  6. DELETE /api/v2/targets/:targetId/credentials - delete stored credentials

There are also various supporting classes behind these:

  • CredentialsManager - used by the TargetCredentials handlers, requiring changes to the Credentials structure provided in -core. This handles reading/writing persistent credentials to/from disk and maintains an in-memory cache that the RuleProcessor can access as needed.
  • RuleRegistry - handles reading/writing/caching rule definitions (always persisted to disk) and providing methods to find definitions by name or by applicable target
  • RuleProcessor - receives notifications of targets appearing or disappearing. When a target appears the RuleRegistry is checked for any applicable rules (currently meaning that the newly appeared target has an alias that is exactly equal to the targetAlias field of the rule definition). Each applicable rule is applied, starting a new recording with options as specified by the Rule definition. If the Rule specification requires it, a PeriodicArchiver is also created corresponding to the Rule/TargetId combination, which is scheduled in a thread pool to run at a Rule-definition-specified interval and create archived copies of the automatically created recording. When a target disappears then any PeriodicArchivers associated with any of its matched rules are cancelled.
  • PeriodicArchiver - simply a Runnable which handles creating archived copies from the automatically created recording from a Rule definition within a specific target, and then handles pruning any old archive copies if the Rule definition requires it.
  • Rule - a simple data structure defining the trigger for recording creation (ie matching target alias, currently), which events/template are used for the created recording, if/when archival is performed, etc.

rulestest.sh is also included as a temporary script for testing/demoing the new APIs.

@andrewazores andrewazores force-pushed the auto-rules branch 3 times, most recently from 305ceca to 64fc283 Compare December 14, 2020 21:34
@andrewazores
Copy link
Member Author

@vic-ma @Alexjsenn @ebaron

This isn't ready for a full-blown code review yet, but I think it's ready enough for some proof-of-concept testing to exercise the basic API setup and see if it makes sense to continue developing in the direction I'm currently taking it in. When you all get a chance, could you try it out?

@vic-ma
Copy link
Contributor

vic-ma commented Jan 8, 2021

Tried it out and it looks good! No criticisms that I can think of right now.

@andrewazores andrewazores marked this pull request as ready for review January 25, 2021 18:16
@andrewazores andrewazores changed the title Proof-of-Concept - Automatic recording rules Automatic recording rules Jan 25, 2021
@andrewazores
Copy link
Member Author

I have polished this up somewhat, resolved some TODOs/FIXMEs I had added earlier in the proof-of-concept, and added some unit tests (more coverage for failure cases could be added) as well as created an integration test equivalent to the rulestest.sh. I think this is ready enough for review now.

@ebaron @vic-ma @Alexjsenn @jiekang any takers?

@vic-ma
Copy link
Contributor

vic-ma commented Jan 25, 2021

I tested it earlier, so I guess I'll also do the review heh.

@vic-ma vic-ma self-requested a review January 25, 2021 18:57
@vic-ma
Copy link
Contributor

vic-ma commented Feb 5, 2021

Should this PR include the necessary docs changes, or will you do that in a seperate PR?

@andrewazores
Copy link
Member Author

andrewazores commented Feb 5, 2021

Should this PR include the necessary docs changes, or will you do that in a seperate PR?

I think I'll hold off on documenting the new API stuff until it's a bit more settled. There are a lot of additions to be made, and along the way there are probably going to be changes to what new API is already here. I'll file a tracking issue so it isn't forgotten.

#413

rulestest.sh Show resolved Hide resolved
@vic-ma
Copy link
Contributor

vic-ma commented Feb 10, 2021

Here are some observations and suggestions I came up with from my testing. Mostly just small stuff that fall under the category of potential API refinements that should be done as follow-up PRs, which you mentioned. If there aren't any changes you think are appropriate for this PR, I can approve it. In proper use cases everything seems to be working well.

POST /api/v2/rules

When a rule with the same name as an existing rule is created, it will overwrite that rule. Maybe this should be prevented and the user should have to delete the existing rule first. Or there could be a note in the response that says that a rule was overwritten.

The numerical parameters don't give an error if they are set to negative numbers. They get set to zero when this happens, but I think ideally negative numbers shouldn't be allowed. This would also more in line with TargetRecordingsPostHandler, which rejects negatives.

When a required parameter is missing, the response is a 500 with "data":{"reason":null}, resulting from an NPE. It would be helpful if the code explicitly checked the given parameters, and let the user know if any were missing.

Some stuff, like setting the numerical parameters to letters, or setting name to !- or ////////, will cause exceptions, and the response doesn't specifically say that X parameter was set to an invalid value. But I guess this is sort of intentional, right? Since the alternative would be to check every parameter and write the error messages into the response, which would take a lot of extra code.

DELETE /api/v2/rules/:ruleName

Deleting a non-existent rule, as well as an existing one, gives a 200. It would be helpful if non-existent-rule deletions gave a 404 instead, to indicate to the user that the given rule was not found and that no deletion has occurred. Or they could make a note of it in the result, if a 404 is inappropriate here.

If a rule is already running, and the user would like to stop that rule (delete the active recording and stop the PeriodicArchiver), is there any way for them to do this besides killing the target container? Should deleting a rule maybe be able to do this? Or maybe this could be the job of a new PATCH STOP handler?

POST /api/v2/targets/:targetId/credentials

When both the username and password are missing, the error message only says that the username is missing. It would be better if both missing parameters were mentioned.

DELETE /api/v2/targets/:targetId/credentials

Maybe the response could specify if credentials were found and deleted, or if there were no credentials to be deleted? Not super important, but might be helpful.

When a rule for a target is running, and I delete the credentials for that target, the archiver is still able to archive recordings, despite the fact that JMX auth is required for this, and the saved JMX auth credentials have been deleted. I see that this is because in the code, the credentials are given to the PeriodicArchiver in the constructor and saved to the object, not retreived from the CredentialsManager each time it is used. This is more efficient, but I'm wondering if it makes sense from the user's point of view. If they deleted some credentials, would they expect for CJFR to not have a copy of those credentials stored anywhere, anymore? Or at least not be able to use it?

General

Deleting the recording created by an automatic rule causes an exception whenever the archiver tries to archive the recording (expectedly). I imagine there's no good way to protect that recording from being deleting, but maybe there is a way to stop the archiver if the recording does get deleted? It does seem unlikely that a user would accidentally delete the automatic recording, though.

When a target that has a rule running is killed, I get a few exceptions from the archiver trying to archive the recording, before the target is discovered as LOST and the archiver is stopped. This is because I set a low archival period for testing, but obviously could also happen with a longer period, if the archiver just so happens to archive in the in-between time. Also, when this happens, an archived recording will be pruned, but a new one will not be made to replace it, so the end result is one less archived recording than the amount specified by preservedArchives. I don't know how big of a deal any of this is.

@andrewazores
Copy link
Member Author

@vic-ma those are all very good and valid points. I think most/all of those should be addressed before this makes it into main.

I think what I will do at this point is push my auto-rules branch a new upstream branch with the same name, and open a draft PR for merging the upstream auto-rules into upstream main. Then, each of the review points you've outlined can be handled as individual PRs against upstream auto-rules. This way each change can be much more easily review in isolation, by yourself or someone else. This also makes it much simpler for anyone else to contribute fixes to the draft PR as well.

@andrewazores
Copy link
Member Author

andrewazores commented Feb 12, 2021

Superceded by #416 .

@andrewazores andrewazores deleted the auto-rules branch February 12, 2021 15:36
@andrewazores andrewazores mentioned this pull request Feb 12, 2021
11 tasks
aali309 pushed a commit to aali309/cryostat-legacy that referenced this pull request Jul 22, 2024
…er (cryostatio#356)

* fix(matchexpressions): correct endpoint case for JSON ID request filter

* allow IDs in per-rule request paths

do not allow IDs in rule creation POSTS, but do allow in modification PATCHes - the ID field will be ignored here anyway
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants