-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rehydrate last dumps by cluster #226
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits but lgtm, testing locally before approving
cmd/kubehound/ingest.go
Outdated
@@ -59,6 +59,27 @@ var ( | |||
return core.CoreClientGRPCIngest(khCfg.Ingestor, khCfg.Ingestor.ClusterName, khCfg.Ingestor.RunID) | |||
}, | |||
} | |||
|
|||
remoteIngestRehydrateCmd = &cobra.Command{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UX question:
Woudn't it be better to have a flag like --latest
or even just latest
on the ingest
subcommand?
rehydrate is "another command to know" instead of "wait, I just want to ingest whatever the latest thing is but don't remember the exact path, can I just use latest and be done with it?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes but the ingest require the --run_id and --cluster which is not needed, so this is why I split it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then a couple possible solution that, imo would be better, if you don't want to spend time on it, i'm fine, but that'll be harder to change later I guess :D
I think it'd be better to have:
kh ingest remote latest # (or even, maybe without latest since that could be default value?) This also allows for autocomplete
kh ingest remote run_xyz --cluster xyz
kh ingest remote latest --cluster xyz
# also allow to ingest, depending on the env, cluster that matters to the user, individually
kh ingest remote latest --cluster xyz --cluster abc
(or, possibly)
kh ingest remote latest --cluster xyz,abc
Or, even to keep it more aligned with the current run_id flag system, just checking if run_id==latest
would work
(I think it would prefer the non flag version, because the auto completion would be simpler / expected, but that's a bit more work, so i'm find with the proposal below for now)
kh ingest remote --run_id=latest
kh ingest remote --run_id=run_xyz --cluster xyz
kh ingest remote --run_id=latest --cluster xyz
The current PR adds, imo, confusion about what to run and makes the feature less discover-able
kh ingest remote rehydrate # (how do you say you want to get specific cluster? e.g: If you have 500 cluster dumped, and your team manages only 4, I don't think you'll want to ingest them all?)
kh ingest remote --run_id=run_xyz --cluster xyz
Rationale for the move from flags to positional arg: --cluster
is now not mandatory (as in: it can be found by itself during "rehydrate") but --run_id
is mandatory? (as in: it's either you provide it or set for you but there's one), so it feels weird to have the two
@@ -52,6 +56,59 @@ func NewIngestorAPI(cfg *config.KubehoundConfig, puller puller.DataPuller, notif | |||
} | |||
} | |||
|
|||
// RehydrateLatest is just a GRPC wrapper around the Ingest method from the API package | |||
func (g *IngestorAPI) RehydrateLatest(ctx context.Context) ([]*grpc.IngestedCluster, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From on design PoV, I think it would be better to have this function just do:
- call a function that get []string of the list of the latest dump
- loop on it to call ingest()
This would need to move most of this to a funciton like LatestDumpByCluster
, maybe in the puller package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already kind of the case (listing the files, and then iterating over).
The only case I see where it could be useful would be if we want to rehydrate specific clusters, but it will require to add news entries more work which is not needed for the moment. This new function is mainly to "reload" when the graph reboots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I'm advocating for a split of the function in 2:
- part 1 being the "extract the latest clusters"
- part 2: call the api based on the results of that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a few nits, and a question on path handling in tests: sometimes you use a full path obtained from the os
package, sometimes a ./testdata...
, sometimes a func getTempDir. I wonder if we could harmonize all of that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm for the issues I noted earlier. Approving but please wait for Edouard to resolve his comments :D
Adding a new grpc method to rehydrate dumps in KHaaS from previous dumps from a bucket. It will only rehydrate the latest dump (latest as last file created in the bucket by cluster).
On more command has been
rehydrate
added on the grpc client :It can also be run manually with
grpcurl
: