Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rehydrate last dumps by cluster #226

Merged
merged 22 commits into from
Jul 24, 2024
Merged

Rehydrate last dumps by cluster #226

merged 22 commits into from
Jul 24, 2024

Conversation

jt-dd
Copy link
Contributor

@jt-dd jt-dd commented Jul 18, 2024

Adding a new grpc method to rehydrate dumps in KHaaS from previous dumps from a bucket. It will only rehydrate the latest dump (latest as last file created in the bucket by cluster).

On more command has been rehydrate added on the grpc client :

./bin/build/kubehound ingest remote rehydrate --khaas-server 127.0.0.1:9000 --insecure

It can also be run manually with grpcurl:

grpcurl -plaintext -format text 127.0.0.1:9000 grpc.API.RehydrateLates

Copy link
Contributor

@Minosity-VR Minosity-VR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits but lgtm, testing locally before approving

pkg/ingestor/api/api.go Show resolved Hide resolved
pkg/ingestor/api/api.go Outdated Show resolved Hide resolved
pkg/ingestor/api/api.go Outdated Show resolved Hide resolved
pkg/ingestor/puller/blob/blob.go Outdated Show resolved Hide resolved
@@ -59,6 +59,27 @@ var (
return core.CoreClientGRPCIngest(khCfg.Ingestor, khCfg.Ingestor.ClusterName, khCfg.Ingestor.RunID)
},
}

remoteIngestRehydrateCmd = &cobra.Command{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UX question:
Woudn't it be better to have a flag like --latest or even just latest on the ingest subcommand?

rehydrate is "another command to know" instead of "wait, I just want to ingest whatever the latest thing is but don't remember the exact path, can I just use latest and be done with it?"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but the ingest require the --run_id and --cluster which is not needed, so this is why I split it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then a couple possible solution that, imo would be better, if you don't want to spend time on it, i'm fine, but that'll be harder to change later I guess :D

I think it'd be better to have:

kh ingest remote latest # (or even, maybe without latest since that could be default value?) This also allows for autocomplete
kh ingest remote run_xyz --cluster xyz
kh ingest remote latest --cluster xyz

# also allow to ingest, depending on the env, cluster that matters to the user, individually
kh ingest remote latest --cluster xyz --cluster abc 
(or, possibly)
kh ingest remote latest --cluster xyz,abc 

Or, even to keep it more aligned with the current run_id flag system, just checking if run_id==latest would work
(I think it would prefer the non flag version, because the auto completion would be simpler / expected, but that's a bit more work, so i'm find with the proposal below for now)

kh ingest remote --run_id=latest
kh ingest remote --run_id=run_xyz --cluster xyz
kh ingest remote --run_id=latest --cluster xyz

The current PR adds, imo, confusion about what to run and makes the feature less discover-able

kh ingest remote rehydrate # (how do you say you want to get specific cluster? e.g: If you have 500 cluster dumped, and your team manages only 4, I don't think you'll want to ingest them all?)
kh ingest remote --run_id=run_xyz --cluster xyz

Rationale for the move from flags to positional arg: --cluster is now not mandatory (as in: it can be found by itself during "rehydrate") but --run_id is mandatory? (as in: it's either you provide it or set for you but there's one), so it feels weird to have the two

@@ -52,6 +56,59 @@ func NewIngestorAPI(cfg *config.KubehoundConfig, puller puller.DataPuller, notif
}
}

// RehydrateLatest is just a GRPC wrapper around the Ingest method from the API package
func (g *IngestorAPI) RehydrateLatest(ctx context.Context) ([]*grpc.IngestedCluster, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From on design PoV, I think it would be better to have this function just do:

  • call a function that get []string of the list of the latest dump
  • loop on it to call ingest()

This would need to move most of this to a funciton like LatestDumpByCluster, maybe in the puller package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already kind of the case (listing the files, and then iterating over).
The only case I see where it could be useful would be if we want to rehydrate specific clusters, but it will require to add news entries more work which is not needed for the moment. This new function is mainly to "reload" when the graph reboots.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'm advocating for a split of the function in 2:

  • part 1 being the "extract the latest clusters"
  • part 2: call the api based on the results of that

Copy link
Contributor

@Minosity-VR Minosity-VR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a few nits, and a question on path handling in tests: sometimes you use a full path obtained from the os package, sometimes a ./testdata..., sometimes a func getTempDir. I wonder if we could harmonize all of that

pkg/dump/result.go Outdated Show resolved Hide resolved
pkg/dump/result_test.go Outdated Show resolved Hide resolved
pkg/ingestor/puller/blob/testdata/archive.tar.gz Outdated Show resolved Hide resolved
pkg/ingestor/puller/blob/blob_test.go Show resolved Hide resolved
pkg/ingestor/puller/blob/blob_test.go Show resolved Hide resolved
pkg/ingestor/puller/blob/blob_test.go Show resolved Hide resolved
@jt-dd jt-dd marked this pull request as ready for review July 24, 2024 09:25
@jt-dd jt-dd requested a review from a team as a code owner July 24, 2024 09:25
Minosity-VR
Minosity-VR previously approved these changes Jul 24, 2024
Copy link
Contributor

@Minosity-VR Minosity-VR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm for the issues I noted earlier. Approving but please wait for Edouard to resolve his comments :D

@jt-dd jt-dd merged commit 3e5e8f5 into main Jul 24, 2024
7 checks passed
@jt-dd jt-dd deleted the jt-dd/grpc-rehydratelatest branch July 24, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants