Rehydrate last dumps by cluster #226

Merged: 22 commits, Jul 24, 2024
Changes from 21 commits
23 changes: 23 additions & 0 deletions cmd/kubehound/ingest.go
@@ -59,6 +59,27 @@ var (
return core.CoreClientGRPCIngest(khCfg.Ingestor, khCfg.Ingestor.ClusterName, khCfg.Ingestor.RunID)
},
}

remoteIngestRehydrateCmd = &cobra.Command{
Contributor:

UX question:
Wouldn't it be better to have a flag like --latest, or even just a latest argument, on the ingest subcommand?

rehydrate is "another command to know", instead of "wait, I just want to ingest whatever the latest thing is but don't remember the exact path; can I just use latest and be done with it?"

Contributor (author):


Yes, but ingest requires --run_id and --cluster, which are not needed here, so that's why I split it out.

Contributor:


I see. Then here are a couple of possible solutions that, IMO, would be better. If you don't want to spend time on it, I'm fine, but that'll be harder to change later, I guess :D

I think it'd be better to have:

kh ingest remote latest # (or even, maybe, without latest, since that could be the default value?) This also allows for autocompletion
kh ingest remote run_xyz --cluster xyz
kh ingest remote latest --cluster xyz

# also allow ingesting, depending on the environment, only the clusters that matter to the user, individually
kh ingest remote latest --cluster xyz --cluster abc
(or, possibly)
kh ingest remote latest --cluster xyz,abc

Or, to keep it more aligned with the current run_id flag system, just checking whether run_id == latest would work.
(I think I would prefer the non-flag version, because the auto-completion would be simpler / more expected, but that's a bit more work, so I'm fine with the proposal below for now.)

kh ingest remote --run_id=latest
kh ingest remote --run_id=run_xyz --cluster xyz
kh ingest remote --run_id=latest --cluster xyz

The current PR, IMO, adds confusion about what to run and makes the feature less discoverable:

kh ingest remote rehydrate # (how do you say you want a specific cluster? e.g. if you have 500 clusters dumped and your team manages only 4, I don't think you'll want to ingest them all)
kh ingest remote --run_id=run_xyz --cluster xyz

Rationale for the move from flags to a positional arg: --cluster is now not mandatory (as in: it can be discovered on its own during "rehydrate"), but --run_id is mandatory (as in: either you provide it or it is set for you, but there is always one)? So it feels weird to have the two.

Use: "rehydrate",
Short: "Rehydrate snapshot previously dumped on a KHaaS instance",
Long: `Run a rehydration on KHaaS from a bucket to build the attack path`,
PreRunE: func(cobraCmd *cobra.Command, args []string) error {
viper.BindPFlag(config.IngestorAPIEndpoint, cobraCmd.Flags().Lookup("khaas-server")) //nolint: errcheck
viper.BindPFlag(config.IngestorAPIInsecure, cobraCmd.Flags().Lookup("insecure")) //nolint: errcheck

return cmd.InitializeKubehoundConfig(cobraCmd.Context(), "", false, true)
},
RunE: func(cobraCmd *cobra.Command, args []string) error {
// Passing the Kubehound config from viper
khCfg, err := cmd.GetConfig()
if err != nil {
return fmt.Errorf("get config: %w", err)
}

return core.CoreClientGRPCRehydrateLatest(khCfg.Ingestor)
},
}
)

func init() {
@@ -70,5 +91,7 @@ func init() {
ingestCmd.AddCommand(remoteIngestCmd)
cmd.InitRemoteIngestCmd(remoteIngestCmd, true)

remoteIngestCmd.AddCommand(remoteIngestRehydrateCmd)

rootCmd.AddCommand(ingestCmd)
}
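
For comparison with the rehydrate subcommand added above, a minimal sketch of the reviewer's --run_id=latest alternative might look like the RunE body below. This is hypothetical and not part of the PR; the "latest" sentinel value is an assumption, and the sketch reuses only identifiers that already appear in this diff (khCfg.Ingestor.RunID, khCfg.Ingestor.ClusterName, core.CoreClientGRPCIngest, core.CoreClientGRPCRehydrateLatest).

    RunE: func(cobraCmd *cobra.Command, args []string) error {
        khCfg, err := cmd.GetConfig()
        if err != nil {
            return fmt.Errorf("get config: %w", err)
        }

        // Hypothetical sentinel: treat --run_id=latest as "rehydrate the most
        // recent dump per cluster" instead of a literal run ID.
        if khCfg.Ingestor.RunID == "latest" {
            return core.CoreClientGRPCRehydrateLatest(khCfg.Ingestor)
        }

        return core.CoreClientGRPCIngest(khCfg.Ingestor, khCfg.Ingestor.ClusterName, khCfg.Ingestor.RunID)
    },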
2 changes: 1 addition & 1 deletion go.mod
@@ -24,7 +24,7 @@ require (
gocloud.dev v0.37.0
golang.org/x/exp v0.0.0-20240604190554-fc45aab8b7f8
google.golang.org/grpc v1.64.1
google.golang.org/protobuf v1.34.1
google.golang.org/protobuf v1.34.2
gopkg.in/DataDog/dd-trace-go.v1 v1.64.1
gopkg.in/yaml.v2 v2.4.0
gopkg.in/yaml.v3 v3.0.1
2 changes: 2 additions & 0 deletions go.sum
@@ -916,6 +916,8 @@ google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp0
google.golang.org/protobuf v1.27.1/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc=
google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
google.golang.org/protobuf v1.34.1/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg=
google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw=
gopkg.in/DataDog/dd-trace-go.v1 v1.64.1 h1:HN/zoIV8FvrLKA1ZBkbyo4E1MnPh9hPc2Q0C/ojom3I=
gopkg.in/DataDog/dd-trace-go.v1 v1.64.1/go.mod h1:qzwVu8Qr8CqzQNw2oKEXRdD+fMnjYatjYMGE0tdCVG4=
gopkg.in/airbrake/gobrake.v2 v2.0.9/go.mod h1:/h5ZAUhDkGaJfjzjKLSjv6zCL6O0LLBxU4K+aSYdM/U=
4 changes: 2 additions & 2 deletions pkg/cmd/dump.go
@@ -63,8 +63,8 @@ func InitRemoteDumpCmd(cmd *cobra.Command) {

func InitRemoteIngestCmd(cmd *cobra.Command, standalone bool) {

cmd.Flags().String("khaas-server", "", "GRPC endpoint exposed by KubeHound as a Service (KHaaS) server (e.g.: localhost:9000)")
cmd.Flags().Bool("insecure", config.DefaultIngestorAPIInsecure, "Allow insecure connection to the KHaaS server grpc endpoint")
cmd.PersistentFlags().String("khaas-server", "", "GRPC endpoint exposed by KubeHound as a Service (KHaaS) server (e.g.: localhost:9000)")
cmd.PersistentFlags().Bool("insecure", config.DefaultIngestorAPIInsecure, "Allow insecure connection to the KHaaS server grpc endpoint")

// IngestorAPIEndpoint
if standalone {
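
The switch from Flags() to PersistentFlags() above matters because persistent flags are inherited by subcommands: the new rehydrate subcommand can look up --khaas-server and --insecure in its PreRunE without redefining them. A minimal, self-contained illustration of that cobra behavior follows; the command and flag wiring here is only an example, not KubeHound code.

    package main

    import (
        "fmt"

        "github.com/spf13/cobra"
    )

    func main() {
        parent := &cobra.Command{Use: "remote"}
        child := &cobra.Command{
            Use: "rehydrate",
            RunE: func(c *cobra.Command, _ []string) error {
                // Flags() on the child also resolves flags inherited from the parent.
                server, _ := c.Flags().GetString("khaas-server")
                fmt.Println("server:", server)
                return nil
            },
        }

        // Declared once on the parent, available on "remote" and on "remote rehydrate".
        parent.PersistentFlags().String("khaas-server", "", "KHaaS gRPC endpoint")
        parent.AddCommand(child)

        parent.SetArgs([]string{"rehydrate", "--khaas-server", "localhost:9000"})
        _ = parent.Execute()
    }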
31 changes: 9 additions & 22 deletions pkg/dump/ingestor.go
@@ -3,7 +3,6 @@ package dump
import (
"context"
"fmt"
"path"

"github.com/DataDog/KubeHound/pkg/collector"
"github.com/DataDog/KubeHound/pkg/config"
@@ -15,42 +14,30 @@ import (
)

type DumpIngestor struct {
directoryOutput string
ResultName string
collector collector.CollectorClient
writer writer.DumperWriter
collector collector.CollectorClient
writer writer.DumperWriter
}

const (
OfflineDumpDateFormat = "2006-01-02-15-04-05"
OfflineDumpPrefix = "kubehound_"
)

// ./<clusterName>/kubehound_<clusterName>_<run_id>
func DumpIngestorResultName(clusterName string, runID string) string {
return path.Join(clusterName, fmt.Sprintf("%s%s_%s", OfflineDumpPrefix, clusterName, runID))
}

// func NewDumpIngestor(ctx context.Context, collector collector.CollectorClient, compression bool, directoryOutput string) (*DumpIngestor, error) {
func NewDumpIngestor(ctx context.Context, collector collector.CollectorClient, compression bool, directoryOutput string, runID *config.RunID) (*DumpIngestor, error) {
// Generate path for the dump
clusterName, err := getClusterName(ctx, collector)
if err != nil {
return nil, err
}

resultName := DumpIngestorResultName(clusterName, runID.String())
dumpResult, err := NewDumpResult(clusterName, runID.String(), compression)
if err != nil {
return nil, fmt.Errorf("create dump result: %w", err)
}

dumpWriter, err := writer.DumperWriterFactory(ctx, compression, directoryOutput, resultName)
dumpWriter, err := writer.DumperWriterFactory(ctx, compression, directoryOutput, dumpResult.GetFullPath())
if err != nil {
return nil, fmt.Errorf("create collector writer: %w", err)
}

return &DumpIngestor{
directoryOutput: directoryOutput,
collector: collector,
writer: dumpWriter,
ResultName: resultName,
collector: collector,
writer: dumpWriter,
}, nil
}

16 changes: 5 additions & 11 deletions pkg/dump/ingestor_test.go
@@ -20,9 +20,9 @@ const (
)

func TestNewDumpIngestor(t *testing.T) {
t.Parallel()
ctx := context.Background()

t.Setenv("KUBECONFIG", "./testdata/kube-config")
clientset := fake.NewSimpleClientset()
collectorClient := collector.NewTestK8sAPICollector(ctx, clientset)

@@ -48,8 +48,6 @@ func TestNewDumpIngestor(t *testing.T) {
runID: config.NewRunID(),
},
want: &DumpIngestor{
directoryOutput: mockDirectoryOutput,

writer: &writer.FileWriter{},
},
wantErr: false,
@@ -63,16 +61,16 @@
runID: config.NewRunID(),
},
want: &DumpIngestor{
directoryOutput: mockDirectoryOutput,
writer: &writer.TarWriter{},
writer: &writer.TarWriter{},
},
wantErr: false,
},
}
for _, tt := range tests {
// Cannot run parallel tests because the KUBECONFIG environment variable is set via t.Setenv,
// and t.Setenv is not compatible with parallel tests.
for _, tt := range tests { //nolint:paralleltest
tt := tt
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
got, err := NewDumpIngestor(ctx, tt.args.collectorClient, tt.args.compression, tt.args.directoryOutput, tt.args.runID)
if (err != nil) != tt.wantErr {
t.Errorf("NewDumpIngestorsss() error = %v, wantErr %v", err, tt.wantErr)
@@ -83,10 +81,6 @@
if !assert.Equal(t, reflect.TypeOf(got.writer), reflect.TypeOf(tt.want.writer)) {
t.Errorf("NewDumpIngestor() = %v, want %v", reflect.TypeOf(got.writer), reflect.TypeOf(tt.want.writer))
}

if !assert.Equal(t, got.directoryOutput, tt.want.directoryOutput) {
t.Errorf("NewDumpIngestor() = %v, want %v", got.directoryOutput, tt.want.directoryOutput)
}
})
}
}
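
Context for the //nolint:paralleltest change above: t.Setenv cannot be combined with t.Parallel, which is why the subtests no longer call t.Parallel(). A small sketch of the failure mode; the test name is only an example.

    package dump_test

    import "testing"

    // Panics at runtime with a message along the lines of:
    // "testing: t.Setenv called after t.Parallel; cannot set environment variables in parallel tests".
    func TestSetenvWithParallelSketch(t *testing.T) {
        t.Parallel()
        t.Setenv("KUBECONFIG", "./testdata/kube-config")
    }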
113 changes: 113 additions & 0 deletions pkg/dump/result.go
@@ -0,0 +1,113 @@
package dump

import (
"fmt"
"path"
"regexp"
)

type DumpResult struct {
clusterName string
RunID string
isDir bool
extension string
}

const (
DumpResultClusterNameRegex = `([A-Za-z0-9\.\-_]+)`
DumpResultRunIDRegex = `([a-z0-9]{26})`
DumpResultExtensionRegex = `\.?([a-z0-9\.]+)?`
DumpResultPrefix = "kubehound_"
DumpResultFilenameRegex = DumpResultPrefix + DumpResultClusterNameRegex + "_" + DumpResultRunIDRegex + DumpResultExtensionRegex
DumpResultPathRegex = DumpResultClusterNameRegex + "/" + DumpResultFilenameRegex

DumpResultTarWriterExtension = "tar.gz"
)

func NewDumpResult(clusterName, runID string, isCompressed bool) (*DumpResult, error) {
dumpResult := &DumpResult{
clusterName: clusterName,
RunID: runID,
isDir: true,
}
if isCompressed {
dumpResult.Compressed()
}

err := dumpResult.Validate()
if err != nil {
return nil, err
}

return dumpResult, nil
}

func (i *DumpResult) Validate() error {
re := regexp.MustCompile(DumpResultClusterNameRegex)
if !re.MatchString(i.clusterName) {
return fmt.Errorf("Invalid clustername: %q", i.clusterName)
}

matches := re.FindStringSubmatch(i.clusterName)
if len(matches) == 2 && matches[1] != i.clusterName {
return fmt.Errorf("Invalid clustername: %q", i.clusterName)
}

re = regexp.MustCompile(DumpResultRunIDRegex)
if !re.MatchString(i.RunID) {
return fmt.Errorf("Invalid runID: %q", i.RunID)
}

return nil
}

func (i *DumpResult) Compressed() {
i.isDir = false
i.extension = DumpResultTarWriterExtension
}

// ./<clusterName>/kubehound_<clusterName>_<run_id>
func (i *DumpResult) GetFullPath() string {
filename := i.GetFilename()

return path.Join(i.clusterName, filename)
}

func (i *DumpResult) GetFilename() string {
filename := fmt.Sprintf("%s%s_%s", DumpResultPrefix, i.clusterName, i.RunID)
if i.isDir {
return filename
}

return fmt.Sprintf("%s.%s", filename, i.extension)
}

func ParsePath(path string) (*DumpResult, error) {
// ./<clusterName>/kubehound_<clusterName>_<run_id>[.tar.gz]
// re := regexp.MustCompile(`([a-z0-9\.\-_]+)/kubehound_([a-z0-9\.-_]+)_([a-z0-9]{26})\.?([a-z0-9\.]+)?`)
re := regexp.MustCompile(DumpResultPathRegex)
if !re.MatchString(path) {
return nil, fmt.Errorf("Invalid path provided: %q", path)
}

matches := re.FindStringSubmatch(path)
// The cluster name should match (parent dir and in the filename)
if matches[1] != matches[2] {
return nil, fmt.Errorf("Cluster name does not match in the path provided: %q", path)
}

clusterName := matches[1]
runID := matches[3]
extension := matches[4]

isCompressed := false
if extension != "" {
isCompressed = true
}
result, err := NewDumpResult(clusterName, runID, isCompressed)
if err != nil {
return nil, err
}

return result, nil
}
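
A brief usage sketch of the new DumpResult helpers above. The cluster names and the 26-character run ID are made-up examples, and the import path is assumed to follow the other KubeHound packages seen in this diff.

    package main

    import (
        "fmt"

        "github.com/DataDog/KubeHound/pkg/dump"
    )

    func main() {
        // Build the storage location of a compressed dump.
        res, err := dump.NewDumpResult("my-cluster", "01hxyzabcdefghijklmnopqrst", true)
        if err != nil {
            panic(err)
        }
        fmt.Println(res.GetFullPath()) // my-cluster/kubehound_my-cluster_01hxyzabcdefghijklmnopqrst.tar.gz

        // Recover the cluster name and run ID from a stored path.
        parsed, err := dump.ParsePath("other-cluster/kubehound_other-cluster_01hxyzabcdefghijklmnopqrst.tar.gz")
        if err != nil {
            panic(err)
        }
        fmt.Println(parsed.RunID) // 01hxyzabcdefghijklmnopqrst
    }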