Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BREAKING(flags) feat(bulk): Allow encrypted input with unencrypted output in bulk. #6541

Merged
merged 6 commits into from
Sep 25, 2020

Conversation

parasssh
Copy link
Contributor

@parasssh parasssh commented Sep 21, 2020

Fixes DGRAPH-2135

All UT pass.

This adds a new flag --encrypted_out to bulk loader. Now, when bulk loading with encryption input or output, you must set both the --encrypted and --encrypted_out flags along with the encryption key. Otherwise, it's an error.

For example, to bulk load encrypted input data and output unencrypted data:

dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file ./enc-key --encrypted=true --encrypted_out=false

This change is Reviewable

@github-actions github-actions bot added the area/bulk-loader Issues related to bulk loading. label Sep 21, 2020
Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @danielmai, @manishrjain, and @vvbalaji-dgraph)

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain, @parasssh, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):

Quoted 4 lines of code…
if len(opt.EncryptionKey) > 0 && !opt.Encrypted && !opt.EncryptedOut {
		fmt.Printf("Must use --encrypted/--encrypted_out with --encryption_key_file option.\n")
		os.Exit(1)		
	}

This error message is confusing especially if I set both of these but they're set to false:

$ dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file ~/dg/enc_key_file --encrypted=false --encrypted_out=false
...
Must use --encrypted/--encrypted_out with --encryption_key_file option.

There's a way in the Cobra flag handling to check if a flag was set/changed by the user (I believe it's this function from Cobra/Viper/pflag: https://godoc.org/github.com/spf13/pflag#FlagSet.Changed). Maybe that'd be useful.

There's a combination of flag values we need to support between these three flags. Can we make the flags work like this instead?

Input: Encrypted, Output: Encrypted by only specifying --encryption_key_file (no need to specify encrypted=true):

dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key

Input: Encrypted, Output: Unencrypted:

dgraph bulk -f data.rdf.gz -s data.schema.gz --encrypted_key_file ./enc-key --encrypted_out=false

Input: Unencrypted, Output: Encrypted

dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key --encrypted=false

Input: Unencrypted, Output: Unencrypted (no flags to set, or both set to false, in which case encryption_key_file is effectively ignored):

dgraph bulk -f data.rdf.gz -s data.schema.gz
dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key --encrypted=false --encrypted=false

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, @martinmr, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):

Previously, danielmai (Daniel Mai) wrote…
if len(opt.EncryptionKey) > 0 && !opt.Encrypted && !opt.EncryptedOut {
		fmt.Printf("Must use --encrypted/--encrypted_out with --encryption_key_file option.\n")
		os.Exit(1)		
	}

This error message is confusing especially if I set both of these but they're set to false:

$ dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file ~/dg/enc_key_file --encrypted=false --encrypted_out=false
...
Must use --encrypted/--encrypted_out with --encryption_key_file option.

There's a way in the Cobra flag handling to check if a flag was set/changed by the user (I believe it's this function from Cobra/Viper/pflag: https://godoc.org/github.com/spf13/pflag#FlagSet.Changed). Maybe that'd be useful.

There's a combination of flag values we need to support between these three flags. Can we make the flags work like this instead?

Input: Encrypted, Output: Encrypted by only specifying --encryption_key_file (no need to specify encrypted=true):

dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key

Input: Encrypted, Output: Unencrypted:

dgraph bulk -f data.rdf.gz -s data.schema.gz --encrypted_key_file ./enc-key --encrypted_out=false

Input: Unencrypted, Output: Encrypted

dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key --encrypted=false

Input: Unencrypted, Output: Unencrypted (no flags to set, or both set to false, in which case encryption_key_file is effectively ignored):

dgraph bulk -f data.rdf.gz -s data.schema.gz
dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key --encrypted=false --encrypted=false

The last command was in conflict with first one. Here is what I have done:

dgraph bulk -f g01.rdf.gz -s g01.schema.gz => Unenc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key => Enc Input to Enc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted => Enc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted_out => Unenc Input to Enc Output

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, @martinmr, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):

Previously, parasssh wrote…

The last command was in conflict with first one. Here is what I have done:

dgraph bulk -f g01.rdf.gz -s g01.schema.gz => Unenc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key => Enc Input to Enc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted => Enc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted_out => Unenc Input to Enc Output

About the flags, it will be moot after latest patch that implements above.

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 1 unresolved discussion (waiting on @manishrjain, @martinmr, @parasssh, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):

dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted => Enc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted_out => Unenc Input to Enc Output

Did you mean --encrypted=false for the first one?

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, @martinmr, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):

Previously, danielmai (Daniel Mai) wrote…

dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted => Enc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted_out => Unenc Input to Enc Output

Did you mean --encrypted=false for the first one?

No. For the first one, input in encrypted and hence --encrypted must be specified which indicate input encrypted.

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run --encrypted_out=false it still outputs encrypted data.

Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, and @vvbalaji-dgraph)

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 5 unresolved discussions (waiting on @danielmai, @manishrjain, @parasssh, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 67 at r3 (raw file):

Quoted 6 lines of code…
flag.Bool("encrypted", false,
		"Flag to indicate whether schema and data files are encrypted." +
		"Must be specified with --encryption_key_file or vault options")
	flag.Bool("encrypted_out", false,
		"Flag to indicate whether to encrypt the output." +
		"Must be specified with --encryption_key_file or vault option.")

Add spaces between the sentences. The help text is missing those:

$ dgraph bulk -h
...
      --encrypted                        Flag to indicate whether schema and data files are encrypted.Must be specified with --encryption_key_file or vault options
      --encrypted_out                    Flag to indicate whether to encrypt the output.Must be specified with --encryption_key_file or vault option.

dgraph/cmd/bulk/run.go, line 175 at r3 (raw file):

len(opt.EncryptionKey) == 0

Given the flag help text and error message talk about Vault flags, does this check work if Vault is specified?


dgraph/cmd/bulk/run.go, line 188 at r3 (raw file):

Must set --encrypted and/or --encrypted_out to true.

can we add more detail here? e.g.,

Must set --encrypted and/or --encrypted_out to true when providing encryption key


dgraph/cmd/bulk/run.go, line 193 at r3 (raw file):

encrytped

encrypted (typo)

Actually, this will be clearer if these were statements and not questions:

Encrypted input: %v. Encrypted output: %v.

@danielmai danielmai changed the title Allow all encrpted/unencrpted input/output combinations in bulk feat(bulk): Allow encrypted input with unencrypted output in bulk. Sep 24, 2020
Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 6 unresolved discussions (waiting on @danielmai, @manishrjain, @parasssh, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 186 at r4 (raw file):

if !opt.Encrypted || !opt.EncryptedOut

I tried this:

dgraph bulk -f data-enc/alpha1/export/dgraph.r33.u0924.2105/g01.rdf.gz -s data-enc/alpha1/export/dgraph.r33.u0924.2105/g01.schema.gz --badger.compression_level=0 --encryption_key_file ../ee/enc/test-fixtures/enc-key --encrypted=true --encrypted_out=false

and it didn't work

Must set --encrypted and/or --encrypted_out to true.

Should this condition be && instead of ||? If both flags are set to false, then it's an error since there's no encryption involved in either the input or output.

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 6 unresolved discussions (waiting on @danielmai, @manishrjain, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):

Previously, parasssh wrote…

No. For the first one, input in encrypted and hence --encrypted must be specified which indicate input encrypted.

Done.


dgraph/cmd/bulk/run.go, line 67 at r3 (raw file):

Previously, danielmai (Daniel Mai) wrote…
flag.Bool("encrypted", false,
		"Flag to indicate whether schema and data files are encrypted." +
		"Must be specified with --encryption_key_file or vault options")
	flag.Bool("encrypted_out", false,
		"Flag to indicate whether to encrypt the output." +
		"Must be specified with --encryption_key_file or vault option.")

Add spaces between the sentences. The help text is missing those:

$ dgraph bulk -h
...
      --encrypted                        Flag to indicate whether schema and data files are encrypted.Must be specified with --encryption_key_file or vault options
      --encrypted_out                    Flag to indicate whether to encrypt the output.Must be specified with --encryption_key_file or vault option.

Done.


dgraph/cmd/bulk/run.go, line 175 at r3 (raw file):

Previously, danielmai (Daniel Mai) wrote…
len(opt.EncryptionKey) == 0

Given the flag help text and error message talk about Vault flags, does this check work if Vault is specified?

Yes. Basically this field will be populated with key bytes from file or vault. So, logic will remain same once key is populated.


dgraph/cmd/bulk/run.go, line 188 at r3 (raw file):

Previously, danielmai (Daniel Mai) wrote…
Must set --encrypted and/or --encrypted_out to true.

can we add more detail here? e.g.,

Must set --encrypted and/or --encrypted_out to true when providing encryption key

Done.


dgraph/cmd/bulk/run.go, line 193 at r3 (raw file):

Previously, danielmai (Daniel Mai) wrote…
encrytped

encrypted (typo)

Actually, this will be clearer if these were statements and not questions:

Encrypted input: %v. Encrypted output: %v.

Done.


dgraph/cmd/bulk/run.go, line 186 at r4 (raw file):

Previously, danielmai (Daniel Mai) wrote…
if !opt.Encrypted || !opt.EncryptedOut

I tried this:

dgraph bulk -f data-enc/alpha1/export/dgraph.r33.u0924.2105/g01.rdf.gz -s data-enc/alpha1/export/dgraph.r33.u0924.2105/g01.schema.gz --badger.compression_level=0 --encryption_key_file ../ee/enc/test-fixtures/enc-key --encrypted=true --encrypted_out=false

and it didn't work

Must set --encrypted and/or --encrypted_out to true.

Should this condition be && instead of ||? If both flags are set to false, then it's an error since there's no encryption involved in either the input or output.

Done.

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewed 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: 2 of 3 files reviewed, all discussions resolved (waiting on @danielmai, @manishrjain, and @vvbalaji-dgraph)


dgraph/cmd/bulk/run.go, line 175 at r3 (raw file):

Previously, parasssh wrote…

Yes. Basically this field will be populated with key bytes from file or vault. So, logic will remain same once key is populated.

Got it. Great.

@parasssh parasssh merged commit 11c5584 into master Sep 25, 2020
@danielmai danielmai changed the title feat(bulk): Allow encrypted input with unencrypted output in bulk. BREAKING(flags) feat(bulk): Allow encrypted input with unencrypted output in bulk. Nov 23, 2020
@joshua-goldstein joshua-goldstein deleted the paras/bulk_enc_out branch August 11, 2022 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bulk-loader Issues related to bulk loading.
Development

Successfully merging this pull request may close these issues.

3 participants