-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BREAKING(flags) feat(bulk): Allow encrypted input with unencrypted output in bulk. #6541
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @danielmai, @manishrjain, and @vvbalaji-dgraph)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain, @parasssh, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):
Quoted 4 lines of code…
if len(opt.EncryptionKey) > 0 && !opt.Encrypted && !opt.EncryptedOut { fmt.Printf("Must use --encrypted/--encrypted_out with --encryption_key_file option.\n") os.Exit(1) }
This error message is confusing especially if I set both of these but they're set to false:
$ dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file ~/dg/enc_key_file --encrypted=false --encrypted_out=false
...
Must use --encrypted/--encrypted_out with --encryption_key_file option.
There's a way in the Cobra flag handling to check if a flag was set/changed by the user (I believe it's this function from Cobra/Viper/pflag: https://godoc.org/github.com/spf13/pflag#FlagSet.Changed). Maybe that'd be useful.
There's a combination of flag values we need to support between these three flags. Can we make the flags work like this instead?
Input: Encrypted, Output: Encrypted by only specifying --encryption_key_file
(no need to specify encrypted=true):
dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key
Input: Encrypted, Output: Unencrypted:
dgraph bulk -f data.rdf.gz -s data.schema.gz --encrypted_key_file ./enc-key --encrypted_out=false
Input: Unencrypted, Output: Encrypted
dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key --encrypted=false
Input: Unencrypted, Output: Unencrypted (no flags to set, or both set to false, in which case encryption_key_file is effectively ignored):
dgraph bulk -f data.rdf.gz -s data.schema.gz
dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key --encrypted=false --encrypted=false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, @martinmr, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):
Previously, danielmai (Daniel Mai) wrote…
if len(opt.EncryptionKey) > 0 && !opt.Encrypted && !opt.EncryptedOut { fmt.Printf("Must use --encrypted/--encrypted_out with --encryption_key_file option.\n") os.Exit(1) }
This error message is confusing especially if I set both of these but they're set to false:
$ dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file ~/dg/enc_key_file --encrypted=false --encrypted_out=false ... Must use --encrypted/--encrypted_out with --encryption_key_file option.
There's a way in the Cobra flag handling to check if a flag was set/changed by the user (I believe it's this function from Cobra/Viper/pflag: https://godoc.org/github.com/spf13/pflag#FlagSet.Changed). Maybe that'd be useful.
There's a combination of flag values we need to support between these three flags. Can we make the flags work like this instead?
Input: Encrypted, Output: Encrypted by only specifying
--encryption_key_file
(no need to specify encrypted=true):dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key
Input: Encrypted, Output: Unencrypted:
dgraph bulk -f data.rdf.gz -s data.schema.gz --encrypted_key_file ./enc-key --encrypted_out=false
Input: Unencrypted, Output: Encrypted
dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key --encrypted=false
Input: Unencrypted, Output: Unencrypted (no flags to set, or both set to false, in which case encryption_key_file is effectively ignored):
dgraph bulk -f data.rdf.gz -s data.schema.gz
dgraph bulk -f data.rdf.gz -s data.schema.gz --encryption_key_file ./enc-key --encrypted=false --encrypted=false
The last command was in conflict with first one. Here is what I have done:
dgraph bulk -f g01.rdf.gz -s g01.schema.gz
=> Unenc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key
=> Enc Input to Enc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted
=> Enc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted_out
=> Unenc Input to Enc Output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, @martinmr, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):
Previously, parasssh wrote…
The last command was in conflict with first one. Here is what I have done:
dgraph bulk -f g01.rdf.gz -s g01.schema.gz
=> Unenc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key
=> Enc Input to Enc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted
=> Enc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted_out
=> Unenc Input to Enc Output
About the flags, it will be moot after latest patch that implements above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 1 unresolved discussion (waiting on @manishrjain, @martinmr, @parasssh, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted => Enc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted_out => Unenc Input to Enc Output
Did you mean --encrypted=false
for the first one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, @martinmr, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):
Previously, danielmai (Daniel Mai) wrote…
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted => Enc Input to Unenc Output
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --encryption_key_file=./enc_key --encrypted_out => Unenc Input to Enc OutputDid you mean
--encrypted=false
for the first one?
No. For the first one, input in encrypted and hence --encrypted must be specified which indicate input encrypted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run --encrypted_out=false
it still outputs encrypted data.
Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, and @vvbalaji-dgraph)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 5 unresolved discussions (waiting on @danielmai, @manishrjain, @parasssh, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 67 at r3 (raw file):
Quoted 6 lines of code…
flag.Bool("encrypted", false, "Flag to indicate whether schema and data files are encrypted." + "Must be specified with --encryption_key_file or vault options") flag.Bool("encrypted_out", false, "Flag to indicate whether to encrypt the output." + "Must be specified with --encryption_key_file or vault option.")
Add spaces between the sentences. The help text is missing those:
$ dgraph bulk -h ... --encrypted Flag to indicate whether schema and data files are encrypted.Must be specified with --encryption_key_file or vault options --encrypted_out Flag to indicate whether to encrypt the output.Must be specified with --encryption_key_file or vault option.
dgraph/cmd/bulk/run.go, line 175 at r3 (raw file):
len(opt.EncryptionKey) == 0
Given the flag help text and error message talk about Vault flags, does this check work if Vault is specified?
dgraph/cmd/bulk/run.go, line 188 at r3 (raw file):
Must set --encrypted and/or --encrypted_out to true.
can we add more detail here? e.g.,
Must set --encrypted and/or --encrypted_out to true when providing encryption key
dgraph/cmd/bulk/run.go, line 193 at r3 (raw file):
encrytped
encrypted (typo)
Actually, this will be clearer if these were statements and not questions:
Encrypted input: %v. Encrypted output: %v.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 6 unresolved discussions (waiting on @danielmai, @manishrjain, @parasssh, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 186 at r4 (raw file):
if !opt.Encrypted || !opt.EncryptedOut
I tried this:
dgraph bulk -f data-enc/alpha1/export/dgraph.r33.u0924.2105/g01.rdf.gz -s data-enc/alpha1/export/dgraph.r33.u0924.2105/g01.schema.gz --badger.compression_level=0 --encryption_key_file ../ee/enc/test-fixtures/enc-key --encrypted=true --encrypted_out=false
and it didn't work
Must set --encrypted and/or --encrypted_out to true.
Should this condition be && instead of ||? If both flags are set to false, then it's an error since there's no encryption involved in either the input or output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 6 unresolved discussions (waiting on @danielmai, @manishrjain, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 173 at r1 (raw file):
Previously, parasssh wrote…
No. For the first one, input in encrypted and hence --encrypted must be specified which indicate input encrypted.
Done.
dgraph/cmd/bulk/run.go, line 67 at r3 (raw file):
Previously, danielmai (Daniel Mai) wrote…
flag.Bool("encrypted", false, "Flag to indicate whether schema and data files are encrypted." + "Must be specified with --encryption_key_file or vault options") flag.Bool("encrypted_out", false, "Flag to indicate whether to encrypt the output." + "Must be specified with --encryption_key_file or vault option.")
Add spaces between the sentences. The help text is missing those:
$ dgraph bulk -h ... --encrypted Flag to indicate whether schema and data files are encrypted.Must be specified with --encryption_key_file or vault options --encrypted_out Flag to indicate whether to encrypt the output.Must be specified with --encryption_key_file or vault option.
Done.
dgraph/cmd/bulk/run.go, line 175 at r3 (raw file):
Previously, danielmai (Daniel Mai) wrote…
len(opt.EncryptionKey) == 0
Given the flag help text and error message talk about Vault flags, does this check work if Vault is specified?
Yes. Basically this field will be populated with key bytes from file or vault. So, logic will remain same once key is populated.
dgraph/cmd/bulk/run.go, line 188 at r3 (raw file):
Previously, danielmai (Daniel Mai) wrote…
Must set --encrypted and/or --encrypted_out to true.
can we add more detail here? e.g.,
Must set --encrypted and/or --encrypted_out to true when providing encryption key
Done.
dgraph/cmd/bulk/run.go, line 193 at r3 (raw file):
Previously, danielmai (Daniel Mai) wrote…
encrytped
encrypted (typo)
Actually, this will be clearer if these were statements and not questions:
Encrypted input: %v. Encrypted output: %v.
Done.
dgraph/cmd/bulk/run.go, line 186 at r4 (raw file):
Previously, danielmai (Daniel Mai) wrote…
if !opt.Encrypted || !opt.EncryptedOut
I tried this:
dgraph bulk -f data-enc/alpha1/export/dgraph.r33.u0924.2105/g01.rdf.gz -s data-enc/alpha1/export/dgraph.r33.u0924.2105/g01.schema.gz --badger.compression_level=0 --encryption_key_file ../ee/enc/test-fixtures/enc-key --encrypted=true --encrypted_out=false
and it didn't work
Must set --encrypted and/or --encrypted_out to true.
Should this condition be && instead of ||? If both flags are set to false, then it's an error since there's no encryption involved in either the input or output.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: 2 of 3 files reviewed, all discussions resolved (waiting on @danielmai, @manishrjain, and @vvbalaji-dgraph)
dgraph/cmd/bulk/run.go, line 175 at r3 (raw file):
Previously, parasssh wrote…
Yes. Basically this field will be populated with key bytes from file or vault. So, logic will remain same once key is populated.
Got it. Great.
Fixes DGRAPH-2135
All UT pass.
This adds a new flag
--encrypted_out
to bulk loader. Now, when bulk loading with encryption input or output, you must set both the--encrypted
and--encrypted_out
flags along with the encryption key. Otherwise, it's an error.For example, to bulk load encrypted input data and output unencrypted data:
This change is