Add a `fides init` command #313

ThomasLaPiana · 2022-01-12T17:18:50Z

Closes #276
Closes #315

Code Changes

break up the CLI commands
add an init command that creates a .fides directory
update where the config can be found (.fides/fidesctl.toml)
add tests
update docs with the new flow
separate integration tests from external (non-Docker) integration tests
default manifest_dir to be .fides, so fidesctl apply etc. are valid without providing an explicit directory
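
As a rough illustration of the init flow listed above, here is a minimal sketch. It is not the code from this PR. It assumes the `.fides` directory and `fidesctl.toml` file name that appear later in this thread; `create_config`, the `fides_directory_location` argument, and the placeholder file contents are hypothetical.

```python
from pathlib import Path

import click

CONFIG_DIR_NAME = ".fides"  # assumed from the discussion in this PR
CONFIG_FILE_NAME = "fidesctl.toml"  # assumed from the discussion in this PR
PLACEHOLDER_CONFIG = "# placeholder: default settings would be written here\n"


def create_config(base_dir: Path) -> Path:
    """Create <base_dir>/.fides/fidesctl.toml if missing and return its path."""
    config_dir = base_dir / CONFIG_DIR_NAME
    config_dir.mkdir(parents=True, exist_ok=True)
    config_file = config_dir / CONFIG_FILE_NAME
    if not config_file.exists():  # never overwrite a config the user has edited
        config_file.write_text(PLACEHOLDER_CONFIG)
    return config_file


@click.command()
@click.argument("fides_directory_location", default=".", type=click.Path(file_okay=False))
def init(fides_directory_location: str) -> None:
    """Hypothetical init command: set up the config directory and file."""
    config_file = create_config(Path(fides_directory_location))
    click.echo(f"Config file available at: {config_file}")
```

Keeping the file-system work in a plain function means it can be tested without going through the CLI layer.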

Steps to Confirm

try running fidesctl init locally

Pre-Merge Checklist

All CI Pipelines Succeeded
Documentation Updated

Description Of Changes

This PR adds the fides init command. The new command:

Adds some structure and reduces cognitive load for the user
Gives us a clean entry point to introduce users to other commands/concepts

Additionally, the CLI has been restructured: instead of one file containing every command, the commands are now split across three distinct files.

This PR also adds more validation of user input where possible.

fidesctl/src/fidesctl/cli/commands/util_comands.py

fidesctl/src/fidesctl/core/config/__init__.py

ThomasLaPiana · 2022-01-13T20:07:34Z

@PSalant726 I'm looking at the places where we now accept user input and trying to determine where and how we should sanitize it. Most of the click commands have automatic type checks (manifest dir is checked to be a valid path), but for the strings, nothing is getting executed anywhere (fides_key gets used to make API calls, but never touches something like a db).

This is pretty far out of my depth, so I'm looking for some guidance on how to shore things up.

PSalant726

@ThomasLaPiana The inspiration behind #315 was database connection strings and other arbitrary string input, for example:

fides/fidesctl/src/fidesctl/cli/cli.py

Lines 127 to 128 in 287455a

    
    @click.argument("connection_string", type=str)
    @click.argument("output_filename", type=str)

Here I would expect that the connection_string argument is validated by ensuring 1) that it's a valid URL-like string, and 2) that it can actually be used to create a DB connection. The output_filename argument should also be validated to ensure it's a valid file descriptor (doesn't include spaces, etc). I'm seeing similar treatment in the scan and annotate_dataset handlers.
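
To make point 1 concrete, a "URL-like" check could look roughly like the sketch below. It uses only the standard library; `looks_like_connection_string` is a hypothetical helper, not something that exists in fidesctl, and it checks shape only, not whether a connection can actually be opened.

```python
from urllib.parse import urlsplit


def looks_like_connection_string(value: str) -> bool:
    """Return True if value has a URL scheme plus a host or a path."""
    try:
        parts = urlsplit(value)
    except ValueError:  # e.g. unbalanced brackets in the host part
        return False
    return bool(parts.scheme) and bool(parts.netloc or parts.path)
```

A string such as `postgresql://user:pass@localhost:5432/db` passes this check, while free text without a scheme does not.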

There are a few other places where we accept arbitrary strings as command options and then pass them immediately to function calls. It's not clear to me if this is a potential attack vector, or if click is going to automatically prevent something scary (like arbitrary code execution attacks). If we want to be extra careful, we could define a custom type that we accept instead of simply str, which first matches against a simple regex like ^[A-z0-9].$, but this might be overly cautious.
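
A custom type along those lines might look like the sketch below. `SafeString` and the `show` command are hypothetical names, and the allowed character set is an assumption. The pattern spells out `A-Za-z` and uses `+`, because `[A-z]` also matches a few punctuation characters and a lone `.` matches exactly one character.

```python
import re

import click


class SafeString(click.ParamType):
    """Hypothetical click type that accepts only a narrow set of characters."""

    name = "safe_string"
    _pattern = re.compile(r"^[A-Za-z0-9_.-]+$")  # assumed allowlist of characters

    def convert(self, value, param, ctx):
        if isinstance(value, str) and self._pattern.fullmatch(value):
            return value
        # self.fail raises click.BadParameter, which click reports as a usage error
        self.fail(f"{value!r} contains characters outside [A-Za-z0-9_.-]", param, ctx)


@click.command()
@click.argument("fides_key", type=SafeString())
def show(fides_key: str) -> None:
    """Hypothetical command that simply echoes the validated key."""
    click.echo(fides_key)
```

Because the check lives in the type, any command can opt in by swapping `type=str` for `type=SafeString()`.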

Also, FYI, it looks like the config_path argument can be removed from the ping handler.

fidesctl/src/fidesctl/cli/commands/crud_commands.py

fidesctl/src/fidesctl/cli/commands/util_comands.py

fidesctl/src/fidesctl/core/config/__init__.py

ThomasLaPiana · 2022-01-14T18:08:21Z

@PSalant726 the connection string is passed to SQLAlchemy, which will throw errors immediately if it is not correctly formed. After that, a test select 1 query is run to make sure that a connection can be established. I don't see any vulnerabilities here.
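
For reference, the two-step check described here can be sketched as follows. This is an illustration under assumptions, not the code in fidesctl: `check_connection` is a hypothetical helper, and a missing database driver would surface as an import error that this sketch does not catch.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import ArgumentError, SQLAlchemyError


def check_connection(connection_string: str) -> bool:
    """Return True if the string parses as a database URL and a test query runs."""
    try:
        engine = create_engine(connection_string)
    except ArgumentError:  # raised when the URL cannot be parsed
        return False
    try:
        with engine.connect() as connection:
            connection.execute(text("SELECT 1"))  # cheap round trip to the database
    except SQLAlchemyError:
        return False
    return True
```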

As for arbitrary code execution, I'm also not sure where that would/could happen. I checked around online for user input sanitization in click/python, which doesn't really seem to exist, which leads me to believe it's not really an issue here.

As for checking output filenames, I don't really think that's on us. As an example, spaces are completely fine in Windows filenames, so we'd then be going down a path of platform-specific checking. I think it's up to the user to provide a valid filename, otherwise they'll get an error when the code breaks.

ThomasLaPiana · 2022-01-14T18:13:30Z

@PSalant726 Almost all of the tools that I've used personally auto-generate a full config file for you and then let you decide what you want to edit/delete etc. I don't see the user having to put in key/value arguments to the command as a better user experience than editing a toml file.

I also don't think suggesting any kind of file structure is needed either. I prefer to let users organize their files as they see fit.

For now it creates the .fides dir and dumps out a config file, but in the future I'd like to greatly expand it, and I see this as a solid starting point.

PSalant726 · 2022-01-14T20:41:52Z

@ThomasLaPiana

the connection string is passed to SQLAlchemy, which will throw errors immediately if it is not correctly formed. After that, a test select 1 query is run to make sure that a connection can be established.

I had no idea this happened automatically - that's fantastic!

As for arbitrary code execution, I'm also not sure where that would/could happen. I checked around online for user input sanitization in click/python, which doesn't really seem to exist, which leads me to believe it's not really an issue here.

I appreciate you digging into this! For me it's the difference between a potential attack vector and an actual vulnerability. It might be that it would only represent an issue if we took the input string and passed it right to eval(), which would obviously be bad practice, but it also might be something totally unforeseen. We don't need to over-engineer things just to be safe, but at the same time I'm not content to say "input sanitization doesn't exist in click/python, therefore it's not an issue". I think if we can do anything [relatively simple] to narrow the scope of accepted input from just "any str", then we should do that. Unfortunately I can't think of any specific ideas at this point.

As an example, spaces are completely fine in Windows filenames, so we'd then be going down a path of platform-specific checking. I think it's up to the user to provide a valid filename, otherwise they'll get an error when the code breaks.

Yea, after doing some more research it seems like this is a known unsolved problem in Python. Lame. Do you think whatever error might get raised would be helpful, or cryptic? It might just be that we can't do much to help here 😕

I don't see the user having to put in key/value arguments to the command as a better user experience than editing a toml file.

Good point. So maybe we only include the properties that are absolutely required, and point users to the documentation as part of the output (like you're already doing)? I think it eliminates the question of what to exclude because it wouldn't be some arbitrary subset of config options, it would only be the optional ones.

I also don't think suggesting any kind of file structure is needed either. I prefer to let users organize their files as they see fit

For me, the cost/benefit analysis is: what do users gain from the flexibility vs. what do we as maintainers lose by allowing it? If we're going to try and get all fides tools onto the same standard of config/manifest file management, then enforcing some more structure feels very valuable. I'm not sure what users gain from the flexibility, and it might lead them into messy manifest management practices. Thoughts?

in the future I'd like to greatly expand it

Do you have any ideas about what else might be included? Is there anything worth adding now?

ThomasLaPiana · 2022-01-14T23:42:33Z

@PSalant726

I appreciate you digging into this! For me it's the difference between a potential attack vector and an actual vulnerability. It might be that it would only represent an issue if we took the input string and passed it right to eval(), which would obviously be bad practice, but it also might be something totally unforeseen. We don't need to over-engineer things just to be safe, but at the same time I'm not content to say "input sanitization doesn't exist in click/python, therefore it's not an issue". I think if we can do anything [relatively simple] to narrow the scope of accepted input from just "any str", then we should do that. Unfortunately I can't think of any specific ideas at this point.

I dug into Click a little more and found some good options for validating user input, such as this. That being said, the hard part is now deciding what to clean. Is there a general best practice for sanitization?

Yea, after doing some more research it seems like this is a known unsolved problem in Python. Lame. Do you think whatever error might get raised would be helpful, or cryptic? It might just be that we can't do much to help here 😕

It looks like the click.Path() type we're already using implements this, so we should be good to go here. I can also add the exists flag for manifest_dir and it will check the directory actually exists before executing it. I'll also update output_dir with the new type validation
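
A minimal sketch of what the exists flag does, assuming a hypothetical `apply` command whose body only echoes the directory: with `exists=True`, click rejects a missing directory as a usage error before the command body runs.

```python
import click


@click.command()
@click.argument("manifest_dir", type=click.Path(exists=True, file_okay=False))
def apply(manifest_dir: str) -> None:
    """Hypothetical command body: only reports the validated directory."""
    click.echo(f"Loading manifests from {manifest_dir}")
```

`file_okay=False` additionally rejects a path that exists but points at a regular file rather than a directory.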

For me, the cost/benefit analysis is: what do users gain from the flexibility vs. what do we as maintainers lose by allowing it? If we're going to try and get all fides tools onto the same standard of config/manifest file management, then enforcing some more structure feels very valuable. I'm not sure what users gain from the flexibility, and it might lead them into messy manifest management practices. Thoughts?

I'm still of the opinion here that it isn't up to us to determine how a user should structure their stuff. When you give fidesctl a manifest file, it navigates the entire directory tree and attempts to load every yml file it finds. This means the current supported structure is completely arbitrary. Additionally, this will return a normalized Taxonomy object. So if fidesops or fidescls wants to ingest a set of manifest files, they can use that code from the fideslang.manifests directory and get back a standard Taxonomy object to work with. I believe fidesops is already doing this
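
To illustrate why the directory layout does not matter, here is a standard-library sketch of the "walk the tree and pick up every manifest" step. `find_manifest_files` is a hypothetical name, accepting `.yaml` alongside `.yml` is an assumption, and parsing the files into a Taxonomy object is left out.

```python
from pathlib import Path
from typing import List


def find_manifest_files(manifest_dir: str) -> List[Path]:
    """Return every .yml/.yaml file under manifest_dir, at any depth, sorted."""
    root = Path(manifest_dir)
    return sorted(
        path
        for path in root.rglob("*")  # rglob descends into all subdirectories
        if path.is_file() and path.suffix in {".yml", ".yaml"}
    )
```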

Do you have any ideas about what else might be included? Is there anything worth adding now?

For sure, you yourself gave really good examples in the issue. Giving people the option to jump right into generating manifests seems like a logical next step once we support policy/system generation as well as datasets.

We could also point to more docs here.

Good point. So maybe we only include the properties that are absolutely required, and point users to the documentation as part of the output (like you're already doing)? I think it eliminates the question of what to exclude because it wouldn't be some arbitrary subset of config options, it would only be the optional ones.

Yeah I was on the fence between include/exclude, but landed on exclude so that anything new we added would be automatically included. I have preferred in the past to browse through a full config file just to see what we could configure, so here I just left off the things that we automatically set at config runtime.

PSalant726

Is there a general best practice for sanitization?

Sanitize everything you can? We don't want to create any pain points, but we also want to lock down as much as is reasonable. It's always a compromise.

It looks like the click.Path() type we're already using implements this, so we should be good to go here. I can also add the exists flag for manifest_dir and it will check the directory actually exists before executing it. I'll also update output_dir with the new type validation

Sweet! It's awesome that this is included - it makes things very clean.

When you give fidesctl a manifest file, it navigates the entire directory tree and attempts to load every yml file it finds. This means the current supported structure is completely arbitrary.

Part of the long-term goal of standardizing on a more structured .fides directory would be to eliminate some of this complexity when it isn't needed. If you're saying the current solution is good enough, we could always revisit this later if it becomes a problem.

Yeah I was on the fence between include/exclude, but landed on exclude so that anything new we added would be automatically included.

If this was the heart of your question (vs. the specifics of what should be included), then I completely missed it 🙈. In general I think it's safer to create an explicit allowlist, in case we add anything potentially unsafe in the future? Having been working on the auth changes, I'm thinking of some of the auth-related config options. Fidesops implements a get_censored_config function for this reason, but if we're only going to auto-populate generated fidesctl.toml files with dummy values, then maybe it's not a big deal?
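
The allowlist idea can be sketched as below. The section and key names are hypothetical placeholders, not the real fidesctl settings, and `allowlisted_config` is an illustrative helper rather than the `get_censored_config` function mentioned above.

```python
from typing import Any, Dict, Mapping, Set

# Hypothetical section and key names, for illustration only.
ALLOWED_KEYS: Dict[str, Set[str]] = {"cli": {"server_url"}}


def allowlisted_config(
    config: Mapping[str, Mapping[str, Any]],
    allowed: Mapping[str, Set[str]] = ALLOWED_KEYS,
) -> Dict[str, Dict[str, Any]]:
    """Keep only explicitly allowed keys; anything new is excluded by default."""
    return {
        section: {key: value for key, value in values.items() if key in allowed[section]}
        for section, values in config.items()
        if section in allowed
    }
```

The trade-off is the one discussed in this thread: an exclude list picks up new settings automatically, while an allowlist requires a deliberate decision before a new setting is written to a generated file.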

fidesctl/src/fidesctl/cli/util_comands.py

PSalant726

Looking really good! Nothing major at this point. Separately though, I think it makes more sense to remove the analytics-related changes until we open a PR to add analytics/opt-out support. IMO the changes belong in that diff, and with that context. Also, we don't have the "approved" copy yet.

docs/fides/docs/installation/configuration.md

docs/fides/docs/quickstart/docker.md

fidesctl/src/fidesctl/cli/utils.py

fidesctl/.fides/fidesctl.toml

fidesctl/src/fidesctl/cli/util_comands.py

ThomasLaPiana · 2022-01-23T16:21:44Z

Looking really good! Nothing major at this point. Separately though, I think it makes more sense to remove the analytics-related changes until we open a PR to add analytics/opt-out support. IMO the changes belong in that diff, and with that context. Also, we don't have the "approved" copy yet.

This was a sloppy Friday afternoon PR, sorry about that. Thanks for catching all of the little issues!

SteveDMurphy

Great work on another change with wide impacts, @ThomasLaPiana! Really like the upgrades around organization with the CLI as well 🙌🏽 The only things I added were some thoughts around docs, and it looks like Phil has an open question or two as well.

docs/fides/docs/installation/configuration.md

docs/fides/docs/tutorial/add.md

docs/fides/docs/quickstart/local_standalone.md

Add a fides init command

dab1041

ThomasLaPiana self-assigned this Jan 12, 2022

Thomas La Piana added 4 commits January 12, 2022 11:40

reorganize the cli code

33fbf88

add an init command outline

2d60b85

don't mark all db tests as integration to allow for external tests to be skipped more easily

37ec6dc

finish building out fides init

61ec676

ThomasLaPiana commented Jan 13, 2022

fidesctl/src/fidesctl/cli/commands/util_comands.py (outdated)

ThomasLaPiana commented Jan 13, 2022

fidesctl/src/fidesctl/core/config/__init__.py (outdated)

PSalant726 suggested changes Jan 13, 2022

Thomas La Piana added 2 commits January 14, 2022 08:01

move the commands back into the CLI dir, i don't think they need their own dir

5f8e9f2

update based on review comments

df89443

fixed unused import errors

cbb40e9

Thomas La Piana added 2 commits January 14, 2022 17:43

Merge branch 'main' into ThomasLaPiana-fides-init

3e7be25

small input validation tweaks

93e0b38

SteveDMurphy mentioned this pull request Jan 18, 2022

Export System & Dataset as csv #317

Merged

5 tasks

ThomasLaPiana requested a review from PSalant726 January 18, 2022 22:07

PSalant726 suggested changes Jan 18, 2022

fidesctl/src/fidesctl/cli/util_comands.py (outdated, three review threads)

ThomasLaPiana and others added 3 commits January 19, 2022 09:57

Apply suggestions from code review

661cecc

Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>

cleanup, add directory argument

c10dd4d

docs update, add the analytics flag to the user settings

be58d42

ThomasLaPiana marked this pull request as ready for review January 21, 2022 20:44

Thomas La Piana added 2 commits January 21, 2022 14:53

small docs changes

66f0c1e

Merge branch 'main' into ThomasLaPiana-fides-init

0be4567

ThomasLaPiana requested review from earmenda and iamkelllly January 21, 2022 22:40

ThomasLaPiana requested review from PSalant726 and SteveDMurphy January 21, 2022 22:40

fix the fidesctl.toml

49d4287

PSalant726 suggested changes Jan 21, 2022

Thomas La Piana and others added 3 commits January 23, 2022 10:28

make some updates from the review

ca45059

Apply suggestions from code review

cd67402

Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>

let users choose the dir in which .fides is created

b667287

ThomasLaPiana requested a review from PSalant726 January 23, 2022 16:38

ThomasLaPiana and others added 8 commits January 23, 2022 10:46

Update docs/fides/docs/quickstart/docker.md

f8c986f

Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>

Update docs/fides/docs/installation/configuration.md

6438b50

Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>

add an external test step

a83a9d2

Merge branch 'ThomasLaPiana-fides-init' of https://github.com/ethyca/fides into ThomasLaPiana-fides-init

b879938

fix the env vars for the docker-compose file

07bdb34

Merge branch 'main' into ThomasLaPiana-fides-init

acd9db5

update the external pytest

e0c781b

revert a bad change to the makefile

f35c643

ThomasLaPiana mentioned this pull request Jan 24, 2022

Add default/example policies to the docs #327

Merged

6 tasks

SteveDMurphy approved these changes Jan 24, 2022

docs/fides/docs/installation/configuration.md

docs/fides/docs/tutorial/add.md

docs/fides/docs/quickstart/local_standalone.md

Update docs/fides/docs/quickstart/local_standalone.md

d6c1ce6

Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>

ThomasLaPiana merged commit 9777ae7 into main Jan 25, 2022

ThomasLaPiana deleted the ThomasLaPiana-fides-init branch January 25, 2022 16:51

iamkelllly added this to the fidesctl 1.3.0 milestone Feb 10, 2022

iamkelllly mentioned this pull request Mar 24, 2022

Update Quickstart documentation in Step 1 removing fidesctl db init #414

Closed

ThomasLaPiana pushed a commit that referenced this pull request Aug 17, 2022

#110 - change oauth endpoint description (#313)

388e79e

ThomasLaPiana pushed a commit that referenced this pull request Sep 26, 2022

#110 - change oauth endpoint description (#313)

0678e84
