Add a fides init command #313

Merged
ThomasLaPiana merged 28 commits into main from ThomasLaPiana-fides-init
Jan 25, 2022

Conversation

ThomasLaPiana
Contributor

@ThomasLaPiana ThomasLaPiana commented Jan 12, 2022

Closes #276
Closes #315

Code Changes

  • break up the CLI commands
  • add an init command that creates a .fidesctl directory
  • update where the config can be found (.fidesctl/config.toml)
  • add tests
  • update docs with the new flow
  • separate integration tests from external integration tests (non docker)
  • default manifest_dir to be .fides, so fidesctl apply etc. are valid without providing an explicit directory

Steps to Confirm

  • try running fidesctl init locally

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation Updated

Description Of Changes

This PR adds the fides init command as a way to do the following:

  1. Add some structure and reduce cognitive load for the user
  2. Give us a clean entry point to introduce users to other commands/concepts

Additionally, the CLI has been restructured: the commands no longer live in a single file but have been broken out into three distinct files.

This PR also adds more validation of user input where possible.

@ThomasLaPiana ThomasLaPiana self-assigned this Jan 12, 2022
@ThomasLaPiana
Contributor Author

@PSalant726 I'm looking at the places we have user input now and I'm trying to determine where/how we should sanitize. Most of the click commands have automatic type checks (manifest dir is checked to be a valid path), but for the strings, nothing is getting executed anywhere (fides_key gets used to make API calls, but never touches something like a db).

This is pretty out of my depth, so I'm looking for some guidance here on how to shore things up.

Contributor

@PSalant726 PSalant726 left a comment

@ThomasLaPiana The inspiration behind #315 was database connection strings and other arbitrary string input, for example:

@click.argument("connection_string", type=str)
@click.argument("output_filename", type=str)

Here I would expect that the connection_string argument is validated by ensuring 1) that it's a valid URL-like string, and 2) that it can actually be used to create a DB connection. The output_filename argument should also be validated to ensure it's a valid file descriptor (doesn't include spaces, etc). I'm seeing similar treatment in the scan and annotate_dataset handlers.

There are a few other places where we accept arbitrary strings as command options and then pass them immediately to function calls. It's not clear to me if this is a potential attack vector, or if click is going to automatically prevent something scary (like arbitrary code execution attacks). If we want to be extra careful, we could define a custom type that we accept instead of simply str, which first matches against a simple regex like ^[A-z0-9].$, but this might be overly cautious.
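To make the custom type idea concrete, here is a minimal sketch using click's `ParamType` extension point. The class name `SafeString`, the `show` command, and the exact pattern are assumptions for illustration only, not code from this PR. Note that `[A-z]` also matches the punctuation that sits between `Z` and `a` in ASCII, so the sketch spells the ranges out and allows one or more characters.

```python
import re

import click


class SafeString(click.ParamType):
    """Hypothetical click type that only accepts a restricted character set."""

    name = "safe_string"
    # Assumed pattern: letters, digits, underscores, dots and hyphens only.
    _pattern = re.compile(r"^[A-Za-z0-9_.-]+$")

    def convert(self, value, param, ctx):
        if isinstance(value, str) and self._pattern.fullmatch(value):
            return value
        # self.fail raises click.BadParameter, which click reports as a usage error.
        self.fail(f"{value!r} contains unsupported characters", param, ctx)


@click.command()
@click.argument("fides_key", type=SafeString())
def show(fides_key):
    """Hypothetical command that echoes a validated key."""
    click.echo(f"key: {fides_key}")
```

Whether a pattern this strict is appropriate depends on what each argument is allowed to contain, so the character set would need to be chosen per argument.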

Also, FYI, it looks like the config_path argument can be removed from the ping handler.

@ThomasLaPiana
Contributor Author

@PSalant726 the connection string is passed to SQLAlchemy, which will throw errors immediately if it is not correctly formed. After that, a test select 1 query is run to make sure that a connection can be established. I don't see any vulnerabilities here.
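For reference, the behavior described here can be sketched roughly as follows. This is not the fidesctl code; `can_connect` is a hypothetical helper, and it assumes SQLAlchemy 1.4 or later is installed.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError


def can_connect(connection_string: str) -> bool:
    """Return True if the string parses as a URL and a trivial query succeeds."""
    try:
        # create_engine raises ArgumentError for strings it cannot parse.
        engine = create_engine(connection_string)
        # Engines connect lazily, so run a trivial query to force a real connection.
        with engine.connect() as connection:
            connection.execute(text("SELECT 1"))
    except SQLAlchemyError:
        return False
    return True
```

An in-memory SQLite URL such as `sqlite://` is enough to try this without a database server. A missing database driver surfaces as an import error rather than a `SQLAlchemyError`, so this sketch does not cover that case.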

As for arbitrary code execution, I'm also not sure where that would/could happen. I checked around online for user input sanitization in click/python which doesn't really seem to exist, which leads me to believe it's not really an issue here.

As for checking output filenames, I don't really think that's on us. As an example, spaces are completely fine in Windows filenames, so we'd then be going down a path of platform-specific checking. I think it's up to the user to provide a valid filename, otherwise they'll get an error when the code breaks.

@ThomasLaPiana
Contributor Author

@PSalant726 Almost all of the tools that I've used personally auto-generate a full config file for you and then let you decide what you want to edit/delete etc. I don't see the user having to put in key/value arguments to the command as a better user experience than editing a toml file.

I also don't think suggesting any kind of file structure is needed either. I prefer to let users organize their files as they see fit.

For now it creates the .fides dir and dumps out a config file, but in the future I'd like to greatly expand it and see this as a solid starting point.
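As a rough illustration of that starting point, the sketch below creates the directory and writes a starter config only when none exists. The function name `init_project` and the config contents are hypothetical, and the `.fides/fidesctl.toml` location is an assumption taken from the file paths reviewed in this PR.

```python
from pathlib import Path

# Hypothetical placeholder contents; the real generated config is not shown in this thread.
DEFAULT_CONFIG = '[cli]\nserver_url = "http://localhost:8080"\n'


def init_project(base_dir: Path) -> Path:
    """Create <base_dir>/.fides and write a starter config if none exists."""
    fides_dir = base_dir / ".fides"
    fides_dir.mkdir(parents=True, exist_ok=True)
    config_path = fides_dir / "fidesctl.toml"
    if not config_path.exists():
        # Never overwrite a config the user may already have edited.
        config_path.write_text(DEFAULT_CONFIG, encoding="utf-8")
    return config_path
```

Making the function safe to run twice means a repeated init cannot clobber user edits.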

@PSalant726
Contributor

@ThomasLaPiana

the connection string is passed to SQLAlchemy, which will throw errors immediately if it is not correctly formed. After that, a test select 1 query is run to make sure that a connection can be established.

I had no idea this happened automatically - that's fantastic!

As for arbitrary code execution, I'm also not sure where that would/could happen. I checked around online for user input sanitization in click/python which doesn't really seem to exist, which leads me to believe it's not really an issue here.

I appreciate you digging into this! For me it's the difference between a potential attack vector and an actual vulnerability. It might be that it would only represent an issue if we took the input string and passed it right to eval(), which would obviously be bad practice, but it also might be something totally unforeseen. We don't need to over-engineer things just to be safe, but at the same time I'm not content to say "input sanitization doesn't exist in click/python, therefore it's not an issue". I think if we can do anything [relatively simple] to narrow the scope of accepted input from just "any str", then we should do that. Unfortunately I can't think of any specific ideas at this point.

As an example, spaces are completely fine in Windows filenames, so we'd then be going down a path of platform-specific checking. I think it's up to the user to provide a valid filename, otherwise they'll get an error when the code breaks.

Yea, after doing some more research it seems like this is a known unsolved problem in Python. Lame. Do you think whatever error might get raised would be helpful, or cryptic? It might just be that we can't do much to help here 😕

I don't see the user having to put in key/value arguments to the command as a better user experience than editing a toml file.

Good point. So maybe we only include the properties that are absolutely required, and point users to the documentation as part of the output (like you're already doing)? I think it eliminates the question of what to exclude because it wouldn't be some arbitrary subset of config options, it would only be the optional ones.

I also don't think suggesting any kind of file structure is needed either. I prefer to let users organize their files as they see fit

For me, the cost/benefit analysis is: what do users gain from the flexibility vs. what do we as maintainers lose by allowing it? If we're going to try and get all fides tools onto the same standard of config/manifest file management, then enforcing some more structure feels very valuable. I'm not sure what users gain from the flexibility, and it might lead them into messy manifest management practices. Thoughts?

in the future I'd like to greatly expand it

Do you have any ideas about what else might be included? Is there anything worth adding now?

@ThomasLaPiana
Contributor Author

@PSalant726

I appreciate you digging into this! For me it's the difference between a potential attack vector and an actual vulnerability. It might be that it would only represent an issue if we took the input string and passed it right to eval(), which would obviously be bad practice, but it also might be something totally unforeseen. We don't need to over-engineer things just to be safe, but at the same time I'm not content to say "input sanitization doesn't exist in click/python, therefore it's not an issue". I think if we can do anything [relatively simple] to narrow the scope of accepted input from just "any str", then we should do that. Unfortunately I can't think of any specific ideas at this point.

I dug into Click a little more and found some good options for validating user input, such as this. That being said, the hard part is now deciding what to clean. Is there a general best practice for sanitization?

Yea, after doing some more research it seems like this is a known unsolved problem in Python. Lame. Do you think whatever error might get raised would be helpful, or cryptic? It might just be that we can't do much to help here 😕

It looks like the click.Path() type we're already using implements this, so we should be good to go here. I can also add the exists flag for manifest_dir so that it checks the directory actually exists before the command runs. I'll also update output_dir with the new type validation.
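A minimal sketch of that validation is shown below. The command name `check_dir` is hypothetical; `exists`, `file_okay`, and `dir_okay` are existing `click.Path` parameters.

```python
import click


@click.command()
@click.argument(
    "manifest_dir",
    # exists=True makes click reject paths that are not present on disk;
    # file_okay=False restricts the argument to directories.
    type=click.Path(exists=True, file_okay=False, dir_okay=True),
)
def check_dir(manifest_dir):
    """Hypothetical command that only runs when manifest_dir is a real directory."""
    click.echo(f"using manifests in {manifest_dir}")
```

With this in place, click prints a usage error and exits before the command body runs if the directory is missing.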

For me, the cost/benefit analysis is: what do users gain from the flexibility vs. what do we as maintainers lose by allowing it? If we're going to try and get all fides tools onto the same standard of config/manifest file management, then enforcing some more structure feels very valuable. I'm not sure what users gain from the flexibility, and it might lead them into messy manifest management practices. Thoughts?

I'm still of the opinion here that it isn't up to us to determine how a user should structure their stuff. When you give fidesctl a manifest file, it navigates the entire directory tree and attempts to load every yml file it finds. This means the current supported structure is completely arbitrary. Additionally, this will return a normalized Taxonomy object. So if fidesops or fidescls wants to ingest a set of manifest files, they can use that code from the fideslang.manifests directory and get back a standard Taxonomy object to work with. I believe fidesops is already doing this.
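To illustrate why the layout can be arbitrary, here is a sketch of the discovery step only. It is not the fideslang implementation: `find_manifest_files` is a hypothetical name, parsing into a Taxonomy is omitted, and accepting `.yaml` alongside `.yml` is an assumption.

```python
from pathlib import Path
from typing import List


def find_manifest_files(manifest_dir: Path) -> List[Path]:
    """Return every .yml/.yaml file under manifest_dir, at any depth."""
    return sorted(
        path
        for path in manifest_dir.rglob("*")
        if path.is_file() and path.suffix in {".yml", ".yaml"}
    )
```

Because the search is recursive, users can nest or flatten their manifests however they like and the same set of files is found.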

Do you have any ideas about what else might be included? Is there anything worth adding now?

For sure, you yourself gave really good examples in the issue. Giving people the option to jump right into generating manifests seems like a logical next step once we support policy/system generation as well as datasets.

We could also point to more docs here.

Good point. So maybe we only include the properties that are absolutely required, and point users to the documentation as part of the output (like you're already doing)? I think it eliminates the question of what to exclude because it wouldn't be some arbitrary subset of config options, it would only be the optional ones.

Yeah I was on the fence between include/exclude, but landed on exclude so that anything new we added would be automatically included. I have preferred in the past to browse through a full config file just to see what we could configure, so here I just left off the things that we automatically set at config runtime.

Contributor

@PSalant726 PSalant726 left a comment

Is there a general best practice for sanitization?

Sanitize everything you can? We don't want to create any pain points, but we also want to lock down as much as is reasonable. It's always a compromise.

It looks like the click.Path() type we're already using implements this, so we should be good to go here. I can also add the exists flag for manifest_dir and it will check the directory actually exists before executing it. I'll also update output_dir with the new type validation

Sweet! It's awesome that this is included - it makes things very clean.

When you give fidesctl a manifest file, it navigates the entire directory tree and attempts to load every yml file it finds. This means the current supported structure is completely arbitrary.

Part of the long-term goal of standardizing on a more structured .fides directory would be to eliminate some of this complexity when it isn't needed. If you're saying the current solution is good enough, we could always revisit this later if it becomes a problem.

Yeah I was on the fence between include/exclude, but landed on exclude so that anything new we added would be automatically included.

If this was the heart of your question (vs. the specifics of what should be included), then I completely missed it 🙈. In general I think it's safer to create an explicit allowlist, in case we add anything potentially unsafe in the future? Having been working on the auth changes, I'm thinking of some of the auth-related config options. Fidesops implements a get_censored_config function for this reason, but if we're only going to auto-populate generated fidesctl.toml files with dummy values, then maybe it's not a big deal?
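The trade-off between the two approaches can be sketched like this. All option names here are hypothetical and chosen only for illustration; they are not fidesctl's real config options.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical option names, used only to illustrate the two strategies.
ALLOWED_KEYS: FrozenSet[str] = frozenset({"server_url", "manifest_dir"})
EXCLUDED_KEYS: FrozenSet[str] = frozenset({"api_token"})


def select_by_allowlist(config: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only keys that were explicitly approved for the generated file."""
    return {key: value for key, value in config.items() if key in ALLOWED_KEYS}


def select_by_exclusion(config: Dict[str, Any]) -> Dict[str, Any]:
    """Keep everything except keys that were explicitly blocked."""
    return {key: value for key, value in config.items() if key not in EXCLUDED_KEYS}


# A later release adds a sensitive option that nobody remembered to block.
config = {
    "server_url": "http://localhost:8080",
    "manifest_dir": ".fides",
    "api_token": "dummy",
    "new_secret": "dummy",
}
```

With exclusion, `new_secret` ends up in the generated file automatically. With an allowlist, it stays out until someone adds it on purpose, at the cost of having to update the list for every new safe option.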

@ThomasLaPiana ThomasLaPiana marked this pull request as ready for review January 21, 2022 20:44
Contributor

@PSalant726 PSalant726 left a comment

Looking really good! Nothing major at this point. Separately though, I think it makes more sense to remove the analytics-related changes until we open a PR to add analytics/opt-out support. IMO the changes belong in that diff, and with that context. Also, we don't have the "approved" copy yet.

@ThomasLaPiana
Contributor Author

Looking really good! Nothing major at this point. Separately though, I think it makes more sense to remove the analytics-related changes until we open a PR to add analytics/opt-out support. IMO the changes belong in that diff, and with that context. Also, we don't have the "approved" copy yet.

This was a sloppy Friday afternoon PR, sorry about that. Thanks for catching all of the little issues!

Thomas La Piana and others added 3 commits January 23, 2022 10:28
Contributor

@SteveDMurphy SteveDMurphy left a comment

Great work on another change with wide impacts @ThomasLaPiana! Really like the upgrades around organization with the CLI as well 🙌🏽 The only things I added were some thoughts around docs, and it looks like Phil has an open question or two as well.

Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>
@ThomasLaPiana ThomasLaPiana merged commit 9777ae7 into main Jan 25, 2022
@ThomasLaPiana ThomasLaPiana deleted the ThomasLaPiana-fides-init branch January 25, 2022 16:51
@iamkelllly iamkelllly added this to the fidesctl 1.3.0 milestone Feb 10, 2022
Development

Successfully merging this pull request may close these issues.

Validate CLI user input Add a fidesctl init command
4 participants