-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a fides init
command
#313
Conversation
@PSalant726 I'm looking at the places we have user input now and I'm trying to determine where/how we should sanitize. Most of the click commands have automatic type checks (manifest dir is checked to be a valid path) but for the strings, nothing is getting executed anywhere (fides_key gets used to make API calls, but never touches something like a db) This is pretty out of my depth so I'm looking for some guidance here on how to shore things up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ThomasLaPiana The inspiration behind #315 was database connection strings and other arbitrary string input, for example:
fides/fidesctl/src/fidesctl/cli/cli.py
Lines 127 to 128 in 287455a
@click.argument("connection_string", type=str) | |
@click.argument("output_filename", type=str) |
Here I would expect that the
connection_string
argument is validated by ensuring 1) that it's a valid URL-like string, and 2) that it can actually be used to create a DB connection. The output_filename
argument should also be validated to ensure it's a valid file descriptor (doesn't include spaces, etc). I'm seeing similar treatment in the scan
and annotate_dataset
handlers.
There are a few other places where we accept arbitrary strings as command options and then pass them immediately to function calls. It's not clear to me if this is a potential attack vector, or if click
is going to automatically prevent something scary (like arbitrary code execution attacks). If we want to be extra careful, we could define a custom type that we accept instead of simply str
, which first matches against a simple regex like ^[A-z0-9].$
, but this might be overly cautious.
Also, FYI, it looks like the config_path
argument can be removed from the ping
handler.
@PSalant726 the connection string is passed to SQLAlchemy, which will throw errors immediately if it is not correctly formed. After that, a test As for arbitrary code execution, I'm also not sure where that would/could happen. I checked around online for user input sanitization in click/python which doesn't really seem to exist, which leads me to believe its not really an issue here. As for checking output filenames, I don't really think thats on us. As an example, spaces are completely fine in Windows filenames, so we'd then be going down a path of platform-specific checking. I think its up to the user to provide a valid filename, otherwise they'll get an error when the code breaks. |
@PSalant726 Almost all of the tools that I've used personally auto-generate a full config file for you and then let you decide what you want to edit/delete etc. I don't see the user having to put in key/value arguments to the command as a better user experience than editing a toml file. I also don't think suggesting any kind of file structure is needed either. I prefer to let users organize their files as they see fit For now it creates the |
I had no idea this happened automatically - that's fantastic!
I appreciate you digging into this! For me it's the difference between a potential attack vector and an actual vulnerability. It might be that it would only represent an issue if we took the input string and passed it right to
Yea, after doing some more research it seems like this is a known unsolved problem in Python. Lame. Do you think whatever error might get raised would be helpful, or cryptic? It might just be that we can't do much to help here 😕
Good point. So maybe we only include the properties that are absolutely required, and point users to the documentation as part of the output (like you're already doing)? I think it eliminates the question of what to exclude because it wouldn't be some arbitrary subset of config options, it would only be the optional ones.
For me, the cost/benefit analysis is: what do users gain from the flexibility vs. what do we as maintainers lose by allowing it? If we're going to try and get all fides tools onto the same standard of config/manifest file management, then enforcing some more structure feels very valuable. I'm not sure what users gain from the flexibility, and it might lead them into messy manifest management practices. Thoughts?
Do you have any ideas about what else might be included? Is there anything worth adding now? |
I dug into Click a little more and found some good options for validating user input, such as this, that being said, the hard part is now deciding what to clean. Is there a general best practice for sanitization?
It looks like the
I'm still of the opinion here that it isn't up to us to determine how a user should structure their stuff. When you give fidesctl a manifest file, it navigates the entire directory tree and attempts to load every yml file it finds. This means the current supported structure is completely arbitrary. Additionally, this will return a normalized
For sure, you yourself gave really good examples in the issue. Giving people the option to jump right into generating manifests seems like a logical next step once we support policy/system generation as well as datasets. We could also point to more docs here.
Yeah I was on the fence between include/exclude, but landed on exclude so that anything new we added would be automatically included. I have preferred in the past to browse through a full config file just to see what we could configure, so here I just left off the things that we automatically set at config runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a general best practice for sanitization?
Sanitize everything you can? We don't want to create any pain points, but we also want to lock down as much as is reasonable. It's always a compromise.
It looks like the
click.Path()
type we're already using implements this, so we should be good to go here. I can also add theexists
flag formanifest_dir
and it will check the directory actually exists before executing it. I'll also updateoutput_dir
with the new type validation
Sweet! It's awesome that this is included - it makes things very clean.
When you give fidesctl a manifest file, it navigates the entire directory tree and attempts to load every yml file it finds. This means the current supported structure is completely arbitrary.
Part of the long-term goal of standardizing on a more structured .fides
directory would be to eliminate some of this complexity when it isn't needed. If you're saying the current solution is good enough, we could always revisit this later if it becomes a problem.
Yeah I was on the fence between include/exclude, but landed on exclude so that anything new we added would be automatically included.
If this was the heart of your question (vs. the specifics of what should be included), then I completely missed it 🙈. In general I think it's safer to create an explicit allowlist, in case we add anything potentially unsafe in the future? Having been working on the auth changes, I'm thinking of some of the auth-related config options. Fidesops implements a get_censored_config
function for this reason, but if we're only going to auto-populate generated fidesctl.toml
files with dummy values, then maybe it's not a big deal?
Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really good! Nothing major at this point. Separately though, I think it makes more sense to remove the analytics-related changes until we open a PR to add analytics/opt-out support. IMO the changes belong in that diff, and with that context. Also, we don't have the "approved" copy yet.
this was a sloppy friday afternoon PR, sorry about that, thanks for catching all of the little issues! |
Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>
Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>
Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>
…fides into ThomasLaPiana-fides-init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on another change with wide impacts @ThomasLaPiana ! Really like the upgrades around organization with the CLI as well 🙌🏽 the only things I added were some thoughts around docs and it looks like Phil has an open question or two as well
Co-authored-by: Phil Salant <PSalant726@users.noreply.github.com>
Closes #276
Closes #315
Code Changes
init
command that creates a.fidesctl
directory.fidesctl/config.toml
)manifest_dir
to be.fides
, sofidesctl apply
etc. are valid without providing an explicit directorySteps to Confirm
fidesctl init
locallyPre-Merge Checklist
Description Of Changes
This PR adds the
fides init
command as a way to do the following:Additionally, the structure of the CLI has been reworked so that it isn't just one file with every command, but instead have been broken out into three distinct files.
This PR also does some more validation of user input where possible