-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix parameter exposure in error emails #2809
Conversation
… parameters in task repr or body of error emails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution @steveims !
A couple things:
- Are there any unittests you can add to validate in tests that the error email body and subject no longer include non-public params?
- (See comment below) If non-public parameters are changed to
insignificant
, does your error email subject resolve?
Thanks!
luigi/task.py
Outdated
@@ -544,7 +544,7 @@ def __repr__(self): | |||
repr_parts = [] | |||
param_objs = dict(params) | |||
for param_name, param_value in param_values: | |||
if param_objs[param_name].significant: | |||
if param_objs[param_name].significant and param_objs[param_name].visibility == ParameterVisibility.PUBLIC: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may bring up a valid question. Should significant
parameters be allowed to be anything other than public
?
I'm inclined to think if a parameter is defined as significant, it should be included in its task's __repr__
.
@nryanov You added the parameter visibility feature. What are your thoughts on my comment above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I'm not a heavy user of Luigi, but so far I'd be fine with a boolean on parameters to control whether they are exposed (logs, email, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you validate if adding significant=False
to the Parameter instantiation correctly removes it from the error email's subject?
The docstring for this parameter configuration continues to lead me to believe that non-public parameters should not be significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for delayed answer
@dlstadther this is a good question. I think that significant parameters may be public, hidden and private in terms of significance. Maybe we should reconsider current description of 'significant' and declare it as flag important only for task_id generation. If significant
will be important only for task_id generation, then implementation of Task.__repr__
will be easier as it will be able to include only public
parameters.
But currently, parameters significance could control its exposure in some cases and that seems to be confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what @nryanov said makes a lot of sense. However I think we've misused __repr__
here. It should probably show all parameters. The issue however here is that we've totally misused it when we put it in the email header.
Maybe the email header could use str()
instead of repr()
. And we would implement
def __str__() = return self._get_task_as_string(show_only_public_parameters=True)
and
def __repr__() = return self._get_task_as_string()
maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it... using str()
at that call site would also not be explicit, there we probably want to say we want it to be "give me string representation with PUBLIC params only' somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for this patch @steveims. And I'm sorry credentials was exposed in the email.
Can you remove the 2nd change from this patch in this PR and add a unit test for only he remaining changes?
Separately we should take care of the issue that @dlstadther raised
luigi/task.py
Outdated
@@ -544,7 +544,7 @@ def __repr__(self): | |||
repr_parts = [] | |||
param_objs = dict(params) | |||
for param_name, param_value in param_values: | |||
if param_objs[param_name].significant: | |||
if param_objs[param_name].significant and param_objs[param_name].visibility == ParameterVisibility.PUBLIC: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what @nryanov said makes a lot of sense. However I think we've misused __repr__
here. It should probably show all parameters. The issue however here is that we've totally misused it when we put it in the email header.
Maybe the email header could use str()
instead of repr()
. And we would implement
def __str__() = return self._get_task_as_string(show_only_public_parameters=True)
and
def __repr__() = return self._get_task_as_string()
maybe.
Thank you for the comments. Giving the caller explicit control over parameter exposure makes sense. Not sure how many places that might affect? (I'd prefer to know that secrets (passwords, etc) will not appear anywhere (logs, email, etc).) Apologies for not providing tests. I'm not familiar with luigi; I was just asked by our team to solve this problem. I'd be happy to add tests if someone could suggest them. |
@steveims can you refactor this pull request as I previously asked? (remove changes to task.py and add a test for the other change) We have a contributing file about testing: https://github.com/spotify/luigi/blob/master/CONTRIBUTING.rst |
@Tarrasch I removed changes to task.py. Apologies, but I will not be able to submit any tests. I'm not familiar with Luigi and not able to learn right now. Please advise if you'd rather that I close this PR and open an issue instead. |
Feel free to take your time on writing the test. But we do expect that on all changes to core parts of this repository. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions. |
A task error results in an email sent to associated addresses. The email currently includes non-public parameters in the subject and body. That was a problem for us because our HIDDEN and PRIVATE parameters (e.g. passwords) were exposed.
We made two changes to fix the problem:
luigi/luigi/worker.py
Line 686 in 865cc4e
Note: #2 makes sense to us because we never want HIDDEN or PRIVATE parameters exposed (e.g. logs, etc).
We've been running this change in production for two weeks: error emails are now as desired and not seeing any regression.