Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix parameter exposure in error emails #2809

Closed

Conversation

steveims
Copy link

A task error results in an email sent to associated addresses. The email currently includes non-public parameters in the subject and body. That was a problem for us because our HIDDEN and PRIVATE parameters (e.g. passwords) were exposed.

We made two changes to fix the problem:

  1. serialize only PUBLIC parameters to the body of the error email
  2. serialized only significant and PUBLIC parameters in the task representation, which is used to generate the subject for error emails (e.g.
    subject="Luigi: {task} failed scheduling. Host: {host}",
    )

Note: #2 makes sense to us because we never want HIDDEN or PRIVATE parameters exposed (e.g. logs, etc).

We've been running this change in production for two weeks: error emails are now as desired and not seeing any regression.

… parameters in task repr or body of error emails
Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution @steveims !

A couple things:

  1. Are there any unittests you can add to validate in tests that the error email body and subject no longer include non-public params?
  2. (See comment below) If non-public parameters are changed to insignificant, does your error email subject resolve?

Thanks!

luigi/task.py Outdated
@@ -544,7 +544,7 @@ def __repr__(self):
repr_parts = []
param_objs = dict(params)
for param_name, param_value in param_values:
if param_objs[param_name].significant:
if param_objs[param_name].significant and param_objs[param_name].visibility == ParameterVisibility.PUBLIC:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may bring up a valid question. Should significant parameters be allowed to be anything other than public?
I'm inclined to think if a parameter is defined as significant, it should be included in its task's __repr__.

@nryanov You added the parameter visibility feature. What are your thoughts on my comment above?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm not a heavy user of Luigi, but so far I'd be fine with a boolean on parameters to control whether they are exposed (logs, email, etc).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you validate if adding significant=False to the Parameter instantiation correctly removes it from the error email's subject?

The docstring for this parameter configuration continues to lead me to believe that non-public parameters should not be significant.

Copy link
Contributor

@nryanov nryanov Oct 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for delayed answer

@dlstadther this is a good question. I think that significant parameters may be public, hidden and private in terms of significance. Maybe we should reconsider current description of 'significant' and declare it as flag important only for task_id generation. If significant will be important only for task_id generation, then implementation of Task.__repr__ will be easier as it will be able to include only public parameters.

But currently, parameters significance could control its exposure in some cases and that seems to be confusing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my absence here on this item.

I think for the matter of changing the definition of "significant parameters" and representation of a Task's __repr, we should ensure to get feedback from @honnix and @Tarrasch .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what @nryanov said makes a lot of sense. However I think we've misused __repr__ here. It should probably show all parameters. The issue however here is that we've totally misused it when we put it in the email header.

Maybe the email header could use str() instead of repr(). And we would implement

def __str__() = return self._get_task_as_string(show_only_public_parameters=True)

and

def __repr__() = return self._get_task_as_string()

maybe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it... using str() at that call site would also not be explicit, there we probably want to say we want it to be "give me string representation with PUBLIC params only' somehow.

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this patch @steveims. And I'm sorry credentials was exposed in the email.

Can you remove the 2nd change from this patch in this PR and add a unit test for only he remaining changes?

Separately we should take care of the issue that @dlstadther raised

luigi/task.py Outdated
@@ -544,7 +544,7 @@ def __repr__(self):
repr_parts = []
param_objs = dict(params)
for param_name, param_value in param_values:
if param_objs[param_name].significant:
if param_objs[param_name].significant and param_objs[param_name].visibility == ParameterVisibility.PUBLIC:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what @nryanov said makes a lot of sense. However I think we've misused __repr__ here. It should probably show all parameters. The issue however here is that we've totally misused it when we put it in the email header.

Maybe the email header could use str() instead of repr(). And we would implement

def __str__() = return self._get_task_as_string(show_only_public_parameters=True)

and

def __repr__() = return self._get_task_as_string()

maybe.

@steveims
Copy link
Author

steveims commented Dec 2, 2019

Thank you for the comments.

Giving the caller explicit control over parameter exposure makes sense. Not sure how many places that might affect? (I'd prefer to know that secrets (passwords, etc) will not appear anywhere (logs, email, etc).)

Apologies for not providing tests. I'm not familiar with luigi; I was just asked by our team to solve this problem. I'd be happy to add tests if someone could suggest them.

@Tarrasch
Copy link
Contributor

@steveims can you refactor this pull request as I previously asked? (remove changes to task.py and add a test for the other change)

We have a contributing file about testing: https://github.com/spotify/luigi/blob/master/CONTRIBUTING.rst

@steveims
Copy link
Author

@Tarrasch I removed changes to task.py.

Apologies, but I will not be able to submit any tests. I'm not familiar with Luigi and not able to learn right now.

Please advise if you'd rather that I close this PR and open an issue instead.

@Tarrasch
Copy link
Contributor

Feel free to take your time on writing the test. But we do expect that on all changes to core parts of this repository.

@stale
Copy link

stale bot commented May 15, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

@stale stale bot added the wontfix label May 15, 2020
@stale stale bot closed this May 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants