Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

environment table needs to be whitelisted/blacklisted #233

Closed
fenchu opened this issue Aug 29, 2019 · 19 comments · Fixed by #431
Closed

environment table needs to be whitelisted/blacklisted #233

fenchu opened this issue Aug 29, 2019 · 19 comments · Fixed by #431
Assignees
Labels
enhancement This issue/PR relates to a feature request. feature This issue/PR relates to a feature request.

Comments

@fenchu
Copy link

fenchu commented Aug 29, 2019

when I use pytest-html in docker (with gitlab-ci) it needs a lot of environment variables set, some holds token and passwords. I do not want to show in my jenkins report to a wider audience.
pytest-html_envs

It seems they are collected in the very first test, so adding this to a test is working using the request autofixture:

        # remove eventual gitlab settings
        for k in list(request.config._metadata.keys()):
            if re.search(r'^(GITLAB_|CI_)', k):
                del request.config._metadata[k]

But I like to do it in conftest.py

@BeyondEvil
Copy link
Contributor

BeyondEvil commented Aug 29, 2019

You can remove them using the pytest_metadata hook.

import pytest
def pytest_metadata(metadata):
    metadata.pop("password", None)

Also, make sure you're using the latest version of pytest-metadata (looks like you are 1.8.0), since we've removed some of the env vars that will always contain sensitive data: pytest-dev/pytest-metadata#18

EDIT: Come to think of it, I wonder why those aren't masked for you? 🤔 According the above PR, they should always be.

I'm happy to review a PR (to pytest-metadata) with additional env vars that should be removed.

@BeyondEvil
Copy link
Contributor

BeyondEvil commented Aug 29, 2019

Maybe I was too quick to close this. I wonder why some of those are shown, since they were removed in v1.8.0

@BeyondEvil
Copy link
Contributor

I see why now, that PR was merged after the release of v1.8.0.

The question still stands tho: Why aren't they masked for you? Maybe because they're passed to a docker container and GitLab can't pick that up and mask them. 🤔

Anyway, I'll get started on a release that will solve some of them for you. For the others, you'll have to use the hook.

@krzysztof-pawlik-gat
Copy link
Contributor

@BeyondEvil GitLab passes those variables to jobs as they are, the masked thing is applied when GitLab shows job output only (stdout/stderr). The report itself is generated from the job, hence it has access to raw values (as it's supposed to, how would you make use of an access token with the value masked?). I'm using following snippet to mask some data:

    for key, value in config._metadata.items():  # pylint: disable=protected-access
        try:
            if key.endswith("_TOKEN") or key.endswith("_PASSWORD"):
                config._metadata[key] = Credentials.mask_password(value)  # pylint: disable=protected-access
            elif key.endswith("_URL"):
                parsed_url = urllib.parse.urlparse(value)
                user_part, host_part = urllib.parse.splituser(parsed_url.netloc)
                if user_part and ":" in user_part:
                    user_name, password = user_part.split(":")
                    parsed_url = parsed_url._replace(
                        netloc=f"{user_name}:{Credentials.mask_password(password)}@{host_part}"
                    )
                    config._metadata[key] = parsed_url.geturl()  # pylint: disable=protected-access
        except Exception as exception:  # pylint: disable=broad-except
            logging.warning("Failed to sanitize value for %s: %s", key, exception)

(Credentials.mask_password is a simple method that masks the passed value keeping the length, this snipped uses config._metadata as it's used in pytest_configure and not pytest_metadata, don't ask, historic stuff ;))

Keep in mind that there are useful variables, like GITLAB_USER_EMAIL shows who triggered the pipeline (no need to resolve GITLAB_USER_ID to a user name), CI_PROJECT_URL allows me to quickly visit affected repository (no need to copy & paste CI_PROJECT_PATH). There's a huge difference between anonymizing the report for public show & tell and internal use.

@BeyondEvil
Copy link
Contributor

BeyondEvil commented Aug 29, 2019

@krzysztof-pawlik-gat

GitLab passes those variables to jobs as they are, the masked thing is applied when GitLab shows job output only (stdout/stderr).

As I was saying then: Maybe because they're passed to a docker container and GitLab can't pick that up and mask them.

Keep in mind that there are useful variables, like GITLAB_USER_EMAIL shows who triggered the pipeline (no need to resolve GITLAB_USER_ID to a user name), CI_PROJECT_URL allows me to quickly visit affected repository (no need to copy & paste CI_PROJECT_PATH).

Agreed. I won't remove more than already removed in the linked PR. Unless a very compelling argument is put forth. Especially since, imo, it's very easy customizing what is shown and not using the aforementioned hook.

There's a huge difference between anonymizing the report for public show & tell and internal use.

Yes. Not sure what your point is? 🤔

@krzysztof-pawlik-gat
Copy link
Contributor

Keep in mind that there are useful variables, like GITLAB_USER_EMAIL shows who triggered the pipeline (no need to resolve GITLAB_USER_ID to a user name), CI_PROJECT_URL allows me to quickly visit affected repository (no need to copy & paste CI_PROJECT_PATH).

Agreed. I won't remove more than already removed in the linked PR. Unless a very compelling argument is put forth. Especially since, imo, it's very easy customizing what is shown and not using the aforementioned hook.

Sounds good!

There's a huge difference between anonymizing the report for public show & tell and internal use.

Yes. Not sure what your point is? thinking

The initial report from @fenchu shows some variables masked that in my opinion should not be masked or removed. My intention with this line was that usefulness of the report should not be sacrificed in the name of it being publicly shareable.

@BeyondEvil
Copy link
Contributor

The initial report from @fenchu shows some variables masked that in my opinion should not be masked or removed. My intention with this line was that usefulness of the report should not be sacrificed in the name of it being publicly shareable.

Yeah, I'm even contemplating reverting that PR and going about this is in a different way.

I see a couple of different approaches we can take, my favorite is to implement something similar to pytests filterwarnings where you can list the env vars you want masked in pytest.ini.

What are your thoughts @davehunt ?

@fenchu
Copy link
Author

fenchu commented Aug 29, 2019

Thanks for clarification :-)

Filtering manually is fine.

My initial question was more about doing this from the conftest.py file. if I use request or metadata fixtures they seem to run at the beginning before all my tests. Is there a hook so it can run after the testloop stage before reportgathering?

@BeyondEvil
Copy link
Contributor

BeyondEvil commented Aug 29, 2019

@fenchu
So there's both the metadata fixture and the pytest_metadata hook.

I would say that the fixture is used more to read metadata and the hook is used to modify.

So the hook should give you what you want - filtering out the sensitive data. Are you saying that is not working? 🤔

@davehunt
Copy link
Collaborator

I see a couple of different approaches we can take, my favorite is to implement something similar to pytests filterwarnings where you can list the env vars you want masked in pytest.ini.

What are your thoughts @davehunt ?

This sounds like a reasonable approach to me. Perhaps we need two filters, one for variables to mark and another for variables to exclude completely. We could default to masking anything matching 'token' or 'password' in an effort to protect potentially vulnerable data, but we should be careful not to suggest that we're protecting these values. In fact, we should probably call out in the documentation that there is a risk of exposing secrets when using the plugin.

@gnikonorov gnikonorov added enhancement This issue/PR relates to a feature request. feature This issue/PR relates to a feature request. labels Oct 23, 2020
@gnikonorov
Copy link
Member

Hey @BeyondEvil this sounds like an interesting thing we should implement, I'm going to get started on it.

The approach I'll take is:

I see a couple of different approaches we can take, my favorite is to implement something similar to pytests filterwarnings where you can list the env vars you want masked in pytest.ini.

Am I missing any information to start working on this?

@gnikonorov gnikonorov self-assigned this Dec 15, 2020
@BeyondEvil
Copy link
Contributor

Hey @BeyondEvil this sounds like an interesting thing we should implement, I'm going to get started on it.

Wow, that'd be great!

The approach I'll take is:

I see a couple of different approaches we can take, my favorite is to implement something similar to pytests filterwarnings where you can list the env vars you want masked in pytest.ini.

Excellent!

Am I missing any information to start working on this?

I don't think so.

I'm going back and forth in my head thinking about where this belongs.

But I think I've landed in this:

pytest-metadata shouldn't mask anything, it's impossible to foresee the use-cases consumers of the library may have. They may actually want to get secret values. Besides, there's both a hook and fixture to manipulate the data.

pytest-html should provide one mechanism to redact values and one to remove them completely like Dave suggested.

If this makes sense to you @gnikonorov I will make a release of pytest-metadata where I deprecate the default masking.

pytest-html should make no assumptions of values being masked, I think.

@gnikonorov
Copy link
Member

That makes sense @BeyondEvil. I'll implement this in two PRs then. One for redacting and one for masking.

@BeyondEvil
Copy link
Contributor

Turns out I don't have to do anything with pytest-metadata, it doesn't mask or anything currently. The "secret" environment variables were simple removed from the respective CI vendors list.

@gnikonorov
Copy link
Member

Starting work on the first part now, @BeyondEvil. I'll allow for users to specify a list of environment variables to remove from the table completely ( not mask ).

@BeyondEvil
Copy link
Contributor

You're the best @gnikonorov 💪

@gnikonorov
Copy link
Member

Question though, @BeyondEvil. Why not just have users delete items from the environment table ( e.g.:del session.config._metadata["foo"] )? Is the idea just to have a more convenient interface for users or is it not always possible to delete keys?

@BeyondEvil
Copy link
Contributor

I actually don't know @gnikonorov

It's over a year old and not a request that is coming in very often.

Perhaps your time is better spent on other things? 🤔 I mean, there's already two ways of removing items from metadata, maybe that's enough? Up to you really.

@gnikonorov
Copy link
Member

How about we implement redacting and close this issue once it's in master?

I was thinking we could let the user pass a list of strings/regexes and any metadata key that matches has its value redacted.

We could make it an INI file setting.

Thoughts @BeyondEvil ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This issue/PR relates to a feature request. feature This issue/PR relates to a feature request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants