Use sys.__stdout__ for terminal encoding check #15425

Closed
wants to merge 4 commits into from

nihir27
@nihir27 nihir27 commented Oct 31, 2022

What does this PR do?

It's common to redirect sys.std*. Currently, if you've redirected sys.stdout using a custom class and that class does not have an encoding attribute set, trainer.test can fail with an AttributeError. Since the sys.stdout.encoding check here concerns terminals, I propose using sys.__stdout__ instead.
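
A minimal sketch of the failure mode described above. `QuietStream` and `check_encoding` are hypothetical stand-ins (not part of Lightning); the sketch reproduces only the attribute lookup on a redirected `sys.stdout`, not `trainer.test` itself.

```python
import sys
from contextlib import redirect_stdout


class QuietStream:
    """Hypothetical custom stdout replacement that defines no `encoding`."""

    def write(self, message):
        return len(message)

    def flush(self):
        pass


def check_encoding():
    # Same shape as the existing check: read `encoding` from the current sys.stdout.
    return sys.stdout.encoding is not None


with redirect_stdout(QuietStream()):
    try:
        check_encoding()
        raised = False
    except AttributeError:
        # QuietStream has no `encoding` attribute, so the lookup fails here.
        raised = True
```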

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Oct 31, 2022
@awaelchli
Contributor

@nihir27 Thanks for sending the PR. Do you have a way to demonstrate this using a small runnable code example? I'm just generally curious.

@awaelchli awaelchli added this to the v1.9 milestone Oct 31, 2022
@awaelchli awaelchli added bug Something isn't working loops Related to the Loop API community This PR is from the community labels Oct 31, 2022
@@ -370,8 +370,8 @@ def _print_results(results: List[_OUT_DICT], stage: str) -> None:

     try:
         # some terminals do not support this character
-        if sys.stdout.encoding is not None:
-            "─".encode(sys.stdout.encoding)
+        if sys.__stdout__.encoding is not None:
Contributor

Given your motivation, wouldn't it be better to check hasattr(sys.stdout, "encoding")?

After all, sys.stdout is what we will use to print, and __stdout__ is just a reference to the original handle: https://docs.python.org/3/library/sys.html#sys.__stdout__
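
To illustrate the distinction drawn here, a small sketch using only the standard library: after a redirect, `print` goes to the replacement object, while `sys.__stdout__` keeps pointing at the handle saved at interpreter start. The variable names are illustrative.

```python
import io
import sys
from contextlib import redirect_stdout

buffer = io.StringIO()
dunder_before = sys.__stdout__

with redirect_stdout(buffer):
    # print() writes to whatever sys.stdout currently is ...
    print("hello")
    redirected_is_buffer = sys.stdout is buffer
    # ... while sys.__stdout__ is untouched by the redirect.
    dunder_unchanged = sys.__stdout__ is dunder_before

captured = buffer.getvalue()
```

So an encoding check against `sys.__stdout__` describes the original handle, which is not necessarily the stream that receives the printed results.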

Author

hasattr(sys.stdout, "encoding") and sys.stdout.encoding is not None is an option that prevents the AttributeError. However, when one writes to a terminal as well as a log file (as in https://stackoverflow.com/a/14906787), you lose the handling of the character.

Author

That particular pattern may actually be rare, and in that case I would favour just doing hasattr(sys.stdout, "encoding") and sys.stdout.encoding is not None.
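
A sketch of the guarded check discussed in this thread, written as a hypothetical helper `can_encode` (not a Lightning function). It uses `getattr` with a default, which covers both a missing `encoding` attribute and an explicit `None`. Treating "no encoding known" as "keep the character" is an assumption made for this example.

```python
import sys


def can_encode(char, stream=None):
    """Return True if `char` can be encoded for `stream` (defaults to sys.stdout)."""
    stream = sys.stdout if stream is None else stream
    # A missing attribute and an explicit None are handled the same way.
    encoding = getattr(stream, "encoding", None)
    if encoding is None:
        return True  # assumption: nothing to check against, so keep the character
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```

With this shape, a redirected stream without `encoding` no longer raises, and a stream that reports `latin-1` still triggers the fallback for "─".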

@nihir27
Author

nihir27 commented Oct 31, 2022

@awaelchli For example, with sys.stdout redirected as below, a subsequent trainer.test call would raise an AttributeError:

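import sys


class Logger:
    """Writes to both the terminal and a log file; defines no `encoding` attribute."""

    def __init__(self):
        self.terminal = sys.__stdout__
        self.log = open("logfile.log", "a")

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)

    def flush(self):
        # No-op; present only so callers can invoke flush().
        pass


sys.stdout = Logger()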

https://stackoverflow.com/a/14906787
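
For comparison, a possible user-side workaround, sketched under assumptions: a hypothetical `TeeLogger` variant of the class above that forwards the terminal's `encoding`, so a check against `sys.stdout.encoding` still reflects the terminal. In-memory streams stand in for the real terminal and log file so that the example does not touch the file system.

```python
import io


class TeeLogger:
    """Hypothetical variant of the Logger above that also exposes `encoding`."""

    def __init__(self, terminal, log):
        self.terminal = terminal
        self.log = log

    @property
    def encoding(self):
        # Report the terminal's encoding, or None if the terminal has none.
        return getattr(self.terminal, "encoding", None)

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)
        return len(message)

    def flush(self):
        self.terminal.flush()
        self.log.flush()


# Stand-in for a terminal that only supports latin-1.
terminal = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
tee = TeeLogger(terminal, io.StringIO())
tee.write("done\n")
tee.flush()
```

This does not replace a fix in the library, since third-party redirect classes may still omit `encoding`.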

@awaelchli
Contributor

awaelchli commented Nov 1, 2022

@nihir27 Would you mind adding this example as a unit test case in the file tests/tests_pytorch/trainer/logging_/test_eval_loop_logging.py, similar to the tests there that use EvaluationLoop._print_results? Thanks!

@leoleoasd

This comment was marked as duplicate.

@leoleoasd

This comment was marked as duplicate.

@leoleoasd

This comment was marked as off-topic.

@awaelchli
Contributor

@leoleoasd If you find the time, would you mind taking a look at the failing test test_native_print_results_encodings, which is affected given your changes?

@leoleoasd
Contributor

Are you mentioning the right person?

@awaelchli
Contributor

awaelchli commented Nov 5, 2022

Nope 🤣 Looks like both of us have too many browser pages open. I meant to ping @nihir27 :)

@justusschock justusschock added the waiting on author Waiting on user action, correction, or update label Nov 9, 2022
@awaelchli
Contributor

@nihir27 One test is failing here:


    @pytest.mark.parametrize("encoding", ["latin-1", "utf-8"])
    def test_native_print_results_encodings(monkeypatch, encoding):
        import pytorch_lightning.loops.dataloader.evaluation_loop as imports
    
        monkeypatch.setattr(imports, "_RICH_AVAILABLE", False)
    
        out = mock.Mock()
        out.encoding = encoding
        with redirect_stdout(out) as out:
            EvaluationLoop._print_results(*inputs0)
    
        # Attempt to encode everything the file is told to write with the given encoding
        for call_ in out.method_calls:
            name, args, kwargs = call_
            if name == "write":
>               args[0].encode(encoding)
E               UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-119: ordinal not in range(256)

Could you take another look? Let us know if it doesn't work and we can help.
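
One plausible reading of this failure, sketched under assumptions: the test redirects `sys.stdout` to a mock that reports `latin-1`, but a check against a different handle (here a mock reporting `utf-8`, standing in for `sys.__stdout__` on a typical terminal) never selects the ASCII fallback. `pick_bar_character` is a hypothetical helper that follows the shape of the check in the diff, not the library's function.

```python
import sys
from contextlib import redirect_stdout
from unittest import mock


def pick_bar_character(stream):
    """Hypothetical helper: fall back to '-' if `stream` cannot encode '─'."""
    try:
        if stream.encoding is not None:
            "─".encode(stream.encoding)
    except UnicodeEncodeError:
        return "-"
    return "─"


out = mock.Mock()
out.encoding = "latin-1"
with redirect_stdout(out):
    # Checking the stream that is actually written to selects the fallback.
    from_redirected = pick_bar_character(sys.stdout)

# Checking a UTF-8 handle instead keeps the box-drawing character,
# which the latin-1 stream in the test then cannot encode.
utf8_handle = mock.Mock()
utf8_handle.encoding = "utf-8"
from_original = pick_bar_character(utf8_handle)
```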

@Borda Borda modified the milestones: v1.9, v1.9.x Jan 16, 2023
@mergify mergify bot added the has conflicts label Feb 1, 2023
@Borda Borda requested a review from williamFalcon as a code owner February 3, 2023 01:37
@mergify mergify bot removed the has conflicts label Feb 3, 2023
@Borda Borda changed the title Use sys.__stdout__ for terminal encoding check Use sys.__stdout__ for terminal encoding check Mar 20, 2023
@Borda
Member

Borda commented Apr 24, 2023

@awaelchli, what is the situation here? :)

gitguardian bot commented Jan 16, 2024

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | Secret | Commit | Filename |
| --- | --- | --- | --- |
| - | Generic High Entropy Secret | 78fa3af | tests/tests_app/utilities/test_login.py |
| - | Base64 Basic Authentication | 78fa3af | tests/tests_app/utilities/test_login.py |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider



@nihir27 nihir27 closed this Jan 16, 2024
Labels
bug Something isn't working community This PR is from the community has conflicts loops Related to the Loop API pl Generic label for PyTorch Lightning package waiting on author Waiting on user action, correction, or update

6 participants