gh-63161: Fix tokenize.detect_encoding() #139446

serhiy-storchaka · 2025-09-30T10:00:00Z

Support non-UTF-8 shebang and comments if non-UTF-8 encoding is specified.
Detect decoding error for non-UTF-8 encoding.
Detect null bytes in source code.
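A minimal sketch of how the first item could be checked by hand. `sniff_encoding` and the sample byte strings are made up for illustration and are not part of the patch. The result for the non-UTF-8 shebang sample depends on whether the interpreter includes this change, so the sketch only reports what it sees.

```python
import io
import tokenize


def sniff_encoding(source: bytes):
    """Return (encoding, error_message) for raw source bytes.

    Hypothetical helper for illustration: exactly one item is None.
    """
    try:
        encoding, _first_lines = tokenize.detect_encoding(
            io.BytesIO(source).readline
        )
    except SyntaxError as exc:
        # detect_encoding() reports a bad or undecodable declaration this way.
        return None, str(exc)
    return encoding, None


# Made-up inputs, not taken from the PR's test suite.
SAMPLES = {
    "ascii cookie": b"# -*- coding: latin-1 -*-\nx = 1\n",
    "no cookie": b"x = 1\n",
    # \xe9 is a valid Latin-1 byte but is not valid UTF-8 on its own.
    "non-UTF-8 shebang": b"#!/usr/bin/caf\xe9\n# coding: latin-1\nx = 1\n",
}

if __name__ == "__main__":
    for label, source in SAMPLES.items():
        print(label, sniff_encoding(source))
```

On an interpreter without this change, the third sample should come back as an error, because the first line is decoded as UTF-8 before the cookie on the second line is read. With the change applied, the expectation (an assumption based on the description above, not verified here) is that it reports `iso-8859-1`.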

Issue: Non-UTF8 encoding line #63161

serhiy-storchaka · 2025-09-30T10:07:20Z

It is a draft until we fix the Python interpreter.

miss-islington-app · 2025-10-20T17:09:10Z

Thanks @serhiy-storchaka for the PR 🌮🎉. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

* Support non-UTF-8 shebang and comments if non-UTF-8 encoding is specified.
* Detect decoding error for non-UTF-8 encoding.
* Detect null bytes in source code.

(cherry picked from commit 38d4b43)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

bedevere-app · 2025-10-20T17:09:24Z

GH-140378 is a backport of this pull request to the 3.14 branch.

bedevere-bot · 2025-10-20T18:36:23Z

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 CentOS9 NoGIL Refleaks 3.x (tier-1) has failed when building commit 38d4b43.

What do you need to do:

1. Don't panic.
2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1610/builds/2273) and take a look at the build logs.
4. Check if the failure is related to this commit (38d4b43) or if it is a false positive.
5. If the failure is related to this commit, please reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1610/builds/2273

Failed tests:

test_free_threading

Test leaking resources:

test_free_threading: file descriptors
test_free_threading: memory blocks

Traceback logs:

remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 11 (delta 3), reused 3 (delta 3), pack-reused 2 (from 2)
From https://github.com/python/cpython
 * branch                    main       -> FETCH_HEAD
Note: switching to '38d4b436ca767351db834189b3a5379406cd52a8'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 38d4b436ca7 gh-63161: Fix tokenize.detect_encoding() (GH-139446)
Switched to and reset branch 'main'

configure: WARNING: no system libmpdec found; falling back to pure-Python version for the decimal module

make: *** [Makefile:2489: buildbottest] Error 2

bedevere-bot · 2025-10-20T18:43:48Z

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 FreeBSD Refleaks 3.x (tier-3) has failed when building commit 38d4b43.

What do you need to do:

1. Don't panic.
2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1613/builds/2204) and take a look at the build logs.
4. Check if the failure is related to this commit (38d4b43) or if it is a false positive.
5. If the failure is related to this commit, please reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1613/builds/2204

Test leaking resources:

test_events: references
test_events: memory blocks

Traceback logs:

Traceback (most recent call last):
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/support/__init__.py", line 847, in gc_collect
    gc.collect()
    ~~~~~~~~~~^^
ResourceWarning: unclosed <socket.socket fd=9, family=2, type=1, proto=6, laddr=('127.0.0.1', 12865), raddr=('127.0.0.1', 12866)>
Task was destroyed but it is pending!
task: <Task pending name='Task-4087' coro=<BaseSelectorEventLoop._accept_connection2() done, defined at /buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/asyncio/selector_events.py:217> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Warning -- Unraisable exception
Exception ignored while calling deallocator <function _SelectorTransport.__del__ at 0x83ecdd310>:
Traceback (most recent call last):
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/asyncio/selector_events.py", line 873, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ResourceWarning: unclosed transport <_SelectorSocketTransport closing fd=9>

Traceback (most recent call last):
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/support/__init__.py", line 847, in gc_collect
    gc.collect()
    ~~~~~~~~~~^^
ResourceWarning: unclosed <socket.socket fd=9, family=2, type=1, proto=6, laddr=('127.0.0.1', 20768), raddr=('127.0.0.1', 20769)>
Task was destroyed but it is pending!
task: <Task pending name='Task-1035' coro=<BaseSelectorEventLoop._accept_connection2() done, defined at /buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/asyncio/selector_events.py:217> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Warning -- Unraisable exception
Exception ignored while calling deallocator <function _SelectorTransport.__del__ at 0x83a6c6810>:
Traceback (most recent call last):
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/asyncio/selector_events.py", line 873, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ResourceWarning: unclosed transport <_SelectorSocketTransport closing fd=9>

pythongh-63161: Fix tokenize.detect_encoding()

c8c0b20

serhiy-storchaka requested a review from vstinner September 30, 2025 10:00

bedevere-app bot mentioned this pull request Sep 30, 2025

Non-UTF8 encoding line #63161

Closed

Merge branch 'main' into tokenize-detect_encoding

95fd24e

serhiy-storchaka marked this pull request as ready for review October 13, 2025 12:36

serhiy-storchaka requested review from lysnikolaou and pablogsal as code owners October 13, 2025 12:36

bedevere-app bot added the awaiting core review label Oct 13, 2025

serhiy-storchaka merged commit 38d4b43 into python:main Oct 20, 2025
47 checks passed

bedevere-app bot removed the awaiting core review label Oct 20, 2025

serhiy-storchaka deleted the tokenize-detect_encoding branch October 20, 2025 17:08

serhiy-storchaka added the needs backport to 3.14 bugs and security fixes label Oct 20, 2025

bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Oct 20, 2025
