[linkcheck] PDF anchor (...pdf#anchor) leads to 'utf-8' codec can't decode byte ... #11041

@goekce

Describe the bug

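linkcheck reports links to a PDF that carry an anchor (e.g. ...pdf#page=226) as broken, failing with a "'utf-8' codec can't decode byte ..." error. Related to #7694.

Note that the query symbol ? is not required when using an anchor (i.e., #fragment); the error occurs with or without it.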
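To illustrate the point about the optional ?, here is a minimal sketch using Python's standard urllib.parse. It only shows how the two PDF URLs from this report are split into components; it is not taken from the linkcheck source.

```python
from urllib.parse import urlsplit

# The two PDF URLs from this report, with and without the empty query marker.
with_query = urlsplit(
    "https://wci.llnl.gov/sites/wci/files/2020-08/LLNL-SM-654357.pdf?#page=226"
)
without_query = urlsplit(
    "https://wci.llnl.gov/sites/wci/files/2020-08/LLNL-SM-654357.pdf#page=226"
)

# Both forms carry the same path and fragment; the query is empty in each case.
print(repr(with_query.query), with_query.fragment)
print(repr(without_query.query), without_query.fragment)
```

Since both forms resolve to the same resource and fragment, the ? variants in the reproduction below are only there to show that the marker makes no difference to the failure.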

How to Reproduce

index.rst:

`link1 <https://wci.llnl.gov/sites/wci/files/2020-08/LLNL-SM-654357.pdf?#page=226>`_
`link2 <https://wci.llnl.gov/sites/wci/files/2020-08/LLNL-SM-654357.pdf#page=226>`_
`link3 <https://docs.python.org/3/whatsnew/3.11.html?#whatsnew311-pep654>`_
`link4 <https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-pep654>`_

Command:

$ sphinx-build -b linkcheck . build

Output:

(           index: line    1) ok        https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-pep654
(           index: line    1) redirect  https://docs.python.org/3/whatsnew/3.11.html?#whatsnew311-pep654 - with unknown code to https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-pep654
(           index: line    1) broken    https://wci.llnl.gov/sites/wci/files/2020-08/LLNL-SM-654357.pdf#page=226 - 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
(           index: line    1) broken    https://wci.llnl.gov/sites/wci/files/2020-08/LLNL-SM-654357.pdf?#page=226 - 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
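A possible explanation for "position 10", sketched under assumptions: PDF files commonly begin with a version line such as %PDF-1.4 followed by a comment line of high-bit bytes that marks the file as binary. The byte string below is a typical header of that kind, not bytes read from the linked file, and the minor version digit is a guess. If the anchor check reads the response body as UTF-8 text (presumably what happens here), decoding would fail at that comment line.

```python
# Assumed bytes: a common PDF header followed by a binary marker comment.
# These are NOT read from the LLNL file; they are a typical example only.
pdf_head = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

try:
    pdf_head.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # 0xe2 starts a multi-byte UTF-8 sequence, but 0xe3 is not a valid
    # continuation byte, so decoding stops at offset 10.
    error = exc

print(error)
```

If this is what happens, the error says nothing about the link being dead; it only means a binary document was treated as text while searching for the anchor.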

Environment Information

Platform:              linux; (Linux-6.0.12-arch1-1-x86_64-with-glibc2.36)
Python version:        3.10.8 (main, Nov  1 2022, 14:18:21) [GCC 12.2.0])
Python implementation: CPython
Sphinx version:        5.3.0
Docutils version:      0.19
Jinja2 version:        3.1.2
