bug: .pth files generated by PyTorch are incompatible with path configuration file handling #300

Closed
qthequartermasterman opened this issue Jul 3, 2024 · 3 comments · Fixed by #301
Labels
bug Something isn't working


@qthequartermasterman
Contributor

Description of the bug

According to the official PyTorch saving and loading guide, "a common PyTorch convention is to save models using either a .pt or .pth file extension."

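This clashes with Python's path configuration files, which also use the .pth extension and which griffe handles specially: as the traceback shows, griffe reads the .pth files it finds in its search paths as UTF-8 text, so a binary PyTorch checkpoint saved with that extension makes the build crash with a `UnicodeDecodeError`.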
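Python's `site` module treats a path configuration file as plain text: blank lines and lines starting with `#` are skipped, lines starting with `import` followed by a space or tab are executed, and every other line names a path to append to `sys.path`, resolved against the directory holding the `.pth` file. A minimal sketch of those rules follows. `parse_pth_text` is a hypothetical helper, not part of griffe or the standard library; it skips `import` lines instead of executing them and, unlike `site`, does not check that the paths exist.

```python
import os


def parse_pth_text(text: str, sitedir: str) -> list[str]:
    """Return the directories a path configuration file would contribute.

    Follows the documented rules of Python's `site` module, except that
    `import` lines are skipped rather than executed and paths are not
    checked for existence.
    """
    directories: list[str] = []
    for line in text.splitlines():
        # Comments and blank lines are ignored.
        if line.startswith("#") or not line.strip():
            continue
        # `site` executes these lines; this sketch deliberately does not.
        if line.startswith(("import ", "import\t")):
            continue
        # Every other line is a path relative to the .pth file's directory.
        path = os.path.abspath(os.path.join(sitedir, line.rstrip()))
        if path not in directories:  # no path is added twice
            directories.append(path)
    return directories


example = "# comment\n\nsrc\nimport sys\nlib/vendor\n"
print(parse_pth_text(example, "/opt/project"))
# → ['/opt/project/src', '/opt/project/lib/vendor'] (on POSIX)
```

A binary checkpoint has no meaningful "lines" at all, which is why reading it with these rules cannot work.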

To Reproduce

git clone https://github.com/qthequartermasterman/griffe-pytorch-minimal-reproducible-example
cd griffe-pytorch-minimal-reproducible-example
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python build-pytorch-state-dict.py  # This script will create a `state_dict.pth` file
mkdocs build  # This will error.
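Without PyTorch installed, the decoding failure itself can be reproduced with the standard library alone. This sketch assumes a pickle payload is a fair stand-in for the checkpoint written by `build-pytorch-state-dict.py`: pickles at protocol 2 and above start with the byte `0x80`, the same byte value named in the traceback. `fake_state_dict.pth` is a hypothetical file name. The file is then read with `read_text(encoding="utf8")`, the call that fails in griffe's `_handle_pth_file` in the full traceback.

```python
import pickle
import tempfile
from pathlib import Path

# A pickled dict stands in for a binary checkpoint; protocol >= 2 pickles
# start with the PROTO opcode 0x80, which is not a valid UTF-8 start byte.
payload = pickle.dumps({"weight": [0.5, -1.25]}, protocol=2)

with tempfile.TemporaryDirectory() as tmp:
    fake_checkpoint = Path(tmp) / "fake_state_dict.pth"  # hypothetical name
    fake_checkpoint.write_bytes(payload)
    try:
        # Same kind of call that raises in griffe's _handle_pth_file.
        fake_checkpoint.read_text(encoding="utf8")
    except UnicodeDecodeError as exc:
        error = exc
    else:
        error = None

print(error)
```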

Full traceback

ERROR   -  Error reading page 'reference/my-package/index.md': 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
Traceback (most recent call last):
  File "/home/my-username/.conda/envs/my-env310/bin/mkdocs", line 8, in <module>
    sys.exit(cli())
  File "/home/my-username/.conda/envs/my-env310/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/my-username/.conda/envs/my-env310/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/my-username/.conda/envs/my-env310/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/my-username/.conda/envs/my-env310/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/my-username/.conda/envs/my-env310/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocs/__main__.py", line 286, in build_command
    build.build(cfg, dirty=not clean)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocs/commands/build.py", line 322, in build
    _populate_page(file.page, config, files, dirty)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocs/commands/build.py", line 175, in _populate_page
    page.render(config, files)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocs/structure/pages.py", line 271, in render
    self.content = md.convert(self.markdown)
  File "/home/my-username/.local/lib/python3.10/site-packages/markdown/core.py", line 357, in convert
    root = self.parser.parseDocument(self.lines).getroot()
  File "/home/my-username/.local/lib/python3.10/site-packages/markdown/blockparser.py", line 117, in parseDocument
    self.parseChunk(self.root, '\n'.join(lines))
  File "/home/my-username/.local/lib/python3.10/site-packages/markdown/blockparser.py", line 136, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/home/my-username/.local/lib/python3.10/site-packages/markdown/blockparser.py", line 158, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocstrings/extension.py", line 124, in run
    html, handler, data = self._process_block(identifier, block, heading_level)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocstrings/extension.py", line 206, in _process_block
    data: CollectorItem = handler.collect(identifier, options)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocstrings_handlers/python/handler.py", line 270, in collect
    loader = GriffeLoader(
  File "/home/my-username/.local/lib/python3.10/site-packages/griffe/loader.py", line 85, in __init__
    self.finder: ModuleFinder = ModuleFinder(search_paths)
  File "/home/my-username/.local/lib/python3.10/site-packages/griffe/finder.py", line 105, in __init__
    self._extend_from_pth_files()
  File "/home/my-username/.local/lib/python3.10/site-packages/griffe/finder.py", line 379, in _extend_from_pth_files
    for directory in _handle_pth_file(item):
  File "/home/my-username/.local/lib/python3.10/site-packages/griffe/finder.py", line 443, in _handle_pth_file
    for line in path.read_text(encoding="utf8").strip().replace(";", "\n").splitlines(keepends=False):
  File "/home/my-username/.conda/envs/my-env310/lib/python3.10/pathlib.py", line 1133, in read_text
    return f.read()
  File "/home/my-username/.conda/envs/my-env310/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
$ mkdocs build -v
DEBUG   -  Loading configuration file: /home/my-username/griffe-pytorch-minimal-reproducible-example/mkdocs.yml
DEBUG   -  Loaded theme configuration for 'mkdocs' from
           '/home/my-username/.local/lib/python3.10/site-packages/mkdocs/themes/mkdocs/mkdocs_theme.yml': {'static_templates': ['404.html'],
           'locale': 'en', 'include_search_page': False, 'search_index_only': False, 'highlightjs': True, 'hljs_languages': [],
           'hljs_style': 'github', 'navigation_depth': 2, 'nav_style': 'primary', 'analytics': {'gtag': None}, 'shortcuts': {'help':
           191, 'next': 78, 'previous': 80, 'search': 83}}
DEBUG   -  Config value 'config_file_path' = '/home/my-username/griffe-pytorch-minimal-reproducible-example/mkdocs.yml'
DEBUG   -  Config value 'site_name' = 'My Docs'
DEBUG   -  Config value 'nav' = None
DEBUG   -  Config value 'pages' = None
DEBUG   -  Config value 'exclude_docs' = None
DEBUG   -  Config value 'not_in_nav' = None
DEBUG   -  Config value 'site_url' = None
DEBUG   -  Config value 'site_description' = None
DEBUG   -  Config value 'site_author' = None
DEBUG   -  Config value 'theme' = Theme(name='mkdocs', dirs=['/home/my-username/.local/lib/python3.10/site-packages/mkdocs/themes/mkdocs',
           '/home/my-username/.local/lib/python3.10/site-packages/mkdocs/templates'], static_templates={'sitemap.xml', '404.html'},
           name='mkdocs', locale=Locale(language='en', territory=''), include_search_page=False, search_index_only=False,
           highlightjs=True, hljs_languages=[], hljs_style='github', navigation_depth=2, nav_style='primary', analytics={'gtag': None},
           shortcuts={'help': 191, 'next': 78, 'previous': 80, 'search': 83})
DEBUG   -  Config value 'docs_dir' = '/home/my-username/griffe-pytorch-minimal-reproducible-example/docs'
DEBUG   -  Config value 'site_dir' = '/home/my-username/griffe-pytorch-minimal-reproducible-example/site'
DEBUG   -  Config value 'copyright' = None
DEBUG   -  Config value 'google_analytics' = None
DEBUG   -  Config value 'dev_addr' = _IpAddressValue(host='127.0.0.1', port=8000)
DEBUG   -  Config value 'use_directory_urls' = True
DEBUG   -  Config value 'repo_url' = None
DEBUG   -  Config value 'repo_name' = None
DEBUG   -  Config value 'edit_uri_template' = None
DEBUG   -  Config value 'edit_uri' = None
DEBUG   -  Config value 'extra_css' = []
DEBUG   -  Config value 'extra_javascript' = []
DEBUG   -  Config value 'extra_templates' = []
DEBUG   -  Config value 'markdown_extensions' = ['toc', 'tables', 'fenced_code']
DEBUG   -  Config value 'mdx_configs' = {}
DEBUG   -  Config value 'strict' = False
DEBUG   -  Config value 'remote_branch' = 'gh-pages'
DEBUG   -  Config value 'remote_name' = 'origin'
DEBUG   -  Config value 'extra' = {}
DEBUG   -  Config value 'plugins' = {'search': <mkdocs.contrib.search.SearchPlugin object at 0x7f1fa52f09a0>, 'autorefs':
           <mkdocs_autorefs.plugin.AutorefsPlugin object at 0x7f1fa52f0b80>, 'mkdocstrings': <mkdocstrings.plugin.MkdocstringsPlugin
           object at 0x7f1fa52f37c0>}
DEBUG   -  Config value 'hooks' = {}
DEBUG   -  Config value 'watch' = []
DEBUG   -  Config value 'validation' = {'nav': {'omitted_files': 20, 'not_found': 30, 'absolute_links': 20}, 'links': {'not_found': 30,
           'absolute_links': 20, 'unrecognized_links': 20}}
DEBUG   -  Running 3 `config` events
DEBUG   -  mkdocs_autorefs: Adding AutorefsExtension to the list
DEBUG   -  mkdocstrings: Adding extension to the list
DEBUG   -  mkdocstrings: Picked up existing autorefs instance <mkdocs_autorefs.plugin.AutorefsPlugin object at 0x7f1fa52f0b80>
DEBUG   -  Running 1 `pre_build` events
INFO    -  Cleaning site directory
INFO    -  Building documentation to directory: /home/my-username/griffe-pytorch-minimal-reproducible-example/site
DEBUG   -  Reading markdown pages.
DEBUG   -  Reading: index.md
DEBUG   -  Running 1 `page_markdown` events
DEBUG   -  mkdocstrings: Matched '::: foo.Bar.baz'
DEBUG   -  mkdocstrings: Using handler 'python'
DEBUG   -  mkdocstrings: Collecting data
ERROR   -  Error reading page 'index.md': 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
Traceback (most recent call last):
  File "/home/my-username/.conda/envs/griffe-mre/bin/mkdocs", line 8, in <module>
    sys.exit(cli())
  File "/home/my-username/.conda/envs/griffe-mre/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/my-username/.conda/envs/griffe-mre/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/my-username/.conda/envs/griffe-mre/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/my-username/.conda/envs/griffe-mre/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/my-username/.conda/envs/griffe-mre/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocs/__main__.py", line 286, in build_command
    build.build(cfg, dirty=not clean)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocs/commands/build.py", line 322, in build
    _populate_page(file.page, config, files, dirty)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocs/commands/build.py", line 175, in _populate_page
    page.render(config, files)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocs/structure/pages.py", line 271, in render
    self.content = md.convert(self.markdown)
  File "/home/my-username/.local/lib/python3.10/site-packages/markdown/core.py", line 357, in convert
    root = self.parser.parseDocument(self.lines).getroot()
  File "/home/my-username/.local/lib/python3.10/site-packages/markdown/blockparser.py", line 117, in parseDocument
    self.parseChunk(self.root, '\n'.join(lines))
  File "/home/my-username/.local/lib/python3.10/site-packages/markdown/blockparser.py", line 136, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/home/my-username/.local/lib/python3.10/site-packages/markdown/blockparser.py", line 158, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocstrings/extension.py", line 124, in run
    html, handler, data = self._process_block(identifier, block, heading_level)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocstrings/extension.py", line 206, in _process_block
    data: CollectorItem = handler.collect(identifier, options)
  File "/home/my-username/.local/lib/python3.10/site-packages/mkdocstrings_handlers/python/handler.py", line 270, in collect
    loader = GriffeLoader(
  File "/home/my-username/.local/lib/python3.10/site-packages/griffe/loader.py", line 85, in __init__
    self.finder: ModuleFinder = ModuleFinder(search_paths)
  File "/home/my-username/.local/lib/python3.10/site-packages/griffe/finder.py", line 105, in __init__
    self._extend_from_pth_files()
  File "/home/my-username/.local/lib/python3.10/site-packages/griffe/finder.py", line 379, in _extend_from_pth_files
    for directory in _handle_pth_file(item):
  File "/home/my-username/.local/lib/python3.10/site-packages/griffe/finder.py", line 443, in _handle_pth_file
    text = path.read_text(encoding="utf8")
  File "/home/my-username/.conda/envs/griffe-mre/lib/python3.10/pathlib.py", line 1135, in read_text
    return f.read()
  File "/home/my-username/.conda/envs/griffe-mre/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

Expected behavior

Griffe should ignore PyTorch .pth files instead of trying to parse them as path configuration files.
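One way to get that behavior is to treat a .pth file that does not decode as UTF-8 as "not a path configuration file" and skip it. The sketch below only illustrates that idea and is not necessarily the change merged in #301: `read_pth_entries` is a hypothetical function, and its splitting on `;` and newlines copies the expression visible at `griffe/finder.py` line 443 in the first traceback.

```python
import pickle
import tempfile
from pathlib import Path


def read_pth_entries(path: Path) -> list[str]:
    """Return the raw entries of a path configuration file, or [] if it is not text."""
    try:
        text = path.read_text(encoding="utf8")
    except UnicodeDecodeError:
        # Binary content (e.g. a PyTorch checkpoint saved as .pth): skip it
        # instead of letting the error abort the whole documentation build.
        return []
    # Same splitting as the expression shown in the traceback.
    return text.strip().replace(";", "\n").splitlines(keepends=False)


with tempfile.TemporaryDirectory() as tmp:
    text_pth = Path(tmp) / "paths.pth"
    text_pth.write_text("src\nlib;vendor\n", encoding="utf8")
    binary_pth = Path(tmp) / "state_dict.pth"
    binary_pth.write_bytes(pickle.dumps({"weight": [1.0]}, protocol=2))

    text_entries = read_pth_entries(text_pth)      # → ['src', 'lib', 'vendor']
    binary_entries = read_pth_entries(binary_pth)  # → []
```

Catching only `UnicodeDecodeError` keeps other I/O errors visible while letting legitimate text .pth files behave exactly as before.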

Environment information

$ griffe --debug-info  # | xclip -selection clipboard
- __System__: Linux-5.15.0-113-generic-x86_64-with-glibc2.31
- __Python__: cpython 3.10.0
- __Environment variables__:
- __Installed packages__:
  - `griffe` v0.40.1

Note that I could reproduce this issue on any version of Python as long as the PyTorch .pth file was present. As soon as the .pth file was removed, mkdocs built properly.
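Since removing the file was enough to fix the build, a quick check is to list the .pth files in a directory and flag those that are not UTF-8 text. `find_binary_pth_files` below is a hypothetical helper, not a griffe API. It only scans the top level of the given directory, which is an assumption about where the offending files live, and it also reports whether each flagged file is a ZIP archive, the container format recent versions of `torch.save` write by default. It reads whole files into memory, which is fine for a one-off check.

```python
import tempfile
import zipfile
from pathlib import Path


def find_binary_pth_files(directory: Path) -> dict[Path, bool]:
    """Map each non-UTF-8 .pth file in `directory` to whether it is a ZIP archive."""
    offenders: dict[Path, bool] = {}
    for path in sorted(directory.glob("*.pth")):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            # Likely a checkpoint; ZIP-based torch.save output reports True.
            offenders[path] = zipfile.is_zipfile(path)
    return offenders


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "paths.pth").write_text("src\n", encoding="utf8")
    # A small ZIP with a non-UTF-8 member stands in for a real checkpoint.
    with zipfile.ZipFile(root / "state_dict.pth", "w") as archive:
        archive.writestr("state_dict/data.pkl", b"\x80\x02}q\x00.")
    report = find_binary_pth_files(root)
    summary = {path.name: is_zip for path, is_zip in report.items()}

print(summary)  # → {'state_dict.pth': True}
```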

Additional context

qthequartermasterman added the unconfirmed (This bug was not reproduced yet) label Jul 3, 2024
qthequartermasterman added a commit to qthequartermasterman/griffe that referenced this issue Jul 3, 2024
pawamoy added the bug (Something isn't working) label and removed the unconfirmed (This bug was not reproduced yet) label Jul 3, 2024
@pawamoy
Member

pawamoy commented Jul 3, 2024

Hi @qthequartermasterman, thank you for the bug report!

I don't think settling on .pth as a convention for the extension was a good idea 🤔 But anyway, here it is. What does the h stand for, out of curiosity?

Thanks for the PR, that seems like the most reasonable thing to do.

pawamoy pushed a commit that referenced this issue Jul 3, 2024
@qthequartermasterman
Contributor Author

qthequartermasterman commented Jul 3, 2024

Agreed that it wasn't a good idea. Unfortunately, that's the state the PyTorch ecosystem is in, and I think there isn't much movement because the collision so rarely causes problems. I only learned about path configuration files this week while debugging this issue in one of my repos. I personally plan to avoid the "pth" extension going forward.

There's been an issue open for many years to standardize on ".pt".

pytorch/pytorch#14864

I've also wondered what the h stands for, but never found anything definitive. I think it's for PyTorcH.

@pawamoy
Member

pawamoy commented Jul 3, 2024

I see, thanks for the additional context!

2 participants