Spurious warnings when run from read-only filesystem due to old ply version #838

Open
bmihaila-bd opened this issue Dec 10, 2024 · 1 comment

@bmihaila-bd

When the code is run from a read-only filesystem (e.g. a container), it writes a couple of warnings to the logs, like these:

WARNING: Couldn't open 'parser.out'. [Errno 30] Read-only file system: '<path>/python3.12/site-packages/spdx_tools/spdx/parser/tagvalue/parser.out'
Generating LALR tables
WARNING: Couldn't create 'spdx_tools.spdx.parser.tagvalue.parsetab'. [Errno 30] Read-only file system: '<path>/python3.12/site-packages/spdx_tools/spdx/parser/tagvalue/parsetab.py'

While checking whether those warnings matter, I found that they come from the ply dependency, which is used to parse TAG-VALUE files. Digging further into that code showed that it tries to cache the parsing tables on disk and uses some rather contrived logic to decide where to store them between calls.

Some solution ideas to avoid the warnings:
The yacc code takes an outputdir parameter that is used instead of the contrived paths, see:

# yacc.py
def yacc(method='LALR', debug=yaccdebug, module=None, tabmodule=tab_module, start=None,
       check_recursion=True, optimize=False, write_tables=True, debugfile=debug_file,
       outputdir=None, debuglog=None, errorlog=None, picklefile=None):

It could be passed a temporary writable directory when the parser is instantiated in:

# tagvalue_parser.py
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    parser = Parser()

However, doing that would require the calling client functions to pass that parameter along as well, or the library would have to use tempfile.mkdtemp() and hope that it returns a writable directory. Either way, it complicates things.
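To make the tempfile.mkdtemp() idea concrete, here is a minimal sketch of how a writable directory could be chosen before calling yacc(). The helper names pick_table_dir and yacc_kwargs are hypothetical and not part of spdx_tools or ply. The only ply detail relied on is the outputdir keyword from the signature quoted above.

```python
import os
import tempfile


def pick_table_dir(preferred: str) -> str:
    """Return `preferred` if it is an existing writable directory,
    otherwise create and return a fresh temporary directory."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    # mkdtemp() picks a location from TMPDIR/TEMP/TMP or a platform
    # default; it raises an OSError if no usable directory is found.
    return tempfile.mkdtemp(prefix="ply-tables-")


def yacc_kwargs(preferred: str) -> dict:
    """Hypothetical helper: keyword arguments to forward to ply's yacc()."""
    return {"outputdir": pick_table_dir(preferred)}


# Intended use (an assumption, not existing spdx_tools code):
#   yacc.yacc(module=..., **yacc_kwargs(package_dir))
```

This keeps the fallback inside the library, so callers would not have to pass a directory along, but it still leaves temporary directories behind and depends on a writable temp location being available in the container.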
After some more digging into the ply upstream, https://github.com/dabeaz/ply, it seems that the code has not been released on PyPI since 2018 (https://pypi.org/project/ply/3.11/). Instead, the recommendation is to copy the GitHub code directly into the project that uses it (not the best idea, but see https://github.com/dabeaz/ply?tab=readme-ov-file#how-to-install-and-use).

It also seems that the code causing the spurious warnings above was removed after the last PyPI release, as the changelog at https://github.com/dabeaz/ply/blob/master/CHANGES mentions:

01/26/20  PLY no longer writes cached table files.  Honestly, the use of
          the cached files made more sense when I was developing PLY on
          my 200Mhz PC in 2001. It's not as much as an issue now. For small
          to medium sized grammars, PLY should be almost instantaneous. 
          If you're working with a large grammar, you can arrange
          to pickle the associated grammar instance yourself if need be.
          The removal of table files eliminated a large number of optional
          arguments to yacc() concerning the names and packages of these files.

So in summary, it seems best to update ply to its latest upstream version from GitHub, since that has other fixes too and the old code contained quite a bit of contrived logic to cater to very old, very slow machines.
TL;DR: the ply code currently used in this library is very old and should either be updated or replaced with an alternative LALR parser.

@bmihaila-bd

Oh, and the ply code also seems to cause the thread-safety issue in #801. I'm not 100% sure that updating the library will fix it, but it seems related to how ply stores the parsing tables and tries to read them again on the next run.
