-
Notifications
You must be signed in to change notification settings - Fork 470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change write_table to atomically write to the parsetab file. #184
Conversation
The previous implementation had issues with multiple python processes simultaneously invoking `.yacc(...)` with the same arugments. One of them could be midway through generating a .py file that the other process would attempt to import and then crash with a `SyntaxError`. This diff avoids that problem by instead writing to a tempfile and then atomically renaming it. This means processes starting up at the same time will both do the same work, but that's a lot better than having one of them crash! `flanker` had this issue reported to them in mailgun/flanker#168 and they attempted to work around this by committing the parsetab files that ply generates (mailgun/flanker#188), but this doesn't help if the user is running a different version of ply than the version that flanker generated their parsetab files with (because ply will go ahead and regenerate those files).
Over the years, there has been a lot of confusion surrounding the PLY parsetab.py file and it's proper deployment. I'm going to reject this pull request because I feel that it doesn't really address the actual problem that underlies all of this. Specifically, the parsetab.py file is something that is meant to be generated only once during the deployment of a project. Ideally, it gets created once during package installation and that's it. If it's constantly being recreated on execution, or being recreated by hundreds of processes simultaneously, then something is wrong. Creating the file atomically doesn't address that--instead, it merely hides it. If you are working on a major project that uses PLY, the best way to deal with these kinds of versioning problems is to directly incorporate PLY into your project--not as a external dependency, but by directly copying the 'ply' package directory into your project. PLY is not large--there are only two source files. PLY also changes very rarely--often going years between releases. I'm not adding new features to it. It's also not likely to break in future Python versions. Done properly, everything will just "work" and it won't matter if some user has installed a different version of PLY for some reason (you'll just have to pay careful attention to your import statements involving PLY to make sure they are coming from your package, but that's about it). I would also recommend that any 'setup.py' file import the associated parser modules to force them to create the table files on installation of any package. The only other option would be to use PLY and tell it to not create the table files at all. For small grammars, this is probably fine--table generation is quite fast on modern hardware. |
I am sympathetic to your "we don't want to paper over the root cause" argument. However, the fact is that there is at least 1 library out there that is getting this wrong. Your comment that "there has been a lot of confusion surrounding the PLY parsetab.py file and it's proper deployment" implies that there are probably more libraries that also get this wrong. Have you considered documentation and/or code changes to try to better educate people on how to use this library? Some partially baked ideas not in any particular order:
I'm happy to send in PRs implementing anything we agree upon doing to make this library easier to use correctly. |
Packaging advice concerning table files is already found in the documentation at http://dabeaz.com/ply/ply.html#ply_nn49 In the big picture, it's not feasible for me to think of every possible scenario in which a library might used and to come up with a fix for every possible problem. The writing of table files is a well-known facet of PLY. If it's being executed concurrently in threads or processes, then the user will have to take explicit steps to make sure it's doing what they want (by introducing locking, disabling table files, or taking some other step that's appropriate for that situation). |
Oh cool, I had not seen that. Do you think that should have a note about your recommendation to copy ply into your code to avoid getting hit by the "a new version of ply will want to regenerate the tables" issue? |
This fixes mailgun#206. Before this change, when multiple Python processes are simultaneously doing a `from flanker.addresslib import address`, it's possible for some of them to crash in `ply` code. See dabeaz/ply#184, where I attempted to work around this issue by changing ply. You can see in this comment: dabeaz/ply#184 (comment) that the author of ply suggests to workarounds for this issue: 1. Remove `ply` as a dependency in setup.py and copy the source code of `ply` into `flanker`. 2. Disable writing parsetab files to disk when invoking `yacc`. 2) seemed like the simpler solution to me, so that's what I've done here.
This fixes mailgun#206. Before this change, when multiple Python processes are simultaneously doing a `from flanker.addresslib import address`, it's possible for some of them to crash in `ply` code. See dabeaz/ply#184, where I attempted to work around this issue by changing ply. You can see in this comment: dabeaz/ply#184 (comment) that the author of ply suggests to workarounds for this issue: 1. Remove `ply` as a dependency in setup.py and copy the source code of `ply` into `flanker`. 2. Disable writing parsetab files to disk when invoking `yacc`. 2) seemed like the simpler solution to me, so that's what I've done here.
This fixes mailgun#206. Before this change, when multiple Python processes are simultaneously doing a `from flanker.addresslib import address`, it's possible for some of them to crash in `ply` code. See dabeaz/ply#184, where I attempted to work around this issue by changing ply. You can see in this comment: dabeaz/ply#184 (comment) that the author of ply suggests two workarounds for this issue: 1. Remove `ply` as a dependency in setup.py and copy the source code of `ply` into `flanker`. 2. Disable writing parsetab files to disk when invoking `yacc`. 2) seemed like the simpler solution to me, so that's what I've done here.
The previous implementation had issues with multiple python processes
simultaneously invoking
.yacc(...)
with the same arugments. One of themcould be midway through generating a .py file that the other process
would attempt to import and then crash with a
SyntaxError
. This diffavoids that problem by instead writing to a tempfile and then atomically
renaming it. This means processes starting up at the same time will both
do the same work, but that's a lot better than having one of them crash!
flanker
had this issue reported to them inmailgun/flanker#168 and they attempted to work
around this by committing the parsetab files that ply generates
(mailgun/flanker#188), but this doesn't help if
the user is running a different version of ply than the version that
flanker generated their parsetab files with (because ply will go ahead
and regenerate those files).
To test this, I was able to fairly reliably reproduce this issue with the following command:
I then ran the following command to verify that the fix in this PR works. This command succeeded on 1000 consecutive runs.