Implement changes in #57: improve behaviour of dumped files #59
I'm not convinced that mode |
RE: your questions:
|
The idea is that if there is an app that does logging (for example), and some remotely executed code unpickles the log file (e.g. the main app opens it at the start and gives the code a handle, which is then pickled), then currently, after the remote code finishes, the log will contain only the remote code's output, as the log from the rest of the program will have been deleted. There is probably a better example, but that's what came to my head first. |
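The truncation described in this comment can be reproduced with plain files. This is only a minimal sketch, assuming that restoring a pickled handle amounts to reopening the file by name with its original mode; the helper `reopen_like` is hypothetical and is not dill's API.

```python
import os
import tempfile


def reopen_like(handle):
    """Hypothetical stand-in for naive unpickling: reopen by name and mode."""
    return open(handle.name, handle.mode)


def demo_log_truncation():
    path = os.path.join(tempfile.mkdtemp(), "app.log")

    # The main program opens the log and writes its own entries.
    main_log = open(path, "w")
    main_log.write("main: started\n")
    main_log.flush()

    # The remote code gets a "copy" of the handle; mode "w" truncates the file.
    remote_log = reopen_like(main_log)
    remote_log.write("remote: done\n")
    remote_log.close()
    main_log.close()

    with open(path) as f:
        return f.read()
```

Calling `demo_log_truncation()` returns only the remote entry; the line written by the main program is gone.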
What I do in tricky cases that need some working through is to build a more extensive test suite, testing any case that I can think of that it might be used for -- that's what I mean by "play with it a bit". I have yet to do this, as promised above. I won't be convinced of the "right solution", until this code has really been kicked around a good bit, using a bunch of different test cases. You could split the test cases into a separate PR, then we can both work on the tests until we have the results that "feel" right. I'm ok with merging your test case now, and having tests fail until we work out the logic is "correct". |
So you want me to move matsjoyce/dill@90fdf15 to a separate PR? |
Yeah, I think so. That would mean adding any accompanying |
Ok, I'm about halfway through If you can think of any other test cases, please add them as PR or a comment here. |
Starting at line 61, for (3), I chose |
With regard to (3), I'm still not sure that I usually like to take the simplest solutions as being the best choices. (1) is simple… but apparently wrong. (2) and (5) are also pretty simple. (3) and (4) are less simple. I guess we'll see. |
After getting this far, I'm liking (2), (5), and (4b)… mainly for the reason that they don't appear to end up in the state where the file has buffer filler I think reasoning out the test cases for |
Also, this has not been mentioned previously… there's a new file mode in python |
only the test cases for |
|
As for other names:
|
I added the test cases for variants (1) - (5) for Going through the "append" cases didn't change my mind, I'm very much in favor of (2), (5), and possibly (4b). I'll wait on your feedback on which cases to go forward with. For me, it seems reasonably clear right now we want at least (2) and (5) -- and not (1) and (3). In short, I'm probably not going to accept the PR as is (using logic 3), but after we close debate, there should be an agreed-upon solution (i.e. option to use (2) or (5) and possibly (4b)… as well as safe mode). |
I set it that if the file does not exist, it keeps the I've fixed the IOError. I like (3) and (5) best. However, maybe (3) should turn to (2) when the file DNE, as this would avoid the heaps of File exists (with same or larger contents):
File exists, but smaller
File DNE
|
I agree with your choice to get rid of the |
I've implemented the above changes, and changed the test, so you can see the kind of output it produces (basically the same, but no |
Ok, that's better… I think I might accept this as is... but I want to add some of the other options immediately afterward (hence the delay). I don't think (3) should be the default. It's better than (1), but it's not simple to explain… hence I would probably not choose this as an option at all were I coding it myself. To me, it feels like it might be a good choice for certain use cases… but the level of complexity is high enough that it seems like there might be trouble with corner cases down the road… and that leads to changes that break backward compatibility. I'm looking for an explanation of the entire behavior in a single sentence, like... These are short, and fully describe the behavior of the option… can you come up with something similarly short and simple for (3)? I think (2) might be the best default, with (5) as an option. (2) has some flaws, but at least the behavior is straightforward, and very consistent with python… and one should very quickly understand what to expect in all cases. If I'm not mistaken, (4b) and (3) look like they are now the same, except for the choice of modes and the |
I'll try and add (2) and (5). |
@matsjoyce: that would be great. They should both be fairly straightforward, and with (5) I think you could probably just use a second file handle to read all the data in at load, or write it before the dump. What you have above is a good summary of (3), but "mode which preserves behavior" is not really precise -- for a file handle of Would you be OK with (2) as the default, (5) as the other extreme, and (3) as the in-between? Or are there two flags (and thus 4 cases!) to deal with? (I don't know what I would call those flags.) Using string names of the options is a bad idea; however T/F is good, or if there's a clear gradation |
I like tight little packages! 😃 What are we going to name the safeIO flag? |
I know you have obviously made a good choice with RE: naming for the safeIO flag -- have not thought about it. I think |
Well, I would have thought |
Oh. I tried using plain |
Backward compatibility has been a pain in the behind, but necessary due to the target use in HPC. There are institutional-scale HPC platforms still using python 2.5 as the default. Many of those are being phased out, or have been phased out, but python 2.6 is very very common, as are earlier versions of 3.x. |
Yup. And I just noticed that it doesn't work in python 2.7 either, so it's back to the drawing board for that bit. What I was trying before was (http://stackoverflow.com/a/10352231):
import os
f = os.fdopen(os.open(name, os.O_CREAT | os.O_WRONLY), mode)
The problem is that:
f.name == 3 # fd, not name
which will break everything more than
>>> f.name = "a.txt"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute 'name' of '_io.TextIOWrapper' objects is not writable |
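Both halves of the `os.fdopen` approach from this comment show up in a short Python 3 sketch: the existing contents survive because `O_TRUNC` is not passed, but `name` ends up being the descriptor number. The helper name `open_without_truncating` is made up for illustration.

```python
import os
import tempfile


def open_without_truncating(path, mode="w"):
    """Open `path` for writing without O_TRUNC, so existing bytes survive."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY)
    return os.fdopen(fd, mode)


def demo_fdopen():
    path = os.path.join(tempfile.mkdtemp(), "a.txt")
    with open(path, "w") as f:
        f.write("hello world")

    f = open_without_truncating(path)
    name = f.name      # the integer file descriptor, not the path
    f.write("HEY")     # overwrites from offset 0; the tail is kept
    f.close()

    with open(path) as check:
        return name, check.read()
```

Here `demo_fdopen()` returns an integer for the name together with the partially overwritten contents, which is why anything that later reopens the file via `f.name` breaks.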
Yes, I'm aware of that unfortunate
Which just seems half-assed. Or at least, it's a really dark dark corner of the language. Like I said, easy out for now could be to make that file mode unavailable for certain versions of python. |
In python 3, we can do: f.buffer.raw.name = "a.txt" Still working on python 2. |
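A slightly fuller sketch of that Python 3 workaround, assuming CPython's `io` layering for text-mode files (`TextIOWrapper` over a buffered object over `FileIO`) and that the raw `FileIO` object accepts attribute assignment. The function name `fdopen_with_name` is hypothetical, and a binary-mode handle has no `.buffer`, so this covers text modes only.

```python
import os


def fdopen_with_name(path, mode="w"):
    """Open via a raw descriptor, then patch the name onto the FileIO layer."""
    f = os.fdopen(os.open(path, os.O_CREAT | os.O_WRONLY), mode)
    # The wrapper and the buffer expose `name` read-only and delegate to the
    # raw object, so the assignment has to happen on the innermost layer.
    f.buffer.raw.name = path
    return f
```

After the assignment, `f.name` reports the path rather than the descriptor number.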
The only way I think
static PyObject *
fill_file_fields(PyFileObject *f, FILE *fp, PyObject *name, char *mode,
                 int (*close)(FILE *))
{
    assert(name != NULL);
    assert(f != NULL);
    assert(PyFile_Check(f));
    assert(f->f_fp == NULL);
    Py_DECREF(f->f_name);
    Py_DECREF(f->f_mode);
    Py_DECREF(f->f_encoding);
    Py_INCREF(name);
    f->f_name = name;
    f->f_mode = PyString_FromString(mode);
    f->f_close = close;
    f->f_softspace = 0;
    f->f_binary = strchr(mode,'b') != NULL;
    f->f_buf = NULL;
    f->f_univ_newline = (strchr(mode, 'U') != NULL);
    f->f_newlinetypes = NEWLINE_UNKNOWN;
    f->f_skipnextlf = 0;
    Py_INCREF(Py_None);
    f->f_encoding = Py_None;
    if (f->f_mode == NULL)
        return NULL;
    f->f_fp = fp;
    f = dircheck(f);
    return (PyObject *) f;
}
As this shows, the only way to set |
https://github.com/albertz/pydbattach/blob/master/pythonhdr.py#L181 This has had the same interface since python2.3, but not available in python 3. |
This works:
>>> import os
>>> f = os.fdopen(os.open("a.txt", os.O_CREAT | os.O_WRONLY), "w")
>>> f
<open file '<fdopen>', mode 'w' at 0x7f0694f76540>
>>> import ctypes
>>> fname_offset = ctypes.sizeof(ctypes.c_size_t) + ctypes.sizeof(ctypes.py_object) + ctypes.sizeof(ctypes.c_voidp)
>>> fname_offset
24
>>> ctypes.py_object.from_address(id(f) + fname_offset)
py_object('<fdopen>')
>>> n = ctypes.py_object.from_address(id(f) + fname_offset)
>>> n.value = "a"
>>> f
<open file 'a', mode 'w' at 0x7f0694f76540>
Not very portable for other python implementations though. |
    if not HAS_CTYPES:
        raise RuntimeError("Need ctypes to set file name")
    ctypes.cast(id(f), ctypes.POINTER(FILE)).contents.name = name
    ctypes.cast(id(name), ctypes.POINTER(PyObject)).contents.ob_refcnt += 1
OK, this works, but looks a bit fragile to me.
It now passes the test on python 2.7, and as it now doesn't use any new features, it ought to be backwards compatible. |
I'm ok with it as is for python2.7 and python2.6… (you could opt to use the 'opener' if it exists, if that makes things cleaner for 3.x). It needs an easy fix for python2.5:
It's also broken now for 3.1 and 3.2 (same small error)
Looks like 3.3 and 3.4 pass, but print to stdout, as you mentioned earlier. This bug will get fixed, and it probably isn't worth worrying about if it only occurs when a Traceback is thrown -- meaning it never appears when the error is caught… so you only see a print resulting from the bug. Following up on the remaining points:
See my opinion from last week… should throw an error, not change
I'm good with current solution, I guess.
Yes, it's better to catch stdout. Fine if you leave it as a follow-up item. I have a code snippet that can be applied here fairly easily.
Honestly, I don't know. I typically look at what python would do, and just do that.
File behavior is slightly different on windows…. however, the solution does look portable. I'm ok with assuming it works. (Yikes) |
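For the 3.x side, the 'opener' mentioned in this comment could look roughly like the sketch below. It assumes Python 3.3 or later (where `open` accepts `opener`), and the names `keep_contents_opener` and `demo_opener` are made up for illustration. Since the path is passed to `open` itself, `name` stays a real file name and no patching is needed.

```python
import os
import tempfile


def keep_contents_opener(path, flags):
    """Opener that drops O_TRUNC from the flags `open` would have used."""
    return os.open(path, flags & ~os.O_TRUNC, 0o666)


def demo_opener():
    path = os.path.join(tempfile.mkdtemp(), "a.txt")
    with open(path, "w") as f:
        f.write("hello world")

    # Mode "w" would normally truncate; the opener suppresses that.
    with open(path, "w", opener=keep_contents_opener) as f:
        name = f.name
        f.write("HEY")

    with open(path) as check:
        return name == path, check.read()
```

The trade-off is that this route only exists on newer interpreters, so the 2.x code path would still need the `os.fdopen` approach.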
Fixed.
Also fixed.
Oops, that looks like a problem with #41. Shall I make a new PR, or shall we leave that to be fixed in #62 ?
Is that keep
Or fix #3 and use |
Whoops, looks like we've got a merge conflict now... 😯 Shall I try to remove 4edc3c0? |
I already made the change in test_file.py to be 2.5 compatible, as it was already accepted code. Merge the change into your PR. Make a new PR for the 3.1/3.2 patch, or I'll patch it. It should go straight in (instead of waiting in #62). I'd like to get this PR resolved, as it's really close. I don't want to have too many PRs outstanding for you to have to manage... If you like I am in the process of switching all my tests across the board to produce no output, and then further to use |
…w_file_handling Conflicts: tests/test_file.py
11bd4b3 to 46cbc00
OK, I think I've made it work now...
OK, switched.
That would be great, as then you could enable Travis, and so run all the tests on each version of python, for each PR, which would help with bugs and the like. The best I do at the moment is:
for i in tests/*.py; do python $i; done
for i in tests/*.py; do python2 $i; done |
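Until CI is enabled, the same loop could be written as a small script that also reports which test files failed. This is just a sketch: `run_tests` is a hypothetical helper, and it assumes each test script signals failure through a non-zero exit status.

```python
import glob
import subprocess
import sys


def run_tests(interpreters=(sys.executable,), pattern="tests/*.py"):
    """Run every matching script under each interpreter; return the failures."""
    failures = []
    for python in interpreters:
        for script in sorted(glob.glob(pattern)):
            # A non-zero exit status is treated as a failed test script.
            status = subprocess.call([python, script])
            if status != 0:
                failures.append((python, script, status))
    return failures
```

Something like `run_tests(["python", "python2"])` would mirror the two shell loops, provided both interpreter names exist on the machine.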
Implement changes in #57: improve behaviour of dumped files.
Some remaining cleanup to be noted in #57. |
Here's a start on #57.
Questions: