-
-
Notifications
You must be signed in to change notification settings - Fork 746
More safe fixes for std.xml #5382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
std/xml.d
Outdated
|
|
||
| this(xml.tag); | ||
| prolog = s[0 .. tagString.ptr - s.ptr]; | ||
| prolog = s[0 .. &tagString[0] - &s[0]]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh... I'm not seeing why this is any safer than doing arithmetic on .ptr. It's still pointer arithmetic!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is, I'm surprised this makes a difference to the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the new safety checks that Walter added. What if tagString is empty? .ptr points past the end of the array. This way bounds checking is always done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised this makes a difference to the compiler
It doesn't. I still can't mark this ctor as safe because it calls parse. This just makes the function safer than it was before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still don't get it. If tagString is empty, wouldn't it be illegal to take the address of its (nonexistent) first element? Or is the whole point to trigger a druntime bounds check upon taking the address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is the whole point to trigger a druntime bounds check upon taking the address?
Yup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More to the point, I see it as a code smell that the code should expect to be able to compute the difference of two pointers without verifying that the pointers are valid to begin with. It just happens to work, sure, but it's very error-prone and makes the logic hard to follow (and harder to verify). I'd much rather explicitly state in the code what happens when tagString is empty (e.g., if (tagString.empty) ... than to try to hack a slice expression that just so happens to produce the right results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Difference between 2 pointers is valid in @safe code. The result is a ptrdiff_t, not a pointer.
However, I'd prefer to see something with lengths vs using pointer differences. This unnecessarily checks for valid indexes when we don't actually care about the data inside, we are just using it to slice. Don't know enough about the code to make an informed recommendation. This may be one spot where you pull out a @trusted escape.
|
|
||
| immutable(char)* p = startTag.tagString.ptr | ||
| immutable(char)* p = &startTag.tagString[0] | ||
| + startTag.tagString.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole thing has to be rethought. You can't do pointer arithmetic here and be @safe. Just changing the one aspect gets us no closer.
What you might need is &tag_.tagString[0] - &startTag.tagString[0] + startTag.tagString.length instead of q-p.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
God, this code is so ugly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't do pointer arithmetic here and be
@safe
Well, this whole function isn't marked as @safe, I'm just trying to make it more safe than it was before.
God, this code is so ugly...
Here's a small clean-up #5380
std/xml.d
Outdated
|
|
||
| this(xml.tag); | ||
| prolog = s[0 .. tagString.ptr - s.ptr]; | ||
| prolog = s[0 .. &tagString[0] - &s[0]]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Difference between 2 pointers is valid in @safe code. The result is a ptrdiff_t, not a pointer.
However, I'd prefer to see something with lengths vs using pointer differences. This unnecessarily checks for valid indexes when we don't actually care about the data inside, we are just using it to slice. Don't know enough about the code to make an informed recommendation. This may be one spot where you pull out a @trusted escape.
| { | ||
| import std.encoding : transcode; | ||
| import std.string : count, lastIndexOf; | ||
| import std.utf : toUTF32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problems here, just have to say YAY about getting rid of references to std.encoding.
03b5b13 to
a9f8737
Compare
|
Removed the first change to the pointers as it's unnecessary. |
|
Ping @schveiguy |
|
LGTM. @quickfur any more comments? |
std/xml.d
Outdated
| immutable(char)* q = tag_.tagString.ptr; | ||
| text = decode(p[0..(q-p)], DecodeMode.LOOSE); | ||
| immutable end = &tag_.tagString[0] - (&startTag.tagString[0] + startTag.tagString.length); | ||
| text = decode(p[0 .. end], DecodeMode.LOOSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really do not like this piece of code (both the original version and the revised version). In the original code, the construct p[0 .. (q-p)] implies that p must be smaller than q, but p points to the end of startTag.tagString. So q must lie outside of startTag.tagString. And then it takes a slice from p to q, which is a slice that lies outside of startTag.tagString. The only way this can even remotely be valid is that q points to a later part of the same input as startTag.tagString is a substring of. While this is probably true, this makes this code very fragile and highly-dependent on the behaviour of the surrounding code. I'm not surprised the compiler is not happy with it.
However, I am surprised that the compiler is OK with the revised version of the code, that essentially does the same thing. How it can decide that the revised code is @safe is beyond me, unless it's a loophole in the compiler. Note that there is no bounds check on the slice p[0 .. end] because p is a raw pointer. We don't even know (locally) that tag_.tagString and startTag.tagString are even remotely related to each other, so any bounds check introduced by indexing them with 0 does not, AFAICT, guarantee anything at all. IOW this code is still (locally) unsafe. Unless the compiler is doing aliasing analysis on the entire function to guarantee that the slices are valid, but I have my doubts, because in that case, the original code should have been accepted too. So I'm not sure how/why the revised code makes a difference, and I'm uncomfortable about the compiler accepting the new code in spite of the semantics being essentially identical (i.e., taking what amounts to an out-of-bounds slice of startTag.tagString).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we split this PR into two parts, one with the uncontroversial changes that can be merged ASAP, and another to rework this ugly bit of code to eliminate the pointer arithmetic / out-of-bounds slicing in a satisfactory way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my. I didn't notice that p was a pointer still.
Yeah, I feel this change is half way, but gives us no better results. In fact it gives us less performance, as we are now doing bounds checking on the first element. And we still can't call this function @safe.
I'm now looking to see where s is defined. My god, it's a member variable string pointer. Who calls their members s?
@JackStouffer I know you are trying to move this into a @safe direction, but this code may be unfixable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's certainly ugly, but I've had to fix uglier code before. :-) I think it's possible, but it's probably better to split this PR and merge the simple parts first, then work on the major surgery this ugly piece of code's gonna need to get to a sane state.
This doesn't get us to safety, and we may never get there. Better leave it as is.
|
I'm now thinking, this may be a lost cause. Sorry, @JackStouffer. |
|
Well, I think we can salvage this. At least split off the non-controversial parts into a separate PR and merge that: that's still better than nothing at all. As for making the present code safe, I think it will require some major surgery. I don't think it's impossible, just that it will take more effort and may require some disruptive changes. |
As I've said before, these changes in no way make or are meant to make I'll remove the pointer changes and just add a check to make sure |
|
@schveiguy @quickfur Updated |
|
If I just add auto-merge, does that count as approved? |
Nope. |
andralex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then i approve!
Removed unsafe pointer access and marked some functions as @safe in std.xml