Skip to content

Conversation

@JackStouffer
Copy link
Contributor

Removed unsafe pointer access and marked some functions as @safe in std.xml

std/xml.d Outdated

this(xml.tag);
prolog = s[0 .. tagString.ptr - s.ptr];
prolog = s[0 .. &tagString[0] - &s[0]];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh... I'm not seeing why this is any safer than doing arithmetic on .ptr. It's still pointer arithmetic!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is, I'm surprised this makes a difference to the compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the new safety checks that Walter added. What if tagString is empty? .ptr points past the end of the array. This way bounds checking is always done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this makes a difference to the compiler

It doesn't. I still can't mark this ctor as safe because it calls parse. This just makes the function safer than it was before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I still don't get it. If tagString is empty, wouldn't it be illegal to take the address of its (nonexistent) first element? Or is the whole point to trigger a druntime bounds check upon taking the address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is the whole point to trigger a druntime bounds check upon taking the address?

Yup

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More to the point, I see it as a code smell that the code should expect to be able to compute the difference of two pointers without verifying that the pointers are valid to begin with. It just happens to work, sure, but it's very error-prone and makes the logic hard to follow (and harder to verify). I'd much rather explicitly state in the code what happens when tagString is empty (e.g., if (tagString.empty) ... than to try to hack a slice expression that just so happens to produce the right results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Difference between 2 pointers is valid in @safe code. The result is a ptrdiff_t, not a pointer.

However, I'd prefer to see something with lengths vs using pointer differences. This unnecessarily checks for valid indexes when we don't actually care about the data inside, we are just using it to slice. Don't know enough about the code to make an informed recommendation. This may be one spot where you pull out a @trusted escape.


immutable(char)* p = startTag.tagString.ptr
immutable(char)* p = &startTag.tagString[0]
+ startTag.tagString.length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole thing has to be rethought. You can't do pointer arithmetic here and be @safe. Just changing the one aspect gets us no closer.

What you might need is &tag_.tagString[0] - &startTag.tagString[0] + startTag.tagString.length instead of q-p.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

God, this code is so ugly...

Copy link
Contributor Author

@JackStouffer JackStouffer May 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do pointer arithmetic here and be @safe

Well, this whole function isn't marked as @safe, I'm just trying to make it more safe than it was before.

God, this code is so ugly...

Here's a small clean-up #5380

std/xml.d Outdated

this(xml.tag);
prolog = s[0 .. tagString.ptr - s.ptr];
prolog = s[0 .. &tagString[0] - &s[0]];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Difference between 2 pointers is valid in @safe code. The result is a ptrdiff_t, not a pointer.

However, I'd prefer to see something with lengths vs using pointer differences. This unnecessarily checks for valid indexes when we don't actually care about the data inside, we are just using it to slice. Don't know enough about the code to make an informed recommendation. This may be one spot where you pull out a @trusted escape.

{
import std.encoding : transcode;
import std.string : count, lastIndexOf;
import std.utf : toUTF32;
Copy link
Member

@schveiguy schveiguy May 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problems here, just have to say YAY about getting rid of references to std.encoding.

@JackStouffer JackStouffer force-pushed the xml-safe branch 3 times, most recently from 03b5b13 to a9f8737 Compare May 11, 2017 18:15
@JackStouffer
Copy link
Contributor Author

Removed the first change to the pointers as it's unnecessary.

@JackStouffer
Copy link
Contributor Author

Ping @schveiguy

schveiguy
schveiguy previously approved these changes May 17, 2017
@schveiguy
Copy link
Member

LGTM. @quickfur any more comments?

std/xml.d Outdated
immutable(char)* q = tag_.tagString.ptr;
text = decode(p[0..(q-p)], DecodeMode.LOOSE);
immutable end = &tag_.tagString[0] - (&startTag.tagString[0] + startTag.tagString.length);
text = decode(p[0 .. end], DecodeMode.LOOSE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really do not like this piece of code (both the original version and the revised version). In the original code, the construct p[0 .. (q-p)] implies that p must be smaller than q, but p points to the end of startTag.tagString. So q must lie outside of startTag.tagString. And then it takes a slice from p to q, which is a slice that lies outside of startTag.tagString. The only way this can even remotely be valid is that q points to a later part of the same input as startTag.tagString is a substring of. While this is probably true, this makes this code very fragile and highly-dependent on the behaviour of the surrounding code. I'm not surprised the compiler is not happy with it.

However, I am surprised that the compiler is OK with the revised version of the code, that essentially does the same thing. How it can decide that the revised code is @safe is beyond me, unless it's a loophole in the compiler. Note that there is no bounds check on the slice p[0 .. end] because p is a raw pointer. We don't even know (locally) that tag_.tagString and startTag.tagString are even remotely related to each other, so any bounds check introduced by indexing them with 0 does not, AFAICT, guarantee anything at all. IOW this code is still (locally) unsafe. Unless the compiler is doing aliasing analysis on the entire function to guarantee that the slices are valid, but I have my doubts, because in that case, the original code should have been accepted too. So I'm not sure how/why the revised code makes a difference, and I'm uncomfortable about the compiler accepting the new code in spite of the semantics being essentially identical (i.e., taking what amounts to an out-of-bounds slice of startTag.tagString).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose we split this PR into two parts, one with the uncontroversial changes that can be merged ASAP, and another to rework this ugly bit of code to eliminate the pointer arithmetic / out-of-bounds slicing in a satisfactory way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my. I didn't notice that p was a pointer still.

Yeah, I feel this change is half way, but gives us no better results. In fact it gives us less performance, as we are now doing bounds checking on the first element. And we still can't call this function @safe.

I'm now looking to see where s is defined. My god, it's a member variable string pointer. Who calls their members s?

@JackStouffer I know you are trying to move this into a @safe direction, but this code may be unfixable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's certainly ugly, but I've had to fix uglier code before. :-) I think it's possible, but it's probably better to split this PR and merge the simple parts first, then work on the major surgery this ugly piece of code's gonna need to get to a sane state.

@schveiguy schveiguy dismissed their stale review May 18, 2017 00:06

This doesn't get us to safety, and we may never get there. Better leave it as is.

@schveiguy
Copy link
Member

I'm now thinking, this may be a lost cause. Sorry, @JackStouffer.

@quickfur
Copy link
Member

Well, I think we can salvage this. At least split off the non-controversial parts into a separate PR and merge that: that's still better than nothing at all.

As for making the present code safe, I think it will require some major surgery. I don't think it's impossible, just that it will take more effort and may require some disruptive changes.

@JackStouffer
Copy link
Contributor Author

How it can decide that the revised code is @safe is beyond me

As I've said before, these changes in no way make or are meant to make parse @safe. The original changes were meant to make the function slightly more safe than it was by forcing bounds checking. It's not perfect, but it was an improvement on the past because it guarantees that tagString is non-empty while having no runtime cost in no bounds check mode.

I'll remove the pointer changes and just add a check to make sure tagString is not empty

@JackStouffer
Copy link
Contributor Author

@schveiguy @quickfur Updated

@andralex
Copy link
Member

If I just add auto-merge, does that count as approved?

@wilzbach
Copy link
Contributor

If I just add auto-merge, does that count as approved?

Nope.

Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then i approve!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants