Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix automatic space insertion between entity references in ConstructingParser. #140

Closed
wants to merge 2 commits into from
Closed

Conversation

piyush-jaiswal
Copy link

@piyush-jaiswal piyush-jaiswal commented Apr 2, 2017

This PR addresses the issue pointed out in #107. The tests are the ones which were provided by @ashawley here. It would be great if this could be reviewed. Thanks!

@SethTisue
Copy link
Member

SethTisue commented Apr 2, 2017

(sorry to be annoying and nitpick, as this looks like a nice improvement, but can you make sure the code you've added is indented correctly?)

@piyush-jaiswal
Copy link
Author

Yeah sure, no worries!

@ashawley
Copy link
Member

ashawley commented Apr 5, 2017

The tests might be failing because they need to go in the XMLTest.scala in jvm/src/test/ instead of shared/src/test. The test suite was a bit of a mess, and then it got more confusing when cross compilation to scalaJS was contributed. My unit tests were before that happened. I predict the ConstructingParser is not ScalaJS-friendly.

Would you be willing to try moving the tests and see if they pass?

@piyush-jaiswal
Copy link
Author

Yeah, I was trying to figure out the problem and found out that java.io.File is not portable to Scala.js. Thanks for pointing out the solution. Since the tests pass now, are there any further issues here?

@ashawley
Copy link
Member

ashawley commented Apr 9, 2017

It might not be necessary to introduce Utility.checkNodeForEntityRef. You can probably just use isAtom, see http://www.scala-lang.org/api/current/scala-xml/scala/xml/Node.html#isAtom:Boolean

Would that work instead?

@piyush-jaiswal
Copy link
Author

Yeah, it is not necessary and does indeed work. Thanks!

// Checks if node when converted to string is an entity ref
def checkNodeForEntityRef(n: Node): Boolean = {
val st = n.toString
st(0) == '&' && st(st.length - 1) == ';'
Copy link
Member

@SethTisue SethTisue Apr 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know that n.toString can never be empty?

also, trivial nitpick, but st.last is clearer than st(st.length - 1) imo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SethTisue I agree. I'm hoping we can just get rid of this definition in favor of isAtom. If not, then I was going to suggest using startsWith and endsWith to avoid the inevitable out of bounds errors here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we can completely get rid of checkNodeForEntityRef, as a test seems to fail when using

if(!prev.isAtom && !x.isAtom) sb.append(' ')
instead of
if (!checkNodeForEntityRef(prev) && !checkNodeForEntityRef(x)) sb.append(' ')

1a7268b4-a67b-4ae7-8255-edf3af10abfb

I'm struggling at finding the alternatives to this definition.

// If 'txt' is just made up of one or more spaces
if (TextBuffer.fromString(txt).toText == Nil) {
// Check if the last node in 'ts' was an 'Atom' and the next node to be parsed is an entity or character ref
if(ts(ts.length - 1).isAtom && ch == '&')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: ts.last

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know that ts can never be empty here?

@piyush-jaiswal
Copy link
Author

Hi, it's been some time since my last activity. I took a look at #77 and found it somewhat relatable as it also involves whitespaces around text and entity refs. Hence, I've tried to fix the issue. Waiting for your review. Thanks!

@ashawley
Copy link
Member

ashawley commented May 8, 2017

I'm not sure if we can completely get rid of checkNodeForEntityRef, as a test seems to fail when using isAtom

Ok, bummer. Thanks for looking in to that. I really appreciate it.

I guess I'll need to come up with some more test cases to better understand how the ConstructingParser parses entities.

I kind of wonder if #73 and #77 and #107 are all related problems. Optimistically, we choose a path to fix all of them, but if we can't, then choose which one(s) we can fix.

I'll try to study @piyush-jaiswal's fix for #77. Thanks for looking in to that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants