Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make assert_dom_equal ignore insignificant whitespace when walking the node tree #84

Merged
merged 2 commits into from
Jul 1, 2020

Conversation

jduff
Copy link
Contributor

@jduff jduff commented May 20, 2020

Yet another take on fixing #62

Prior art: #71, #66 and #83

This version ignores the whitespace as we are walking the tree instead of trying to pre-process or clean the html strings. Also included a strict option to preserve the assertion including whitespace (default to false).

I'm not a fan of passing strict through the method calls but wasn't sure of a better approach.

Let me know what you think and if there are any changes I should make!

child.to_s == other_child.to_s
else
child.to_s.strip == other_child.to_s.strip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of / in addition to strip, I think we need some kind of regex that collapses interior white spaces as well. For example,

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p><p>Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam. Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat. Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.</p>

vs

<p>
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
  quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
  consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
  cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
  proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>

<p>
  Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius,
  turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis
  sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac
  tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam
  tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id
  tincidunt sapien risus a quam. Maecenas fermentum consequat mi. Donec
  fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec,
  commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum
  aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat.
  Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur
  augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl,
  mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.
</p>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test case for this here and solved it by using split instead of strip on both strings.

def compare_doms(expected, actual, strict)
expected_children = extract_children(expected, strict)
actual_children = extract_children(actual, strict)
return false unless expected_children.size == actual_children.size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that there can't be adjacent text nodes, which is generally not a correct assumption. I think in our use case it's probably a reasonable one, but I am not familiar with the guarantees of the Nokogiri parsing algorithm to know for sure. There may also be ways for us to cause this to happen reject though I could not think of an example.

I don't think there is a strong reason to change this absent of a failing test case, but just thought I should point this out explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm misunderstanding what you're referring to, adjacent text nodes should be fine, there just needs to be the same number of nodes.

I suppose it would be possible for "one two three" to be split into 1-3 text nodes, but I doubt that would happen for the same strings in the same process. If we do end up with a different number of text nodes I think there would have to be differences in the markup to cause it and failing would be correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I meant. I'm not sure if we can guarantee that Nokogiri won't split text nodes, or that our code wouldn't cause it to do that. For example, if we decide to ignore HTML comments, then this would not work, because the comment may have splitter one text node into two. But can find out and iterate on that another time.

@chancancode
Copy link
Member

I think this is a pretty solid attempt and the best one we've had so far. I think once we fix the interior whitespace issue, it seems good to land this and iterate further when there are bug reports. Thanks for working on this!

@jduff
Copy link
Contributor Author

jduff commented Jun 18, 2020

@chancancode thanks for taking a look! I just updated this PR with a test and fix for the issue with interior whitespace you mentioned. If there is anything else let me know.

@chancancode
Copy link
Member

Urgh, seems like CI config is super old and 1.9 does not support kwarg. You can either change it to take a hash there to avoid the syntax error, or you can fix the CI config to drop support for EOL'ed Rubies and Rails, and add matrix rows for newer Rubies + Rails, in which case we can cut a breaking change release after landing this.

@jduff
Copy link
Contributor Author

jduff commented Jun 18, 2020

@chancancode instead of lumping it all together I created another PR updating the test matrix #86

@kaspth
Copy link
Contributor

kaspth commented Jun 18, 2020

I don't have much overhead for intricate reviews right now. @chancancode if you want to help carry this, I'll happily merge 🙏

@chancancode
Copy link
Member

I think this is basically good to go if we merge the CI fix and rebase this on top. We will have to make a new breaking release after due to dropping old ruby versions.

@chancancode
Copy link
Member

However, I apparently can’t merge 😛

@jduff jduff force-pushed the ignore_whitespace_2 branch from 5f3f093 to 8b074b9 Compare June 18, 2020 21:45
@jduff
Copy link
Contributor Author

jduff commented Jun 18, 2020

I just pushed up a rebase!

@chancancode
Copy link
Member

@kaspth This seems good to me. If you can look into getting my commit/publish bit back then I can help with the merge/release as well, I assume that got lost in shuffling the GH teams since I still have the rails/rails one, and this seems much lower stakes than that. I also don't mind not having it if you want to do the merge/publish.

@jduff Since we stopped testing Rails 4.2 on CI, we should probably bump this to require the versions we are testing. It doesn't have to be part of this PR, but it does need to be changed before we release.

@kaspth
Copy link
Contributor

kaspth commented Jun 30, 2020

@chancancode done, appreciate the help 👍 — and thank you for the PR, @jduff 🙌

@chancancode chancancode merged commit 47506c4 into rails:master Jul 1, 2020
@jduff
Copy link
Contributor Author

jduff commented Jul 8, 2020

Thanks for helping see this through! I've been using master in one of my projects and it's working great 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants