-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make assert_dom_equal ignore insignificant whitespace when walking the node tree #84
Conversation
child.to_s == other_child.to_s | ||
else | ||
child.to_s.strip == other_child.to_s.strip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of / in addition to strip
, I think we need some kind of regex that collapses interior white spaces as well. For example,
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p><p>Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam. Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat. Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.</p>
vs
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
<p>
Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius,
turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis
sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac
tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam
tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id
tincidunt sapien risus a quam. Maecenas fermentum consequat mi. Donec
fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec,
commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum
aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat.
Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur
augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl,
mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.
</p>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test case for this here and solved it by using split
instead of strip
on both strings.
def compare_doms(expected, actual, strict) | ||
expected_children = extract_children(expected, strict) | ||
actual_children = extract_children(actual, strict) | ||
return false unless expected_children.size == actual_children.size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that there can't be adjacent text nodes, which is generally not a correct assumption. I think in our use case it's probably a reasonable one, but I am not familiar with the guarantees of the Nokogiri parsing algorithm to know for sure. There may also be ways for us to cause this to happen reject
though I could not think of an example.
I don't think there is a strong reason to change this absent of a failing test case, but just thought I should point this out explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm misunderstanding what you're referring to, adjacent text nodes should be fine, there just needs to be the same number of nodes.
I suppose it would be possible for "one two three" to be split into 1-3 text nodes, but I doubt that would happen for the same strings in the same process. If we do end up with a different number of text nodes I think there would have to be differences in the markup to cause it and failing would be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I meant. I'm not sure if we can guarantee that Nokogiri won't split text nodes, or that our code wouldn't cause it to do that. For example, if we decide to ignore HTML comments, then this would not work, because the comment may have splitter one text node into two. But can find out and iterate on that another time.
I think this is a pretty solid attempt and the best one we've had so far. I think once we fix the interior whitespace issue, it seems good to land this and iterate further when there are bug reports. Thanks for working on this! |
@chancancode thanks for taking a look! I just updated this PR with a test and fix for the issue with interior whitespace you mentioned. If there is anything else let me know. |
Urgh, seems like CI config is super old and 1.9 does not support kwarg. You can either change it to take a hash there to avoid the syntax error, or you can fix the CI config to drop support for EOL'ed Rubies and Rails, and add matrix rows for newer Rubies + Rails, in which case we can cut a breaking change release after landing this. |
@chancancode instead of lumping it all together I created another PR updating the test matrix #86 |
I don't have much overhead for intricate reviews right now. @chancancode if you want to help carry this, I'll happily merge 🙏 |
I think this is basically good to go if we merge the CI fix and rebase this on top. We will have to make a new breaking release after due to dropping old ruby versions. |
However, I apparently can’t merge 😛 |
5f3f093
to
8b074b9
Compare
I just pushed up a rebase! |
@kaspth This seems good to me. If you can look into getting my commit/publish bit back then I can help with the merge/release as well, I assume that got lost in shuffling the GH teams since I still have the rails/rails one, and this seems much lower stakes than that. I also don't mind not having it if you want to do the merge/publish. @jduff Since we stopped testing Rails 4.2 on CI, we should probably bump this to require the versions we are testing. It doesn't have to be part of this PR, but it does need to be changed before we release. |
@chancancode done, appreciate the help 👍 — and thank you for the PR, @jduff 🙌 |
Thanks for helping see this through! I've been using master in one of my projects and it's working great 👌 |
Yet another take on fixing #62
Prior art: #71, #66 and #83
This version ignores the whitespace as we are walking the tree instead of trying to pre-process or clean the html strings. Also included a
strict
option to preserve the assertion including whitespace (default tofalse
).I'm not a fan of passing
strict
through the method calls but wasn't sure of a better approach.Let me know what you think and if there are any changes I should make!