
Uppercase/mixed-case tag support for walk() function #6

Closed
wants to merge 3 commits into from


benjaminma
Contributor

Lowercases encountered element names so that the tag switch block in walk() also handles <A>, <P>, <TaBLe>, etc.
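A minimal sketch of the idea, assuming a simplified node shape (the `HtmlNode` interface and this `walk()` are hypothetical illustrations, not the project's actual types or code): normalizing the tag name once before the `switch` lets every `case` label stay lowercase while still matching `<P>`, `<bR>`, or `<TaBLe>`.

```typescript
// Hypothetical minimal DOM node shape (an assumption, not the project's real types).
interface HtmlNode {
  type: "tag" | "text";
  name?: string; // element name exactly as written in the markup, e.g. "TaBLe"
  data?: string; // text content for text nodes
  children?: HtmlNode[];
}

// Hypothetical walk(): collects text, adding line breaks for <p> and <br>.
function walk(nodes: HtmlNode[], out: string[] = []): string[] {
  for (const node of nodes) {
    if (node.type === "text") {
      out.push(node.data ?? "");
      continue;
    }
    // Normalize once so the case labels below only need lowercase names.
    const tag = (node.name ?? "").toLowerCase();
    switch (tag) {
      case "br":
        out.push("\n");
        break;
      case "p":
        walk(node.children ?? [], out);
        out.push("\n");
        break;
      default:
        // Unknown or unhandled tags (e.g. <table> here): just walk the children.
        walk(node.children ?? [], out);
    }
  }
  return out;
}

const doc: HtmlNode[] = [
  { type: "tag", name: "P", children: [{ type: "text", data: "Hello" }] },
  { type: "tag", name: "bR" },
  { type: "tag", name: "TaBLe", children: [{ type: "text", data: "cell" }] },
];
console.log(JSON.stringify(walk(doc).join(""))); // prints "Hello\n\ncell"
```

Without the `toLowerCase()` call, `"P"` and `"bR"` would fall through to `default` and the line breaks would be lost.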

@mlegenhausen
Member

Thanks for the pull request. I will take a look.

@benjaminma
Contributor Author

Ah, I see. What do you think about terminating mailto: anchors and ignoring their inner text, but walking the children of any non-mailto: anchor? Is it necessary to output the href in those cases?

e.g. <a href="http://www.google.com">Google</a> or
<a href="#more-something"><img src="something-thumb.jpg"><div class="caption"><span>Something caption...</span></div></a>

My use case is to extract as much text as possible from an HTML document.
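A sketch of the proposed anchor handling, again assuming a hypothetical `HtmlNode` shape with an `attribs` map (not the project's actual API). Emitting the address for mailto: links and dropping the href for other links are assumptions chosen for illustration, since the question of whether to output the href is still open.

```typescript
// Hypothetical node shape; attribs holds the element's attributes (an assumption).
interface HtmlNode {
  type: "tag" | "text";
  name?: string;
  attribs?: Record<string, string>;
  data?: string;
  children?: HtmlNode[];
}

// Hypothetical walk() implementing the proposed mailto:/non-mailto: split.
function walk(nodes: HtmlNode[], out: string[] = []): string[] {
  for (const node of nodes) {
    if (node.type === "text") {
      out.push(node.data ?? "");
      continue;
    }
    const tag = (node.name ?? "").toLowerCase();
    const children = node.children ?? [];
    if (tag === "a") {
      const href = node.attribs?.href ?? "";
      if (href.toLowerCase().startsWith("mailto:")) {
        // Terminate here: emit the address and skip the anchor's inner text.
        out.push(href.slice("mailto:".length));
      } else {
        // Non-mailto anchor: keep walking so nested text (captions, spans) is kept.
        // The href itself is not emitted in this sketch.
        walk(children, out);
      }
      continue;
    }
    walk(children, out);
  }
  return out;
}

const anchors: HtmlNode[] = [
  { type: "tag", name: "A", attribs: { href: "mailto:someone@example.com" },
    children: [{ type: "text", data: "Contact" }] },
  { type: "text", data: " | " },
  { type: "tag", name: "a", attribs: { href: "#more-something" }, children: [
    { type: "tag", name: "img", attribs: { src: "something-thumb.jpg" } },
    { type: "tag", name: "div", attribs: { class: "caption" }, children: [
      { type: "tag", name: "span", children: [{ type: "text", data: "Something caption..." }] },
    ] },
  ] },
];
console.log(walk(anchors).join("")); // prints someone@example.com | Something caption...
```

Walking into non-mailto anchors is what recovers the caption text nested inside the linked thumbnail in the second example above.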

@benjaminma
Contributor Author

I think I meant to send this PR from a feature branch. Let me see if I can split these commits into separate requests.

@benjaminma
Contributor Author

I split the request; see #7 and #8.

benjaminma closed this Jul 11, 2013