Getting location info of a node #1084

Closed
shalithasuranga opened this issue Sep 20, 2017 · 3 comments

Comments

@shalithasuranga

Can we get line & column details of a specific node in cheerio? If not, could that feature be added using the existing HTML parser?

Thanks

@shalithasuranga
Author

Any update regarding this? @matthewmueller @jugglinmike

@trevorhreed
Contributor

trevorhreed commented Mar 1, 2018

Setting the xmlMode option to true directs Cheerio to use htmlparser2 instead. With that set, you can also set withStartIndices to true, and each node will then have a startIndex property attached to it. Note that startIndex is a character offset into the source string, not a line/column pair, so you still have to convert it yourself. Also beware that using Cheerio with xmlMode set to true may break some use cases. For example, I use Cheerio to pull JavaScript code out of a <script> tag as part of a build process, and with xmlMode set to true the code gets mangled (HTML character encoding, etc.).
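Because startIndex is only a character offset, answering the original line & column question needs one more conversion step. Here is a minimal sketch of it. offsetToLineColumn is a hypothetical helper (not part of cheerio or htmlparser2), and it assumes lines are terminated by \n, which also covers \r\n input. In real use you would pass the same HTML string you gave to cheerio.load(html, { xmlMode: true, withStartIndices: true }) together with a node's startIndex.

```javascript
// Hypothetical helper: convert a zero-based character offset (such as the
// startIndex htmlparser2 attaches when withStartIndices is true) into a
// 1-based line and column, the way most editors report positions.
// Assumes "\n" line endings; "\r\n" also works because "\n" ends each line.
function offsetToLineColumn(source, offset) {
  if (!Number.isInteger(offset) || offset < 0 || offset > source.length) {
    throw new RangeError(`offset ${offset} is outside the source`);
  }
  let line = 1;
  let lineStart = 0; // offset of the first character on the current line
  for (let i = 0; i < offset; i++) {
    if (source[i] === '\n') {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart + 1 };
}

// Self-contained usage: indexOf stands in for a node's startIndex here,
// so this runs without cheerio installed.
const html = '<div>\n  <p>hi</p>\n</div>';
const pIndex = html.indexOf('<p>');
console.log(offsetToLineColumn(html, pIndex)); // { line: 2, column: 3 }
```

The linear scan keeps the sketch simple; if you need positions for many nodes in a large document, precomputing the offsets of all line starts once and binary-searching them would avoid rescanning the string for every node.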

After some digging, I found that the new default parser Cheerio uses under the hood (parse5) also has an option for attaching location information to nodes (appropriately named locationInfo). Unfortunately, Cheerio doesn't yet support passing this option through. If I get some time, I'll create a PR.

[UPDATE]

Here's a hacky workaround: hijack parse5 before you load cheerio, forcing parse5 to attach location information whenever cheerio calls it. Note: the hijack must run before you load cheerio! If you load cheerio in several files in your project, make sure this hack comes before the first occurrence of require('cheerio') in the dependency graph.

const parse5 = require('parse5')
// Cheerio doesn't let us pass the `locationInfo` option on to parse5,
// so we wrap parse5's entry points before cheerio is loaded
const origParse = parse5.parse
parse5.parse = (html, opts) => {
  // Copy the options so the caller's object isn't mutated and a missing `opts` doesn't throw
  return origParse(html, Object.assign({}, opts, { locationInfo: true }))
}
const origParseFrag = parse5.parseFragment
parse5.parseFragment = (html, opts) => {
  return origParseFrag(html, Object.assign({}, opts, { locationInfo: true }))
}
// END OF HIJACK
const cheerio = require('cheerio')
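The same wrapping trick can be factored into a small reusable function. This is a sketch under stated assumptions: forceOptions is a hypothetical helper (not a parse5 or cheerio API), and it assumes the options object is the second argument, exactly as the hack above does. It copies the caller's options instead of mutating them and returns an undo function. The demo patches a stand-in object rather than the real parse5 module so it runs without dependencies; with the real module you would call forceOptions(parse5, 'parse', { locationInfo: true }) and forceOptions(parse5, 'parseFragment', { locationInfo: true }) before require('cheerio').

```javascript
// Hypothetical helper: replace target[methodName] with a wrapper that always
// merges extraOptions into the options passed as the second argument.
// The caller's options object is copied, never mutated.
function forceOptions(target, methodName, extraOptions) {
  const original = target[methodName];
  if (typeof original !== 'function') {
    throw new TypeError(`${methodName} is not a function`);
  }
  target[methodName] = function (input, opts) {
    return original.call(this, input, Object.assign({}, opts, extraOptions));
  };
  // Return an undo function so the original method can be restored later.
  return () => {
    target[methodName] = original;
  };
}

// Demonstration on a stand-in object instead of the real parse5 module.
const fakeParser = {
  parse: (html, opts) => ({ html, opts }),
};
const restore = forceOptions(fakeParser, 'parse', { locationInfo: true });
console.log(fakeParser.parse('<p>x</p>').opts); // { locationInfo: true }
restore();
console.log(fakeParser.parse('<p>x</p>').opts); // undefined
```

Returning an undo function makes the patch easy to scope, for example to a single test, instead of leaving parse5 modified for the lifetime of the process.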

@fb55
Member

fb55 commented Mar 9, 2018

Fixed in #1155
