html-dom-parser

HTML to DOM parser that works on both the server (Node.js) and the client (browser):

HTMLDOMParser(string[, options])

The parser converts an HTML string to a JavaScript object that describes the DOM tree.

Example

import parse from 'html-dom-parser';

parse('<p>Hello, World!</p>');

Output

[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [
      Text {
        type: 'text',
        parent: [Circular],
        prev: null,
        next: null,
        startIndex: null,
        endIndex: null,
        data: 'Hello, World!'
      }
    ],
    name: 'p',
    attribs: {}
  }
]
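The output is a plain array of domhandler-style nodes. To see how they fit together, here is a minimal sketch that walks the array and joins the data of every text node. It relies only on the type, children, and data properties shown in the output above. getText is a hypothetical helper, not part of html-dom-parser. The sample input is a hand-built object that mirrors the output above, so the sketch runs without the library.

```javascript
/**
 * Recursively collect the text content of nodes returned by the parser.
 * Assumes domhandler-style nodes: text nodes have type 'text' and a data string,
 * and elements keep their child nodes in a children array.
 * getText is a hypothetical helper, not part of html-dom-parser.
 */
function getText(nodes) {
  let text = '';
  for (const node of nodes) {
    if (node.type === 'text') {
      text += node.data;
    } else if (Array.isArray(node.children)) {
      // Descend into elements (and any other node that has children).
      text += getText(node.children);
    }
  }
  return text;
}

// Hand-built stand-in for the result of parse('<p>Hello, World!</p>').
const paragraph = { type: 'tag', name: 'p', attribs: {}, parent: null, children: [] };
paragraph.children.push({ type: 'text', data: 'Hello, World!', parent: paragraph });

console.log(getText([paragraph])); // Hello, World!
```

Comment nodes also carry data, but the type check skips them, so only visible text is collected.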

Install

NPM:

npm install html-dom-parser --save

Yarn:

yarn add html-dom-parser

CDN:

<script src="https://unpkg.com/html-dom-parser@latest/dist/html-dom-parser.min.js"></script>
<script>
  window.HTMLDOMParser(/* string */);
</script>

Usage

Import with ES Modules:

import parse from 'html-dom-parser';

Require with CommonJS:

const parse = require('html-dom-parser').default;

Parse empty string:

parse('');

Output:

[]

Parse string:

parse('Hello, World!');

Output

[
  Text {
    type: 'text',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    data: 'Hello, World!'
  }
]

Parse element with attributes:

parse('<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>');

Output

[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [ [Text], [Element], [Text] ],
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' }
  }
]
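Because each element exposes its tag name as name and its attributes as a plain attribs object, you can search the result without any DOM API. A minimal sketch, assuming domhandler's node types: besides 'tag', domhandler gives script and style elements the types 'script' and 'style'. findElements is a hypothetical helper name, and the sample input is a hand-built copy of the output above.

```javascript
// domhandler marks ordinary elements as 'tag', but <script> and <style>
// elements as 'script' and 'style'; all three are elements.
const ELEMENT_TYPES = new Set(['tag', 'script', 'style']);

/**
 * Depth-first (pre-order) search for elements whose tag name equals tagName.
 * findElements is a hypothetical helper, not part of html-dom-parser.
 */
function findElements(nodes, tagName) {
  const matches = [];
  for (const node of nodes) {
    if (ELEMENT_TYPES.has(node.type) && node.name === tagName) {
      matches.push(node);
    }
    if (Array.isArray(node.children)) {
      matches.push(...findElements(node.children, tagName));
    }
  }
  return matches;
}

// Hand-built stand-in for the output above:
// parse('<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>')
const em = { type: 'tag', name: 'em', attribs: {}, children: [{ type: 'text', data: 'world' }] };
const p = {
  type: 'tag',
  name: 'p',
  attribs: { class: 'foo', style: 'color: #bada55' },
  children: [{ type: 'text', data: 'Hello, ' }, em, { type: 'text', data: '!' }],
};

const [first] = findElements([p], 'p');
console.log(first.attribs.class); // foo
console.log(findElements([p], 'em').length); // 1
```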

The server parser is a wrapper around htmlparser2's parseDOM, except that the root parent node is excluded: the top-level nodes have parent set to null instead of pointing to a root document node. The next section shows the options you can pass to the server parser.
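To illustrate what excluding the root parent node means: in htmlparser2's DOM, each top-level node links to a root document node through parent, and clearing that link gives the parent: null seen in the outputs above. The sketch below mimics that step on plain objects. detachFromRoot is a hypothetical name, and this is not html-dom-parser's actual source.

```javascript
/**
 * Clear the parent reference of each top-level node so the returned array
 * no longer links back to the root document node. Nested nodes are untouched
 * and keep pointing at their element parent.
 * detachFromRoot is a hypothetical helper, not html-dom-parser's actual source.
 */
function detachFromRoot(nodes) {
  for (const node of nodes) {
    node.parent = null;
  }
  return nodes;
}

// Hand-built stand-in for a root document node and its single child.
const root = { type: 'root', children: [] };
const text = { type: 'text', data: 'Hello, World!', parent: root };
root.children.push(text);

const topLevel = detachFromRoot(root.children);
console.log(topLevel[0].parent); // null
```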

The client parser mimics the server parser by using the DOM API to parse the HTML string.
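A rough sketch of the idea: walk the browser's DOM nodes and build objects with the same shape as the server output. It assumes only standard DOM properties (nodeType, localName, attributes, childNodes, nodeValue). toNodes is a hypothetical name, and html-dom-parser's real client parser builds domhandler node instances rather than plain objects. The sample input uses plain objects so the sketch runs outside a browser.

```javascript
// Standard DOM nodeType values (Node.ELEMENT_NODE, Node.TEXT_NODE, Node.COMMENT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

/**
 * Convert DOM nodes into plain objects shaped like the server parser's output.
 * toNodes is a hypothetical helper: html-dom-parser's client parser creates
 * domhandler node instances, not plain objects.
 */
function toNodes(domNodes, parent = null) {
  const result = [];
  for (const domNode of Array.from(domNodes)) {
    let node;
    if (domNode.nodeType === ELEMENT_NODE) {
      const attribs = {};
      for (const attr of Array.from(domNode.attributes)) {
        attribs[attr.name] = attr.value;
      }
      // localName is lowercase for HTML elements but keeps the camelCase of
      // SVG elements such as linearGradient.
      const name = domNode.localName;
      // domhandler types <script> and <style> separately from other elements.
      const type = name === 'script' || name === 'style' ? name : 'tag';
      node = { type, name, attribs, parent };
      node.children = toNodes(domNode.childNodes, node);
    } else if (domNode.nodeType === TEXT_NODE) {
      node = { type: 'text', data: domNode.nodeValue, parent };
    } else if (domNode.nodeType === COMMENT_NODE) {
      node = { type: 'comment', data: domNode.nodeValue, parent };
    } else {
      continue; // Other node types (e.g. doctypes) are skipped in this sketch.
    }
    result.push(node);
  }
  // Link siblings through prev/next, as in the parser output.
  result.forEach((node, i) => {
    node.prev = i > 0 ? result[i - 1] : null;
    node.next = i < result.length - 1 ? result[i + 1] : null;
  });
  return result;
}

// Plain-object stand-in for the DOM produced from '<p class="foo">Hi</p>'.
const domParagraph = {
  nodeType: ELEMENT_NODE,
  localName: 'p',
  attributes: [{ name: 'class', value: 'foo' }],
  childNodes: [{ nodeType: TEXT_NODE, nodeValue: 'Hi' }],
};

const [p] = toNodes([domParagraph]);
console.log(p.name); // p
console.log(p.attribs.class); // foo
console.log(p.children[0].data); // Hi
```

In a browser, one way to get real DOM nodes to feed it is to assign the HTML string to the innerHTML of a template element and pass template.content.childNodes; whether the library itself does exactly this is not stated here.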

Options (server only)

Because the server parser is a wrapper around htmlparser2, which builds its output with domhandler, you can change how the server parser parses your HTML with the following options:

import { Tokenizer } from 'htmlparser2';

/**
 * These are the default options used if you omit the optional options object.
 * htmlparser2 passes the same options object to its domhandler, so the options
 * for both should be combined into a single object like so:
 */
const options = {
  /**
   * Options for the domhandler class.
   * https://github.com/fb55/domhandler/blob/master/src/index.ts#L16
   */
  withStartIndices: false,
  withEndIndices: false,
  /**
   * Options for the htmlparser2 class.
   * https://github.com/fb55/htmlparser2/blob/master/src/Parser.ts#L104
   */
  // xmlMode is read by both domhandler and htmlparser2, so it is listed only once.
  xmlMode: false,
  decodeEntities: true,
  lowerCaseTags: true, // !xmlMode by default
  lowerCaseAttributeNames: true, // !xmlMode by default
  recognizeCDATA: false, // xmlMode by default
  recognizeSelfClosing: false, // xmlMode by default
  Tokenizer, // htmlparser2's built-in Tokenizer class
};

If you're parsing SVG, you can set lowerCaseTags to false without having to enable xmlMode. This keeps tag names in their original case (for example, camelCase SVG names such as linearGradient) instead of lowercasing them according to the HTML standard.
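For example, an options object along these lines should keep SVG tag and attribute names such as linearGradient and viewBox in their original case. Setting lowerCaseAttributeNames as well is a suggestion beyond the paragraph above: according to the defaults listed earlier, attribute names are lowercased whenever xmlMode is off, independently of lowerCaseTags. Treat this as a starting point rather than a verified recipe; svgOptions is a hypothetical variable name.

```javascript
// Server-only options for parsing SVG without enabling xmlMode.
// Pass them as the second argument: parse(svgString, svgOptions).
const svgOptions = {
  lowerCaseTags: false, // keep e.g. linearGradient instead of lineargradient
  lowerCaseAttributeNames: false, // keep e.g. viewBox instead of viewbox
};
```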

Note

If you're parsing HTML client-side (in the browser), you cannot control the parsing options. The browser keeps certain tag names in camelCase, such as specific SVG elements, but returns all other tag names lowercased according to the HTML standard.

Migration

v5

Migrated to TypeScript. CommonJS imports require the .default key:

const parse = require('html-dom-parser').default;

v4

Upgraded htmlparser2 to v9.

v3

Upgraded domhandler to v5. Parser options like normalizeWhitespace have been removed.

v2

Removed Internet Explorer (IE11) support.

v1

Upgraded domhandler to v4 and htmlparser2 to v6.

Release

Release and publish are automated by Release Please.

License

MIT

remarkablemark/html-dom-parser
