This repository contains the source code and input files used in the official tutorials of the univocity-html-parser.
The parser is the result of 5 years of research and development aimed at making work with HTML as effortless as possible.
Some of the things it can do:
- Handles the trickiest, messiest and most horrible HTML cleanly
- Joins pieces of data scattered across multiple pages with zero effort
- Provides built-in support for historical data organization and re-parsing
- Handles pagination and linked pages automatically
- Downloads page resources better than your browser
- Helps you identify new elements in updated pages
Read through the tutorials and execute the examples here to try it out!
The API is fully documented and the javadocs are here.
We hope you enjoy it!
The univocity team.