Parsley

This tool reads a list of URLs from a file, downloads each page, parses its HTML, and converts it to JSON.
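Below is a minimal standard-library sketch of the first two steps, loading the list and downloading one page. It assumes the input file holds one URL per line, with blank lines skipped. `parse_url_list`, `read_urls` and `fetch_html` are hypothetical names. `urllib` is a stand-in, since this README does not say which HTTP client Parsley uses.

```python
from pathlib import Path
from urllib.request import urlopen


def parse_url_list(text: str) -> list[str]:
    """One URL per line; surrounding whitespace stripped, blank lines skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_urls(path: str) -> list[str]:
    """Load a URL list file such as sample-links.txt (assumed UTF-8)."""
    return parse_url_list(Path(path).read_text(encoding="utf-8"))


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it with the charset the server declares,
    falling back to UTF-8 when none is given."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage: `for url in read_urls("sample-links.txt"): html = fetch_html(url)`.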

Run it yourself

1. Clone this repo and install its dependencies (pip install -r requirements.txt).
2. Run python main.py with no arguments to test it. It downloads the three URLs listed in sample-links.txt and saves one JSON file per URL in the ./data/ folder, using a sanitized version of the URL as the filename.
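The README does not specify the sanitization rule. Here is one plausible sketch. It assumes the scheme is dropped and every run of characters outside `[A-Za-z0-9._-]` is replaced with `_`. `url_to_filename` and `output_path` are hypothetical helpers, not Parsley's actual functions.

```python
import re
from pathlib import Path

# Hypothetical rule: characters outside this set are replaced with "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def url_to_filename(url: str) -> str:
    """Turn a URL into a filesystem-safe JSON filename."""
    without_scheme = _SCHEME.sub("", url)
    stem = _UNSAFE.sub("_", without_scheme).strip("_")
    # Fall back to a fixed name if nothing usable is left.
    return f"{stem or 'index'}.json"


def output_path(url: str, output_dir: str = "./data/") -> Path:
    """Location where the JSON for `url` would be written."""
    return Path(output_dir) / url_to_filename(url)
```

With this rule, https://example.com/a/b?q=1 maps to data/example.com_a_b_q_1.json. Note that distinct URLs can collide under a replacement rule like this. For example, https://example.com/a/b and https://example.com/a_b both map to example.com_a_b.json.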

Example

Original HTML:

<title>Buy Historical Stock Market Analytics JSON API | Stock Data API</title>
<meta name="description" content="Historical stock data JSON REST API for financial market data. Includes over 6,000 companies and more than 50 advanced technical indicators.">

Parsed JSON:

{
  "tags": [
    {
      "attributes": null,
      "content": "Buy Historical Stock Market Analytics JSON API | Stock Data API",
      "name": "title"
    },
    {
      "attributes": {
        "content": "Historical stock data JSON REST API for financial market data. Includes over 6,000 companies and more than 50 advanced technical indicators.",
        "name": "description"
      },
      "content": null,
      "name": "meta"
    }
  ]
}
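Here is a sketch of producing this structure with Python's built-in `html.parser`. It assumes only `<title>` and `<meta>` are kept, since those are the only tags in the example. The `KEEP` set is hypothetical. The sketch emits `null` for missing attributes or text and sorts keys as in the sample output.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical filter: the example only shows <title> and <meta>;
# which tags Parsley actually extracts is not documented in this README.
KEEP = {"title", "meta"}


class TagCollector(HTMLParser):
    """Collect selected tags as {"attributes", "content", "name"} dicts."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[dict] = []
        self._open: Optional[dict] = None  # kept tag whose text is being read
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in KEEP:
            return
        entry = {
            "attributes": dict(attrs) or None,  # None -> null when no attributes
            "content": None,
            "name": tag,
        }
        self.tags.append(entry)
        if tag != "meta":  # <meta> is a void element and never has text
            self._open, self._text = entry, []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["name"]:
            # Empty or whitespace-only text becomes None (null in the JSON).
            self._open["content"] = "".join(self._text).strip() or None
            self._open = None


def html_to_json(html: str) -> str:
    """Parse an HTML string and return the {"tags": [...]} JSON document."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return json.dumps({"tags": parser.tags}, indent=2, sort_keys=True)
```

Feeding `html_to_json` the original HTML above yields the same structure as the parsed JSON.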

Args

usage: main.py [-h] [--input INPUT] [--output_dir OUTPUT_DIR]
               [--workers WORKERS]

Gets a list of URLs and converts the HTML to JSON.

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         Input file
  --output_dir OUTPUT_DIR
                        Output directory
  --workers WORKERS     Number of threads
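Here is a sketch of an `argparse` setup that produces a help screen of this shape, plus a thread pool sized by `--workers`. The `--input` and `--output_dir` defaults are inferred from "Run it yourself". The `--workers` default of 4 is an assumption, as are the helper names `build_parser` and `process_all`. None of this is Parsley's actual code.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Gets a list of URLs and converts the HTML to JSON.",
    )
    # Defaults for --input/--output_dir follow the no-args run described above;
    # the --workers default is a hypothetical choice.
    parser.add_argument("--input", default="sample-links.txt", help="Input file")
    parser.add_argument("--output_dir", default="./data/", help="Output directory")
    parser.add_argument("--workers", type=int, default=4, help="Number of threads")
    return parser


def process_all(urls: Iterable[str], handle: Callable[[str], str], workers: int) -> List[str]:
    """Run `handle` (e.g. download + parse + save one URL) on a thread pool.

    Results come back in input order; an exception raised by `handle`
    is re-raised when its result is reached.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, urls))
```

On Python 3.10 and later, `argparse` labels the section "options:" instead of "optional arguments:".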



cardoso-neto/Parsley
