A JavaScript/Node.js web scraper that downloads images and their information (summary, title, etc.) from the Library of Congress website. It also stores each image link and its data in a .json file.

Jmerc03/webScraping

webScraping

About

A JavaScript/Node.js web scraper that pulls images from the Library of Congress website. It creates a directory called photos and downloads the photos there.

It also creates an info.json file containing the information about each photo. The ids of the h3 headers are the keys and the text of the li elements are the values. When a key has multiple lines, they are stored in an array under that key. This happens often with notes.
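The key/value rule above can be sketched roughly as follows. This is a minimal illustration, not the code in main.js: the helper name buildRecord, the shape of its input, the sample values, and the choice to skip empty sections are all assumptions made for the example.

```javascript
// Hypothetical helper (not taken from main.js): turns scraped sections into
// one info.json-style record. Each section is assumed to look like
// { id: <h3 id>, items: [<li text>, ...] }.
function buildRecord(sections) {
  const record = {};
  for (const { id, items } of sections) {
    // Trim whitespace and drop empty lines (an assumption of this sketch).
    const values = items.map((text) => text.trim()).filter((text) => text !== "");
    if (values.length === 0) continue;
    // A single line is stored as a string, several lines as an array.
    record[id] = values.length === 1 ? values[0] : values;
  }
  return record;
}

// Made-up sample data, only to show the resulting shape.
const example = buildRecord([
  { id: "title", items: ["Example title"] },
  { id: "notes", items: ["First note line", "Second note line"] },
]);

console.log(JSON.stringify(example, null, 2));
```

With this rule, a reader of info.json has to check whether a value is a string or an array before using it, which is worth keeping in mind for keys such as notes.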

Installation

Make sure you have Node.js installed. Then clone the repo and run:

npm install axios cheerio fs request

Usage

Open the project directory in a terminal and run:

node main.js

This downloads 21 photos into ./photos and stores their information in info.json.

To change which photos are downloaded and scraped, edit the keywords array so that it contains the LCCNs (Library of Congress Control Numbers) of the items you want.
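As a rough sketch of what such an edit might look like: the array name keywords comes from this README, but its exact form in main.js is an assumption, the entries below are placeholders rather than real LCCNs, and cleanKeywords is a hypothetical helper that is not part of the project.

```javascript
// Placeholder entries (not real LCCNs); replace them with the LCCNs of the
// items you want to scrape.
const keywords = ["LCCN_PLACEHOLDER_1", "LCCN_PLACEHOLDER_2", " LCCN_PLACEHOLDER_1 ", ""];

// Hypothetical helper: trims entries and drops blanks and duplicates so that
// no item is requested twice.
function cleanKeywords(list) {
  const seen = new Set();
  const result = [];
  for (const raw of list) {
    const value = String(raw).trim();
    if (value !== "" && !seen.has(value)) {
      seen.add(value);
      result.push(value);
    }
  }
  return result;
}

const cleaned = cleanKeywords(keywords);
console.log(cleaned);
```

Cleaning the list up front is optional; it only guards against accidental repeats or empty entries when the array is edited by hand.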
