App to scrape the web, for people without coding skills. Fully integrates web crawlers (Headless Chrome) and the interface to control them.

guillim/Arachnida

Arachnida: simple web interface to pilot crawlers (Under Construction)

Scrape the web easily, with no need to be a coding expert. Arachnida provides a simple web interface to pilot powerful crawlers (running Headless Chrome).

Install (2 seconds)

Open a terminal (with Git and Meteor already installed) and run:

git clone https://github.com/guillim/Arachnida.git arachnida && cd arachnida && meteor

Finished!

Use (1 minute)

Now open Google Chrome (or any other browser) and go to http://localhost:3000

You will be able to add a crawler, configure it, and run it in seconds!

1. Create a crawler on the main page:

First give it a name, and leave the function empty (unless you know what you're doing).

screenshot

2. Configure your crawler:

This is the only step where a bit of coding knowledge helps. In the main part, write the JavaScript function that will be executed on every page scraped by the crawler.

For instance, to extract the title of each page, write:

// jQuery is available as $: select the <title> element and return its text
return {
  title: $('title').text(),
};

Yes, jQuery is already set up. You simply need to provide the selectors (id, class, ...).
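To show how several selectors can be combined, here is a slightly larger sketch that collects the title, the first heading, and all link targets. It is written as a standalone function that receives `$` as a parameter so it can be tried outside Arachnida; in the crawler configuration you would presumably paste only the `return { ... };` part, as in the example above. The function name `extractPage`, the field names `heading` and `links`, and the selectors `h1` and `a[href]` are illustrative assumptions, not something Arachnida requires.

```javascript
// Hypothetical page function: `$` stands for the jQuery object that is
// available on each scraped page. Field names and selectors are illustrative.
function extractPage($) {
  return {
    // Text of the <title> element, as in the example above
    title: $('title').text(),
    // Text of the first <h1>, trimmed of surrounding whitespace
    heading: $('h1').first().text().trim(),
    // href attribute of every link on the page, as a plain array
    links: $('a[href]')
      .map(function () {
        return $(this).attr('href');
      })
      .get(),
  };
}
```

Each key of the returned object becomes one field of the result for that page, so adding a field is a matter of adding one line with the right selector.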

screenshot

3. View the results:

screenshot

What's included

  • See a screenshot of your running crawler
  • Manually add URLs to be scraped, or upload a CSV
  • Sign in / Sign up
  • Account management: Profile Page, Username, Change password, Delete account...
  • Admin for the webmaster: go to /admin
  • Router
  • MongoDB as database

Contribute

I am looking for people to make pull requests to improve Arachnida. Please do it :)

TO DO:

  1. Set up a live queue of URLs to be scraped (e.g. at the moment, you can't click on a link and scrape it directly)
  2. Bring the live log from the server into the interface to help with debugging
  3. Results export functionality (CSV & JSON)
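For anyone picking up the export item, one possible starting point is a pair of small helpers that serialize an array of result objects. This is only a sketch: `escapeCsvField`, `toCsv`, and `toJson` are hypothetical names that do not exist in Arachnida, and it assumes each result is a flat object such as `{ title: '...' }`.

```javascript
// Hypothetical helpers, not part of Arachnida: serialize scraped results.

// Quote a CSV field when it contains a comma, a double quote, or a line
// break, doubling any embedded quotes (RFC 4180 style).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Convert an array of flat result objects to CSV text.
// The header row is the union of all keys, in first-seen order.
function toCsv(results) {
  const columns = [];
  results.forEach((row) => {
    Object.keys(row).forEach((key) => {
      if (!columns.includes(key)) columns.push(key);
    });
  });
  const lines = [columns.map(escapeCsvField).join(',')];
  results.forEach((row) => {
    lines.push(columns.map((key) => escapeCsvField(row[key])).join(','));
  });
  return lines.join('\r\n');
}

// Convert the same results to pretty-printed JSON.
function toJson(results) {
  return JSON.stringify(results, null, 2);
}
```

Taking the union of keys for the header keeps rows aligned even when different pages return different fields; missing values are written as empty cells.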

Thanks

Boilerplate: yogiben.
HeadlessChrome layer: yujiosaka
