Arachnida: simple web interface to pilot crawlers (Under Construction)

Scrape the web easily, with no need to be a coding expert. Arachnida provides a simple web interface to pilot powerful crawlers running Headless Chrome.

Install (2 seconds)

Open a terminal and run (this assumes Git and Meteor are already installed):

git clone https://github.com/guillim/Arachnida.git arachnida && cd arachnida && meteor

Finished!

Use (1 minute)

Now open Google Chrome (or any other browser) and go to http://localhost:3000

You will be able to add a crawler, configure it, and run it in seconds!

1. Create a crawler on the main page:

First give it a name, and leave the function empty (unless you know what you're doing).

2. Configure your crawler:

This is the only step where a bit of coding knowledge helps. In the main part, write a JavaScript function that will be executed on every page the crawler scrapes.

For instance, to extract the title of each page, write:

return {
  title: $('title').text(),
};

Yes, jQuery is already set up. You simply need to provide the selectors (id, class, ...).
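
As a slightly larger illustration of the same idea, here is a minimal sketch that pulls several fields from a page with jQuery selectors. The wrapper name `extractPage` and the selectors are assumptions chosen for the example, and the wrapper exists only so the snippet can run on its own. Following the title example above, you would most likely keep just the `return { ... }` part in the Arachnida form.

```javascript
// Sketch only: `extractPage` is a hypothetical wrapper, not part of Arachnida.
// `$` stands for the jQuery object that is available to the crawler function.
function extractPage($) {
  return {
    // Text of the <title> tag, without surrounding whitespace
    title: $('title').text().trim(),
    // Text of the first <h1> on the page (empty string if there is none)
    heading: $('h1').first().text().trim(),
    // Content of the meta description, or null when the tag is missing
    description: $('meta[name="description"]').attr('content') || null,
  };
}
```

Each key of the returned object becomes one field of the result for that page, so adding a field is a matter of adding one more selector line.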

View the results:

What's included

See a screenshot of your running crawler
Manually add URLs to be scraped, or upload a CSV
Sign in / Sign up
Account management: Profile Page, Username, Change password, Delete account...
Admin for the webmaster: go to /admin
Router
MongoDB as database

Contribute

I am looking for people to make pull requests to improve Arachnida. Please do :)

To do:

Set up a live queue of URLs to be scraped (e.g. at the moment, you can't click straight on a link and scrape it)
Live log from the server shown in the interface to help with debugging
Results export functionality (CSV & JSON)
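
For anyone who wants to pick up the export item, here is one possible starting point. It is not existing Arachnida code: the function name `toCsv` is hypothetical, and it assumes the results are an array of flat objects such as the ones returned by the crawler function.

```javascript
// Sketch only: assumes `rows` is an array of flat objects,
// e.g. [{ url: '...', title: '...' }]. This shape is an assumption.
function toCsv(rows) {
  if (rows.length === 0) return '';
  // Collect every key that appears in any row, in first-seen order
  const columns = [...new Set(rows.flatMap((row) => Object.keys(row)))];
  // Quote a value when it contains a comma, a double quote, or a line break
  const quote = (value) => {
    const text = value === null || value === undefined ? '' : String(value);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };
  // Header line first, then one line per row; missing fields become empty cells
  const lines = [columns.map(quote).join(',')];
  for (const row of rows) {
    lines.push(columns.map((col) => quote(row[col])).join(','));
  }
  return lines.join('\n');
}
```

The JSON side needs no helper, since `JSON.stringify(rows, null, 2)` already produces a readable file.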

Thanks

Boilerplate: yogiben
Headless Chrome layer: yujiosaka
