Quickstart
Albert Schimpf edited this page Feb 26, 2021 · 12 revisions
- Use the latest modular jar release or the docker image albsch/scraper:latest
- Download desired plugin releases and additional node implementations
- Use the documentation of the official nodes for usage
Extract stars from raw HTML data periodically.
name: ghstars
graphs:
  start:
    - { type: HttpRequest, url: "https://github.com/scraperflow/scraper", holdOnReservation: 5000, put: response }
    - type: Regex
      # group 1 encloses the quantifier so it captures the whole number, not just the last digit
      regex: "\\<a.*?stargazers.*?\\>\\s.*?(\\d+)"
      content: "{response}"
      groups: { stars: 1 }
      output: output
    - { type: Echo, log: "Got {{{output}}[0]@stars} stars", goTo: start }
Used nodes: HttpRequest, RegexNode, EchoNode
Save this content into a file named ghstars.yf and execute scraper in a directory containing only ghstars.yf:
docker run -v "$PWD":/rt/ --rm albsch/scraper:latest
or use the supplied run script with the modular jar release.
This job will crash once the API of GitHub changes. Furthermore, it crashes on unexpected environment effects (e.g. an internet outage).
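To see why a change on GitHub's side breaks the job, it can help to try the regular expression outside of scraper. The following is a minimal Python sketch, not scraper code: the function name extract_stars and the HTML snippet are made up for illustration, and the pattern is the one from the Regex node with the capturing group enclosing the quantifier.

```python
import re

# Pattern from the Regex node, without the redundant escapes before < and >.
# The group encloses the quantifier: (\d+) holds the whole number,
# whereas (\d)+ would keep only the last digit that matched.
STARS = re.compile(r"<a.*?stargazers.*?>\s.*?(\d+)")


def extract_stars(html):
    """Return the star count as a string, or None if the markup does not match."""
    match = STARS.search(html)
    return match.group(1) if match else None


# Hypothetical snippet, not GitHub's real markup.
sample = '<a href="/scraperflow/scraper/stargazers">\n  <span>42</span> stars</a>'

print(extract_stars(sample))               # 42
print(extract_stars("<p>redesigned</p>"))  # None
```

When the page no longer matches the pattern there is no group to read, which is the situation the job above cannot recover from.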
Possible improvements:
- Use WriteLineToFile to archive data persistently
- Use another HttpRequest to generate a call to another micro service
- Use Periodic, which uses dispatched arrows, to avoid crashing on non-API errors
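The idea behind the last improvement, running a step periodically and surviving transient failures, can be sketched in plain Python. This is only an illustration of the concept and does not show how the Periodic node or dispatched arrows are implemented in scraper; run_periodically, fake_fetch and all parameter names are hypothetical.

```python
import time


def run_periodically(fetch, handle, interval_s, rounds, sleep=time.sleep):
    """Call fetch() once per round; count failures instead of crashing."""
    failures = 0
    for _ in range(rounds):
        try:
            handle(fetch())
        except OSError as err:
            # OSError covers typical environment failures such as timeouts,
            # refused connections, or an unreachable network.
            failures += 1
            print(f"fetch failed, retrying next round: {err}")
        sleep(interval_s)
    return failures


# Stand-in for a real HTTP request: two good responses around one outage.
responses = iter(["41", OSError("network unreachable"), "42"])


def fake_fetch():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


seen = []
failed = run_periodically(fake_fetch, seen.append, interval_s=0, rounds=3)
print(seen, failed)  # ['41', '42'] 1
```

Only environment errors are caught here. An error caused by a changed page format would still propagate, which matches the distinction between API and non-API errors made above.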