Quickstart

Albert Schimpf edited this page Feb 26, 2021 · 12 revisions

Quick Example - GitHub Stars

Periodically extract the star count from the raw HTML of a GitHub repository page.

name: ghstars
graphs:
  start:
    - { type: HttpRequest, url: "https://github.com/scraperflow/scraper", holdOnReservation: 5000, put: response }
    - type: Regex
      regex: "\\<a.*?stargazers.*?\\>\\s.*?(\\d+)"
      content: "{response}"
      groups: { stars: 1 }
      output: output
    - { type: Echo, log: "Got {{{output}}[0]@stars} stars", goTo: start }

Used nodes: HttpRequest, RegexNode, EchoNode
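
To see what the Regex step extracts, here is a minimal sketch in plain Python, outside the scraper framework. The pattern mirrors the one in the config, with the capture group written as `(\d+)` so that group 1 holds every digit of the count rather than a single digit. The `extract_stars` helper and the sample HTML fragment are made up for illustration; real GitHub markup differs and may change.

```python
import re

# Same idea as the Regex node: find the stargazers link, then the first number after it.
# Written as a Python raw string, so no double backslashes are needed.
STAR_PATTERN = re.compile(r"<a.*?stargazers.*?>\s.*?(\d+)")


def extract_stars(html):
    """Return the first star count found in html as a string, or None if nothing matches."""
    match = STAR_PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical fragment, invented for this example (not real GitHub output).
sample = '<a href="/scraperflow/scraper/stargazers">\n  12 stars</a>'
print(extract_stars(sample))
```

If the page no longer contains a matching link, `extract_stars` returns `None`, which corresponds to the case where the job stops producing a usable `stars` value.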

Save this content to a file named ghstars.yf and run scraper in a directory that contains only ghstars.yf:

  docker run -v "$PWD":/rt/ --rm albsch/scraper:latest

or use the supplied run script with the modular jar release.

This job will crash once the API of GitHub changes, which here means the HTML markup that the regex matches. It also crashes on unexpected environment effects (e.g. an internet outage).

Possible improvements:

  • Use WriteLineToFile to archive the data persistently
  • Use another HttpRequest to call another microservice
  • Use Periodic, which uses dispatched arrows, so that the job does not crash on non-API errors
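
The first and last improvements can be illustrated with a plain Python loop. This is only an analogy under stated assumptions, not how scraper implements WriteLineToFile or Periodic: `append_line`, `run_periodically`, and the `fetch` callback are hypothetical names introduced for this sketch.

```python
import time
from datetime import datetime, timezone


def append_line(path, line):
    """Append one line to a file so results survive restarts (archiving analogue)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def run_periodically(fetch, path, interval_seconds, rounds):
    """Call fetch() `rounds` times, archiving each result.

    A failing round is logged and counted instead of aborting the loop,
    so a temporary outage does not end the job. Returns the failure count.
    """
    failures = 0
    for i in range(rounds):
        try:
            stars = fetch()
            stamp = datetime.now(timezone.utc).isoformat()
            append_line(path, f"{stamp} {stars}")
        except Exception as error:  # e.g. a network outage
            failures += 1
            print(f"fetch failed, continuing with the next round: {error}")
        if i < rounds - 1:
            time.sleep(interval_seconds)
    return failures
```

A call such as `run_periodically(my_fetch, "stars.log", 5, 10)` would try ten rounds five seconds apart, where `my_fetch` is whatever function returns the current star count.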