Scraper Formats
There are currently three core parsers implemented for Scraper specifications.
The jf/json parser and the yf/yaml/yml parser are identical: the json representation can be converted into the yml representation, and vice versa, with external tools without problems.
For simplicity, this page only covers the yml format.
A full yml specification:
name: String
entry: String                 # default: "start"
globalNodeConfigurations:     # default: {}
  String: String              # key: Static String or Regex enclosed in '/ /' (e.g. "/ech.*/")
                              # value: value to be used in matching nodes
imports:                      # default: {}
  String:                     # key: full path to imported taskflow (e.g. 'child.yf'). Should be the same format.
                              # value: not used
graphs:                       # mandatory map of String -> List of NodeSpec
  String: [NodeSpec]
Where a NodeSpec is a key-value map of an implemented node (see Node Documentation).
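The defaults and key rules listed in the specification above can be sketched in a few lines of Python. This is only an illustration working on an already-parsed map, not the project's parser. The function names `apply_defaults` and `key_matches` are hypothetical, and the choice to match a regex key against the whole candidate string is an assumption, since the specification does not say how partial matches are treated.

```python
import re


def apply_defaults(spec: dict) -> dict:
    """Return a copy of a parsed specification with the documented defaults filled in."""
    if "graphs" not in spec:
        # 'graphs' is the only key the specification marks as mandatory.
        raise ValueError("'graphs' is mandatory")
    result = dict(spec)
    result.setdefault("entry", "start")
    result.setdefault("globalNodeConfigurations", {})
    result.setdefault("imports", {})
    return result


def key_matches(key: str, candidate: str) -> bool:
    """Check a globalNodeConfigurations key against a candidate string."""
    if len(key) >= 2 and key.startswith("/") and key.endswith("/"):
        # Keys enclosed in '/ /' are regexes.
        # Assumption: the pattern has to match the whole candidate.
        return re.fullmatch(key[1:-1], candidate) is not None
    # Any other key is a static string and is compared literally.
    return key == candidate


spec = apply_defaults({"name": "demo", "graphs": {"start": []}})
print(spec["entry"])                   # default entry graph
print(key_matches("/ech.*/", "echo"))  # regex key
print(key_matches("echo", "echo"))     # static key
```
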
For one-off taskflows, a simpler specification parser is provided, which assumes that no imports are needed. The format is as follows:
String: [NodeSpec]
Where a NodeSpec is a key-value map of an implemented node (see Node Documentation).
In other words, the specification is a map whose keys are graph names and whose values are lists of nodes.
The name key is by convention the filename without the file ending (e.g. child for child.yf).
Your custom parser should implement the ScraperSpecificationParser interface.
It should declare the file endings it accepts and should transform every valid document in its custom format into a valid internal ScrapeSpecification.
See the three core parsers for reference.
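The two responsibilities of a custom parser can be illustrated with a small Python analogue. None of the names below come from the project: `SpecificationParser`, `accepted_endings`, `parse`, `select_parser`, the `lf` file ending, and the toy line-based format are all hypothetical, and the real interface may look quite different. The sketch only shows a parser that announces its file endings and produces the same internal shape as the core formats.

```python
from pathlib import PurePath
from typing import Protocol


class SpecificationParser(Protocol):
    """Hypothetical stand-in for the parser interface."""

    def accepted_endings(self) -> set[str]: ...

    def parse(self, filename: str, text: str) -> dict: ...


class LineFormatParser:
    """Toy custom format: one 'graph: node node ...' entry per line."""

    def accepted_endings(self) -> set[str]:
        return {"lf"}

    def parse(self, filename: str, text: str) -> dict:
        graphs = {}
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            graph, _, nodes = line.partition(":")
            # Each whitespace-separated token becomes a minimal placeholder node map.
            graphs[graph.strip()] = [{"type": node} for node in nodes.split()]
        return {
            "name": PurePath(filename).stem,
            "entry": "start",
            "globalNodeConfigurations": {},
            "imports": {},
            "graphs": graphs,
        }


def select_parser(filename: str, parsers: list) -> SpecificationParser:
    """Pick the first parser that accepts the file ending of 'filename'."""
    ending = PurePath(filename).suffix.lstrip(".")
    for parser in parsers:
        if ending in parser.accepted_endings():
            return parser
    raise ValueError(f"no parser accepts file ending '{ending}'")


parser = select_parser("demo.lf", [LineFormatParser()])
print(parser.parse("demo.lf", "start: echo pipe\n")["graphs"])
```

Selecting the parser by file ending keeps the toy format independent of the core ones, so supporting a new format in this sketch only means adding another entry to the parser list.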