PxyUp/fitter

Fitter - a new way to collect information from APIs and websites

Fitter CLI - a small command-line tool that returns Fitter results for testing, debugging, and home usage

Fitter Lib - a library that provides the functionality of Fitter CLI

Ways to collect information

  1. Server - parses the response from an API or HTTP request (uses http.Client)
  2. Browser - emulates a real browser using Chromium + Docker + Playwright/Cypress and reads the DOM
  3. Static - parses a static string as data

Formats that can be parsed

  1. JSON - parse JSON to get specific information
  2. XML - parse an XML tree to get specific information
  3. HTML - parse the DOM tree to get specific information
  4. XPath - parse the DOM tree to get specific information using XPath

Use as a library

go get github.com/PxyUp/fitter

package main

import (
	"fmt"
	"github.com/PxyUp/fitter/lib"
	"github.com/PxyUp/fitter/pkg/config"
	"log"
	"net/http"
)

func main() {
	res, err := lib.Parse(&config.Item{
		ConnectorConfig: &config.ConnectorConfig{
			ResponseType:  config.Json,
			Url:           "https://random-data-api.com/api/appliance/random_appliance",
			ServerConfig: &config.ServerConnectorConfig{
				Method: http.MethodGet,
			},
		},
		Model: &config.Model{
			ObjectConfig: &config.ObjectConfig{
				Fields: map[string]*config.Field{
					"my_id": {
						BaseField: &config.BaseField{
							Type: config.Int,
							Path: "id",
						},
					},
					"generated_id": {
						BaseField: &config.BaseField{
							Generated: &config.GeneratedFieldConfig{
								UUID: &config.UUIDGeneratedFieldConfig{},
							},
						},
					},
					"generated_array": {
						ArrayConfig: &config.ArrayConfig{
							RootPath: "@this|@keys",
							ItemConfig: &config.ObjectConfig{
								Field: &config.BaseField{
									Type: config.String,
								},
							},
						},
					},
				},
			},
		},
	}, nil, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res.ToJson())
}

Output:

{
  "generated_array": ["id","uid","brand","equipment"],
  "my_id": 6000,
  "generated_id": "26b08b73-2f2e-444d-bcf2-dac77ac3130e"
}

How to use Fitter

Download the latest version from the release page

or run locally:

go run cmd/fitter/main.go --path=./examples/config_api.json

Arguments

  1. --path - string[""] - path to the Fitter configuration file
  2. --url - string[""] - URL of the Fitter configuration
  3. --verbose - bool[false] - enable logging
  4. --plugins - string[""] - path to the folder with Fitter plugins
  5. --log-level - enum["info", "error", "debug", "fatal"] - set the log level (only applies when verbose is true)

How to use Fitter_CLI

Download the latest version from the release page

or run locally:

go run cmd/cli/main.go --path=./examples/cli/config_cli.json

Arguments

  1. --path - string[""] - path to the Fitter_CLI configuration file
  2. --url - string[""] - URL of the Fitter_CLI configuration
  3. --copy - bool[false] - copy the result to the clipboard
  4. --pretty - bool[true] - produce a readable result (also affects copy)
  5. --verbose - bool[false] - enable logging
  6. --omit-error-pretty - bool[false] - output the plain value if it cannot be pretty-printed
  7. --plugins - string[""] - path to the folder with Fitter plugins
  8. --log-level - enum["info", "error", "debug", "fatal"] - set the log level (only applies when verbose is true)
  9. --input - string[""] - input value used for formatting. Examples: --input=\""124"\" --input=124 --input='{"test": 5}'

./fitter_cli_${VERSION} --path=./examples/cli/config_cli.json --copy=true

Examples:

  1. Server version: HackerNews + Quotes + Guardian News - using API + HTML + XPath parsing
  2. Chromium version: Guardian News + Quotes - using HTML parsing + browser emulation
  3. Docker version: Guardian News + Quotes - using HTML parsing + browser from a Docker image
  4. Playwright version: Guardian News + Quotes - using HTML parsing + browser from the Playwright framework
  5. Playwright version: England Cities + Weather - using HTML + XPath parsing + browser from the Playwright framework
  6. JSON version: Generate pagination - using the static connector to generate a pagination array
  7. Server version: Get current time - get the time from a URL and format it

Configuration

Connector

The connector defines how the data is fetched

type ConnectorConfig struct {
    ResponseType ParserType `json:"response_type" yaml:"response_type"`
    Url          string     `json:"url" yaml:"url"`
    Attempts     uint32     `json:"attempts" yaml:"attempts"`
    
    NullOnError bool `yaml:"null_on_error" json:"null_on_error"`
    
    StaticConfig          *StaticConnectorConfig      `json:"static_config" yaml:"static_config"`
    IntSequenceConfig     *IntSequenceConnectorConfig `json:"int_sequence_config" yaml:"int_sequence_config"`
    ServerConfig          *ServerConnectorConfig      `json:"server_config" yaml:"server_config"`
    BrowserConfig         *BrowserConnectorConfig     `yaml:"browser_config" json:"browser_config"`
    PluginConnectorConfig *PluginConnectorConfig      `json:"plugin_connector_config" yaml:"plugin_connector_config"`
    ReferenceConfig       *ReferenceConnectorConfig   `yaml:"reference_config" json:"reference_config"`
    FileConfig            *FileConnectorConfig        `json:"file_config" yaml:"file_config"`
}
  • NullOnError[false] - if set to true, all errors are ignored
  • ResponseType - enum["HTML", "json","xpath"] - the format of the data returned by the connector
  • Attempts - how many attempts the connector makes to fetch the data
  • Url - the address to request. Important: the parent value can be injected as a string, for example https://api.open-meteo.com/v1/forecast?latitude={{{latitude}}}&longitude={{{longitude}}}&hourly=temperature_2m&forecast_days=1
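
The interaction of Attempts and NullOnError can be pictured with a small retry loop. This is only a sketch of the idea, not Fitter's implementation: the function names fetchWithAttempts and flaky are hypothetical, and treating 0 attempts as a single try is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// fetchWithAttempts calls fetch up to attempts times and returns the first
// successful body. If every attempt fails and nullOnError is set, the error
// is swallowed and a nil body is returned instead.
func fetchWithAttempts(attempts uint32, nullOnError bool, fetch func() ([]byte, error)) ([]byte, error) {
	if attempts == 0 {
		attempts = 1 // assumption: zero still means a single try
	}
	var lastErr error
	for i := uint32(0); i < attempts; i++ {
		body, err := fetch()
		if err == nil {
			return body, nil
		}
		lastErr = err
	}
	if nullOnError {
		return nil, nil
	}
	return nil, lastErr
}

// flaky returns a fetch function that fails on its first n calls.
func flaky(n int) func() ([]byte, error) {
	calls := 0
	return func() ([]byte, error) {
		calls++
		if calls <= n {
			return nil, errors.New("temporary failure")
		}
		return []byte(`{"ok":true}`), nil
	}
}

func main() {
	body, err := fetchWithAttempts(3, false, flaky(2))
	fmt.Println(string(body), err) // {"ok":true} <nil>

	body, err = fetchWithAttempts(2, true, flaky(5))
	fmt.Println(body == nil, err) // true <nil>
}
```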

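Config can be one of:

  • StaticConfig - read data from a provided string
  • IntSequenceConfig - generate an int sequence
  • ServerConfig - fetch data with the Go http.Client
  • BrowserConfig - fetch data through an emulated browser
  • PluginConnectorConfig - connector provided by a plugin
  • ReferenceConfig - read prefetched data from references
  • FileConfig - read data from a file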

Example:

{
  "response_type": "xpath",
  "attempts": 3,
  "url": "https://openweathermap.org/find?q={PL}",
  "browser_config": {
    "playwright": {
      "timeout": 30,
      "wait": 30,
      "install": false,
      "browser": "Chromium"
    }
  }
}
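
To show how such a JSON document maps onto the struct tags above, here is a minimal sketch that decodes the example with encoding/json. The lowercase types are local, trimmed-down copies written for this illustration (they are not imported from Fitter), and decodeConnector is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local, reduced mirrors of the documented structs (illustration only).
type playwrightConfig struct {
	Browser string `json:"browser"`
	Install bool   `json:"install"`
	Timeout uint32 `json:"timeout"`
	Wait    uint32 `json:"wait"`
}

type browserConfig struct {
	Playwright *playwrightConfig `json:"playwright"`
}

type connectorConfig struct {
	ResponseType  string         `json:"response_type"`
	Url           string         `json:"url"`
	Attempts      uint32         `json:"attempts"`
	BrowserConfig *browserConfig `json:"browser_config"`
}

// exampleConnector is the JSON example from this section.
const exampleConnector = `{
  "response_type": "xpath",
  "attempts": 3,
  "url": "https://openweathermap.org/find?q={PL}",
  "browser_config": {
    "playwright": {"timeout": 30, "wait": 30, "install": false, "browser": "Chromium"}
  }
}`

// decodeConnector parses a connector configuration from JSON.
func decodeConnector(raw string) (*connectorConfig, error) {
	var cfg connectorConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	cfg, err := decodeConnector(exampleConnector)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.ResponseType, cfg.Attempts, cfg.BrowserConfig.Playwright.Browser) // xpath 3 Chromium
}
```

Only the config that is present in the JSON ends up non-nil, which is one way to read "Config can be one of".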

PluginConnectorConfig

A connector can be defined via the plugin system. To use it, pass the following flag to Fitter/CLI (location of the plugins):

... --plugins=./examples/plugin

--plugins - loads all files with the ".so" extension from the provided folder (subdirectories excluded)
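
The lookup rule described above (".so" files in the folder, no subdirectories) could be sketched like this. It illustrates the rule only and is not Fitter's loader; listPlugins and demoDir are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listPlugins returns the names of all ".so" files directly inside dir.
// Subdirectories are skipped, so nested plugins are not picked up.
func listPlugins(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		if e.IsDir() || filepath.Ext(e.Name()) != ".so" {
			continue
		}
		names = append(names, e.Name())
	}
	return names, nil
}

// demoDir builds a temporary folder with one plugin, one unrelated file,
// and one plugin hidden in a subdirectory.
func demoDir() (string, error) {
	dir, err := os.MkdirTemp("", "fitter-plugins")
	if err != nil {
		return "", err
	}
	if err := os.Mkdir(filepath.Join(dir, "sub"), 0o755); err != nil {
		return "", err
	}
	for _, name := range []string{"connector.so", "notes.txt", filepath.Join("sub", "nested.so")} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return "", err
		}
	}
	return dir, nil
}

func main() {
	dir, err := demoDir()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	names, err := listPlugins(dir)
	fmt.Println(names, err) // [connector.so] <nil>
}
```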

type PluginConnectorConfig struct {
	Name   string          `json:"name" yaml:"name"`
	Config json.RawMessage `json:"config" yaml:"config"`
}
{
    "name": "connector",
    "config": {
      "name": "Elon"
    }
}
  • Name - name of the plugin
  • Config - json config of the plugin

How to build a plugin

Build the plugin:

go build -buildmode=plugin -gcflags="all=-N -l" -o examples/plugin/connector.so examples/plugin/hardcoder/connector.go

Make sure you export a Plugin variable that implements the pl.ConnectorPlugin interface

Example for CLI:

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_plugin.json#L5

Plugin example:

package main

import (
	"encoding/json"
	"fmt"
	"github.com/PxyUp/fitter/pkg/config"
	"github.com/PxyUp/fitter/pkg/logger"
	"github.com/PxyUp/fitter/pkg/builder"
	pl "github.com/PxyUp/fitter/pkg/plugins/plugin"
)

var (
	_ pl.ConnectorPlugin = &plugin{}

	Plugin plugin
)

type plugin struct {
	log  logger.Logger
	Name string `json:"name" yaml:"name"`
}

func (pl *plugin) Get(parsedValue builder.Interfacable, index *uint32, input builder.Interfacable) ([]byte, error) {
	return []byte(fmt.Sprintf(`{"name": "%s"}`, pl.Name)), nil
}

func (pl *plugin) SetConfig(cfg *config.PluginConnectorConfig, logger logger.Logger) {
	pl.log = logger

	if cfg.Config != nil {
		err := json.Unmarshal(cfg.Config, pl)
		if err != nil {
			pl.log.Errorw("cant unmarshal plugin configuration", "error", err.Error())
			return
		}
	}
}

ReferenceConnectorConfig

Connector that returns prefetched data from references

type ReferenceConnectorConfig struct {
	Name string `yaml:"name" json:"name"`
}

Example

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L66

IntSequenceConnectorConfig

Improved version of the static connector that generates an int sequence as the result

type IntSequenceConnectorConfig struct {
	Start int `json:"start" yaml:"start"`
	End   int `json:"end" yaml:"end"`
	Step  int `json:"step" yaml:"step"`
}
  • Start[0] - start point of the sequence (included)
  • End[0] - end point of the sequence (excluded from the final result, like a range in most languages)
  • Step[1] - interval between elements of the sequence

Example

{
    "start": 0,
    "end": 2 
    // Generate [0, 1]
}

FileConnectorConfig

Connector type that reads data from the provided file

type FileConnectorConfig struct {
    Path          string `yaml:"path" json:"path"`
    UseFormatting bool   `yaml:"use_formatting" json:"use_formatting"`
}

StaticConnectorConfig

Connector type that reads data from the provided string

type StaticConnectorConfig struct {
    Value string `json:"value" yaml:"value"`
    Raw   json.RawMessage `json:"raw" yaml:"raw"`
}
  • Value - static string used as data; can be HTML or JSON
  • Raw - accepts raw JSON. Also supports formatting

Example:

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_static_connector.json#L5

{
  "value": "[1,2,3,4,5]"
}

ServerConnectorConfig

Connector type that fetches data using the Go http.Client (a server-side request, like curl)

type ServerConnectorConfig struct {
    Method      string            `json:"method" yaml:"method"`
    Headers     map[string]string `yaml:"headers" json:"headers"`
    Timeout     uint32            `yaml:"timeout" json:"timeout"`
    JsonRawBody json.RawMessage   `json:"json_raw_body" yaml:"json_raw_body"`
    Body        string            `yaml:"body" json:"body"`
    
    Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
  • Method - all HTTP methods are supported: GET, POST, PUT, DELETE, PATCH, OPTIONS, HEAD
  • Headers - predefined headers sent with the request; values can be injected into keys and values
  • Timeout[sec] - default 60 sec timeout, or the provided value
  • Body - body of the request; the parsed value can be injected
  • JsonRawBody - body of the request in JSON format; the value can be injected
  • Proxy - proxy configuration for the request

Example:

{
  "method": "GET",
  "proxy": {
    "server": "http://localhost:8080",
    "username": "pyx"
  }
}

Right now the default timeout is 10 sec.
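
As a rough picture of what a server-side request with these options involves, the sketch below builds an http.Client with a timeout, sets the method, headers, and body, and reads the response. It runs against a local httptest server so it needs no network. The serverRequest type, fetch, and echoServer are hypothetical names for this illustration, and the 10 second fallback simply follows the note above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// serverRequest is a trimmed-down stand-in for ServerConnectorConfig.
type serverRequest struct {
	Method  string
	Headers map[string]string
	Timeout uint32 // seconds
	Body    string
}

// fetch performs one HTTP request described by cfg and returns the body.
func fetch(url string, cfg serverRequest) ([]byte, error) {
	timeout := 10 * time.Second // assumption: fallback when no timeout is set
	if cfg.Timeout > 0 {
		timeout = time.Duration(cfg.Timeout) * time.Second
	}
	client := &http.Client{Timeout: timeout}

	req, err := http.NewRequest(cfg.Method, url, strings.NewReader(cfg.Body))
	if err != nil {
		return nil, err
	}
	for k, v := range cfg.Headers {
		req.Header.Set(k, v)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// echoServer starts a local test server that reports the method and a header.
func echoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, `{"method":%q,"token":%q}`, r.Method, r.Header.Get("X-Token"))
	}))
}

func main() {
	srv := echoServer()
	defer srv.Close()
	body, err := fetch(srv.URL, serverRequest{
		Method:  http.MethodGet,
		Headers: map[string]string{"X-Token": "abc"},
	})
	fmt.Println(string(body), err) // {"method":"GET","token":"abc"} <nil>
}
```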

Proxy config
type ProxyConfig struct {
    // Proxy to be used for all requests. HTTP and SOCKS proxies are supported, for example
    // `http://myproxy.com:3128` or `socks5://myproxy.com:3128`. Short form `myproxy.com:3128`
    // is considered an HTTP proxy.
    Server string `json:"server" yaml:"server"`
    // Optional username to use if HTTP proxy requires authentication.
    Username string `json:"username" yaml:"username"`
    // Optional password to use if HTTP proxy requires authentication.
    Password string `json:"password" yaml:"password"`
}
  • Server - address of the proxy server, including the scheme. Also supports formatting
  • Username - username for the proxy (can be empty). Also supports formatting
  • Password - password for the proxy (can be empty). Also supports formatting

{
  "server": "http://localhost:8080",
  "username": "pyx"
}
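
One way such a proxy configuration can be turned into a Go HTTP client is shown below, using only the standard library. This is a sketch, not Fitter's code: proxyConfig, proxyURL, and newProxiedClient are hypothetical names, and the short form is treated as an HTTP proxy as the struct comment describes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// proxyConfig is a local copy of the documented ProxyConfig fields.
type proxyConfig struct {
	Server   string
	Username string
	Password string
}

// proxyURL converts the config into a URL, adding credentials when present.
func proxyURL(cfg proxyConfig) (*url.URL, error) {
	server := cfg.Server
	if !strings.Contains(server, "://") {
		server = "http://" + server // short form is considered an HTTP proxy
	}
	u, err := url.Parse(server)
	if err != nil {
		return nil, err
	}
	switch {
	case cfg.Username != "" && cfg.Password != "":
		u.User = url.UserPassword(cfg.Username, cfg.Password)
	case cfg.Username != "":
		u.User = url.User(cfg.Username)
	}
	return u, nil
}

// newProxiedClient returns a client that sends every request through the proxy.
func newProxiedClient(cfg proxyConfig) (*http.Client, error) {
	u, err := proxyURL(cfg)
	if err != nil {
		return nil, err
	}
	return &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(u)}}, nil
}

func main() {
	u, err := proxyURL(proxyConfig{Server: "http://localhost:8080", Username: "pyx"})
	fmt.Println(u, err) // http://pyx@localhost:8080 <nil>

	client, err := newProxiedClient(proxyConfig{Server: "myproxy.com:3128"})
	fmt.Println(client != nil, err) // true <nil>
}
```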
Environment variables
  1. FITTER_HTTP_WORKER - int[1000] - default concurrent HTTP workers

BrowserConnectorConfig

Connector type that fetches data through an emulated browser

type BrowserConnectorConfig struct {
	Chromium   *ChromiumConfig   `json:"chromium" yaml:"chromium"`
	Docker     *DockerConfig     `json:"docker" yaml:"docker"`
	Playwright *PlaywrightConfig `json:"playwright" yaml:"playwright"`
}

Config can be one of:

  • Chromium - use a locally installed Chromium to fetch data
  • Docker - use Docker to spin up a container that fetches data
  • Playwright - use the Playwright framework to fetch data

Example:

{
    "docker": {
      "wait": 10000,
      "image": "docker.io/zenika/alpine-chrome:with-node",
      "entry_point": "chromium-browser",
      "purge": true
    }
}

Chromium

Use a locally installed Chromium to fetch the data

type ChromiumConfig struct {
	Path    string   `yaml:"path" json:"path"`
	Timeout uint32   `yaml:"timeout" json:"timeout"`
	Wait    uint32   `yaml:"wait" json:"wait"`
	Flags   []string `yaml:"flags" json:"flags"`
}
  • Path - path to the Chromium binary
  • Timeout[sec] - timeout for the Chromium execution
  • Wait[msec] - timeout for page loading
  • Flags - flags for Chromium; default: "--headless", "--proxy-auto-detect", "--temp-profile", "--incognito", "--disable-logging", "--disable-extensions", "--no-sandbox"

Example:

{
  "path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  "wait": 10000
}

Docker

Use Docker to spin up a container that fetches the data

type DockerConfig struct {
	Image       string   `yaml:"image" json:"image"`
	EntryPoint  string   `json:"entry_point" yaml:"entry_point"`
	Timeout     uint32   `yaml:"timeout" json:"timeout"`
	Wait        uint32   `yaml:"wait" json:"wait"`
	Flags       []string `yaml:"flags" json:"flags"`
	Purge       bool     `json:"purge" yaml:"purge"`
	NoPull      bool     `yaml:"no_pull" json:"no_pull"`
	PullTimeout uint32   `yaml:"pull_timeout" json:"pull_timeout"`
}

Docker default image: docker.io/zenika/alpine-chrome

  • Image - image from the Docker registry (provide it with the registry host)
  • EntryPoint - command that is run inside the container
  • Timeout[sec] - timeout for running the container (without pulling the image)
  • Wait[msec] - timeout for page loading (works only for Chromium based containers)
  • Flags - command arguments for running containers; default for Chromium based: "--no-sandbox","--headless", "--proxy-auto-detect", "--temp-profile", "--incognito", "--disable-logging", "--disable-gpu"
  • Purge - whether to remove the container after the work is done (like docker rm)
  • NoPull - prevent pulling of the image
  • PullTimeout - timeout for pulling the image

Environment variables
  1. DOCKER_HOST - string - (EnvOverrideHost) to set the URL to the docker server.
  2. DOCKER_API_VERSION - string - (EnvOverrideAPIVersion) to set the version of the API to use, leave empty for latest.
  3. DOCKER_CERT_PATH - string - (EnvOverrideCertPath) to specify the directory from which to load the TLS certificates (ca.pem, cert.pem, key.pem).
  4. DOCKER_TLS_VERIFY - bool - (EnvTLSVerify) to enable or disable TLS verification (off by default)

Example:

{
  "wait": 10000,
  "image": "docker.io/zenika/alpine-chrome:with-node",
  "entry_point": "chromium-browser",
  "purge": true
}

Playwright

Run browsers via the Playwright framework

type PlaywrightConfig struct {
    Browser      PlaywrightBrowser          `json:"browser" yaml:"browser"`
    Install      bool                       `yaml:"install" json:"install"`
    Timeout      uint32                     `yaml:"timeout" json:"timeout"`
    Wait         uint32                     `yaml:"wait" json:"wait"`
    TypeOfWait   *playwright.WaitUntilState `json:"type_of_wait" yaml:"type_of_wait"`
    PreRunScript string                     `json:"pre_run_script" yaml:"pre_run_script"`
    Stealth      bool                       `json:"stealth" yaml:"stealth"`
    
    Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
  • Browser - enum["Chromium", "FireFox", "WebKit"] - which browser to use
  • Install - whether the browser should be installed
  • Timeout[sec] - timeout for running Playwright
  • Wait[sec] - timeout for page loading
  • TypeOfWait - enum["load", "domcontentloaded", "networkidle", "commit"] - which page state to wait for; default is "load"
  • PreRunScript[""] - script executed before the page content is read. Also supports the {PL} placeholder
  • Stealth[false] - add a script that tries to get past bot protection
  • Proxy - proxy configuration for the request

Example

{
  "timeout": 30,
  "wait": 30,
  "install": false,
  "browser": "Chromium"
}

Model

The model defines the result of the scraping

type Model struct {
    ObjectConfig *ObjectConfig `yaml:"object_config" json:"object_config"`
    ArrayConfig  *ArrayConfig  `json:"array_config" yaml:"array_config"`
    BaseField    *BaseField    `json:"base_field" yaml:"base_field"`
    IsArray      bool          `json:"is_array" yaml:"is_array"`
}

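Config can be one of:

  • ObjectConfig - the result is an object
  • ArrayConfig - the result is an array
  • BaseField - the result is a single basic value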

Example:

{
  "object_config": {}
}

ObjectConfig

Configuration of the object and fields

type ObjectConfig struct {
    Fields      map[string]*Field `json:"fields" yaml:"fields"`
    Field       *BaseField        `json:"field" yaml:"field"`
    ArrayConfig *ArrayConfig      `json:"array_config" yaml:"array_config"`
}

Config can be one of:

  • Fields - map of field definitions; the key is the field name, the value is its configuration
  • Field - used for an array element of a basic type such as "string" or "int" (array of basic types)
  • ArrayConfig - used for an array element that is itself an array (array of arrays)

Example:

{
  "fields": {
    "title": {
      "base_field": {
        "type": "string",
        "path": "type"
      }
    }
  }
}

ArrayConfig

Configuration of the array and fields

type ArrayConfig struct {
    RootPath    string        `json:"root_path" yaml:"root_path"`
    Reverse     bool          `yaml:"reverse" json:"reverse"`
    
    ItemConfig  *ObjectConfig `json:"item_config" yaml:"item_config"`
    LengthLimit uint32        `json:"length_limit" yaml:"length_limit"`
    
    StaticConfig *StaticArrayConfig `json:"static_array"  yaml:"static_array"`
}
  • RootPath - selector for the root element of the array, or for the repeated element in the case of HTML parsing; the size of the array is the number of child elements under the root
  • Reverse - bool[false] - iterate in reverse order (n to 1)
  • LengthLimit - defines the size of the array; only for generated arrays (does not work for static ones)

Config can be one of:

  • ItemConfig - configuration of each array element
  • StaticConfig - static (fixed length) array generation

Example:

{
  "root_path": "#content dt.quote > a",
  "item_config": {
    "field": {
      "type": "string"
    }
  }
}

Field

Common definition of a field

type Field struct {
	BaseField    *BaseField    `json:"base_field" yaml:"base_field"`
	ObjectConfig *ObjectConfig `json:"object_config" yaml:"object_config"`
	ArrayConfig  *ArrayConfig  `json:"array_config" yaml:"array_config"`

	FirstOf []*Field `json:"first_of" yaml:"first_of"`
}

Config can be one of:

  • BaseField - the field is deserialized as a basic type such as "string" or "int"
  • ObjectConfig - the field is a nested object
  • ArrayConfig - the field is an array
  • FirstOf - the first non-empty resolved field is selected

Example:

{
  "base_field": {
    "type": "string",
    "path": "div.current-temp span.heading"
  }
}
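
The FirstOf rule ("first non-empty resolved field wins") can be illustrated with a short sketch. The resolver type and the fromMap helper are hypothetical, and treating the empty string as "empty" is an assumption made for this example.

```go
package main

import "fmt"

// resolver stands in for one candidate field; ok reports whether it resolved.
type resolver func() (value string, ok bool)

// firstOf evaluates the candidates in order and returns the first one
// that resolved to a non-empty value.
func firstOf(candidates ...resolver) (string, bool) {
	for _, c := range candidates {
		if v, ok := c(); ok && v != "" {
			return v, true
		}
	}
	return "", false
}

// fromMap builds a resolver that looks up key in data.
func fromMap(data map[string]string, key string) resolver {
	return func() (string, bool) {
		v, ok := data[key]
		return v, ok
	}
}

func main() {
	page := map[string]string{"subtitle": "", "heading": "12°C"}
	v, ok := firstOf(fromMap(page, "title"), fromMap(page, "subtitle"), fromMap(page, "heading"))
	fmt.Println(v, ok) // 12°C true
}
```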

BaseField

Used when we want to get some static information or generate a new value

type BaseField struct {
	Type FieldType `yaml:"type" json:"type"`
	Path string    `yaml:"path" json:"path"`

	HTMLAttribute string `json:"html_attribute" yaml:"html_attribute"`

	Generated *GeneratedFieldConfig `yaml:"generated" json:"generated"`

	FirstOf []*BaseField `json:"first_of" yaml:"first_of"`
}
  • Type - enum["null", "boolean", "string", "int", "int64", "float", "float64", "array", "object", "html", "raw_string"] - type of the parsed field. Important: the html type only works with a connector that returns HTML (HTMLAttribute has no effect in this case)
  • Path - selector used for parsing (relative if the field is a child of an array)
  • HTMLAttribute - extra value that only has an effect in HTML parsing via goquery. It specifies which attribute should be parsed.

Important: by default the "string" type is trimmed and all special characters are replaced; if you need the plain string, use "raw_string"

Config can be one of the following, or empty:

  • Generated - the field is generated using a custom configuration
  • FirstOf - the first non-empty resolved field is selected

Examples

{
  "generated": {
    "uuid": {}
  }
}
{
  "type": "string",
  "path": "text()"
}

GeneratedFieldConfig

Provides the functionality of generating a field on the fly

type GeneratedFieldConfig struct {
    UUID             *UUIDGeneratedFieldConfig   `yaml:"uuid" json:"uuid"`
    Static           *StaticGeneratedFieldConfig `yaml:"static" json:"static"`
    Formatted        *FormattedFieldConfig       `json:"formatted" yaml:"formatted"`
    Plugin           *PluginFieldConfig          `yaml:"plugin" json:"plugin"`
    Calculated       *CalculatedConfig           `yaml:"calculated" json:"calculated"`
    File             *FileFieldConfig            `yaml:"file" json:"file"`
    Model            *ModelField                 `yaml:"model" json:"model"`
    FileStorageField *FileStorageField           `json:"file_storage" yaml:"file_storage"`
}

Config can be one of:

  • UUID - generate a random UUID V4
  • Static - generate a static field
  • Formatted - format the field
  • Model - model generated from another connector and model
  • Plugin - plugin field
  • Calculated - calculated field
  • File - file field (downloads a file from a server)
  • FileStorageField - file field whose result is saved to a local file
Examples:

{
    "uuid": {}
}

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L58

{
    "model": {
      "type": "array",
      "model": {
        "array_config": {
          "root_path": "#content dt.quote > a",
          "item_config": {
            "field": {
              "type": "string"
            }
          }
        }
      },
      "connector_config": {
        "response_type": "HTML",
        "attempts": 3,
        "url": "http://www.quotationspage.com/random.php",
        "browser_config": {
          "chromium": {
            "path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            "wait": 10000
          }
        }
      }
    }
}

UUID

Generate a random UUID V4 on the fly; can be used to generate a unique id

type UUIDGeneratedFieldConfig struct {
	Regexp string `yaml:"regexp" json:"regexp"`
}
  • Regexp - matcher that can be used to extract part of the generated UUID
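
A minimal sketch of the idea, using only the standard library: generate a version 4 UUID from random bytes, then optionally narrow it with a regular expression. The names newUUIDv4 and applyRegexp are hypothetical, and returning the first match of the pattern is an assumption about how the matcher is applied.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"regexp"
)

// newUUIDv4 builds a random (version 4, RFC 4122 variant) UUID string.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// applyRegexp returns the first match of pattern in value,
// or value unchanged when no pattern is given.
func applyRegexp(value, pattern string) (string, error) {
	if pattern == "" {
		return value, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	return re.FindString(value), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	short, _ := applyRegexp(id, `^[0-9a-f]{8}`)
	fmt.Println(id, short) // full UUID followed by its first 8 hex characters
}
```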

Static

Generate static field

type StaticGeneratedFieldConfig struct {
    Type  FieldType       `yaml:"type" json:"type"`
    Value string          `json:"value" yaml:"value"`
    Raw   json.RawMessage `json:"raw" yaml:"raw"`
}
  • Type - enum["null", "boolean", "string", "int","int64","float","float64", "array", "object"] - type of the field
  • Value - string value of the field
  • Raw - pure json value of the field

Example

{
  "type": "int",
  "value": "65"
}
{
  "type": "array",
  "value": "[65,45]"
}
{
  "type": "array",
  "raw": [65,45]
}

Formatted Field Config

Generate a formatted field that receives the value from the parent base field

type FormattedFieldConfig struct {
	Template string `yaml:"template" json:"template"`
}
  • Template - template with the placeholder {PL}, where the parent value is injected as a string

Example: https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L98

{
  "template": "https://news.ycombinator.com/item?id={PL}"
}

File Storage Field

Field that stores the field result as a local file

type FileStorageField struct {
    Content string          `json:"content" yaml:"content"`
    Raw     json.RawMessage `json:"raw" yaml:"raw"`
    
    FileName string `json:"file_name" yaml:"file_name"`
    Path     string `json:"path" yaml:"path"`
    Append   bool   `json:"append" yaml:"append"`
}
{
  "content": "{{{id}}}, {{{message}}}\n",
  "append": true,
  "file_name": "{{{id}}}.csv",
  "path": "/Users/pxyup/fitter/examples/cli/test/csv"
}
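
What the append option implies on the file system can be sketched as below: each write adds a row to the end of the file instead of replacing it. The helper names storeContent and demo are hypothetical, placeholders are assumed to be resolved already, and truncating the file when append is false is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// storeContent writes content to dir/name and returns the file path.
// With appendMode the content is added to the end of an existing file;
// otherwise the file is truncated first (assumption).
func storeContent(dir, name, content string, appendMode bool) (string, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	flags := os.O_CREATE | os.O_WRONLY
	if appendMode {
		flags |= os.O_APPEND
	} else {
		flags |= os.O_TRUNC
	}
	path := filepath.Join(dir, name)
	f, err := os.OpenFile(path, flags, 0o644)
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		return "", err
	}
	return path, nil
}

// demo stores two rows in a temporary folder and returns the file contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "fitter-csv")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	var path string
	for _, row := range []string{"1, first\n", "1, second\n"} {
		if path, err = storeContent(dir, "1.csv", row, true); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	content, err := demo()
	fmt.Printf("%q %v\n", content, err) // "1, first\n1, second\n" <nil>
}
```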

File Field

Field that downloads a file from a server to the local disk

type FileFieldConfig struct {
	Config *ServerConnectorConfig `yaml:"config" json:"config"`

	Url      string `yaml:"url" json:"url"`
	FileName string `json:"file_name" yaml:"file_name"`
	Path     string `json:"path" yaml:"path"`
}

The result of the field is the local file path as a string

{
  "url": "https://images.shcdn.de/resized/w680/p/dekostoff-gobelinstoff-panel-oriental-cat-46-x-46_P19-KP_2.jpg",
  "path": "/Users/pxyup/fitter/bin",
  "config": {
    "method": "GET"
  }
}

With a propagated URL (the parent value is injected as a string)

{
  "url": "https://picsum.photos{PL}",
  "path": "/Users/pxyup/fitter/bin",
  "config": {
    "method": "GET"
  }
}

Config example:

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image.json

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image_multiple.json

Calculated field

Field that can produce different types depending on the expression

type CalculatedConfig struct {
	Type       FieldType `yaml:"type" json:"type"`
	Expression string    `yaml:"expression" json:"expression"`
}
  • Type - resulting type of the expression
  • Expression - expression to calculate (evaluated by an external expression library)

Predefined values

FNull - alias for builder.Nullvalue

FNil - alias for nil

isNull(value T) - function that checks whether the value is FNull

fRes - the raw (properly typed) result of parsing the base field

fIndex - the index in the parent array (only if the parent is an array field)

fResJson - JSON string representation of the raw result

fResRaw - the result in bytes format

FNewLine - newline separator

{
  "type": "bool",
  "expression": "fRes > 500"
}

Plugin field

Field that is provided by an external plugin for Fitter

type PluginFieldConfig struct {
	Name   string          `json:"name" yaml:"name"`
	Config json.RawMessage `json:"config" yaml:"config"`
}
  • Name - name of the plugin (just the name, without extension)
  • Config - JSON config of the plugin

Model Field

Field type that is generated on the fly by a new model and connector

type ModelField struct {
	// Type of parsing
	ConnectorConfig *ConnectorConfig `yaml:"connector_config" json:"connector_config"`
	// Model of the response
	Model *Model `yaml:"model" json:"model"`

	Type FieldType `yaml:"type" json:"type"`
	Path string    `yaml:"path" json:"path"`

	Expression string `yaml:"expression" json:"expression"`
}
  • ConnectorConfig - which connector to use. Important: the parent value can be injected into the connector URL as a string
  • Model - configuration of the underlying model
  • Type - enum["null", "boolean", "string", "int", "int64", "float", "float64", "array", "object"] - type of the generated field
  • Path - JSON selector used to extract a specific value from the generated field
  • Expression - expression used for post-processing of the Model result (the Path field is ignored)

Examples:

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L60

{
  "type": "array",
  "model": {
    "array_config": {
      "root_path": "#content dt.quote > a",
      "item_config": {
        "field": {
          "type": "string"
        }
      }
    }
  }
}

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_weather.json#L37

{
    "type": "string",
    "path": "temp.temp",
    "model": {
       "object_config": {
        "fields": {
          "temp": {
            "base_field": {
              "type": "string",
              "path": "//div[@id='forecast_list_ul']//td/b/a/@href",
              "generated": {
                "model": {
                  "type": "string",
                  "model": {
                    "object_config": {
                      "fields": {
                        "temp": {
                          "base_field": {
                            "type": "string",
                            "path": "div.current-temp span.heading"
                          }
                        }
                      }
                    }
                  },
                  "connector_config": {
                    "response_type": "HTML",
                    "attempts": 4,
                    "url": "https://openweathermap.org{PL}",
                    "browser_config": {
                      "playwright": {
                        "timeout": 30,
                        "wait": 30,
                        "install": false,
                        "browser": "FireFox",
                        "type_of_wait": "networkidle"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "connector_config": {
      "response_type": "xpath",
      "attempts": 3,
      "url": "https://openweathermap.org/find?q={PL}",
      "browser_config": {
        "playwright": {
          "timeout": 30,
          "wait": 30,
          "install": false,
          "browser": "Chromium"
        }
      }
    }
}

Static Array Config

Provides static (fixed length) array generation

type StaticArrayConfig struct {
    Items map[uint32]*Field `yaml:"items" json:"items"`
    Length uint32            `yaml:"length" json:"length"`
}
  • Items - map[uint32]*Field - the key is the index in the array, the value is the field definition
  • Length - if set (1+), defines a custom length for the array

Examples:

{
  "0": {
    "base_field": {
      "type": "string",
      "path": "div.current-temp span.heading"
    }
  }
}
{
  "length": 4,
  "0": {
    "base_field": {
      "type": "string",
      "path": "div.current-temp span.heading"
    }
  }
}
{
  "length": 4,
  "2": {
    "base_field": {
      "type": "string",
      "path": "div.current-temp span.heading"
    }
  }
}

Placeholder list

  1. {PL} - inject the value
  2. {INDEX} - inject the index in the parent array
  3. {HUMAN_INDEX} - inject the index in the parent array in human-friendly form
  4. {{{json_path}}} - get information from the propagated "object"/"array" field
  5. {{{RefName=SomeName}}} - get a reference value by name
  6. {{{RefName=SomeName json.path}}} - get a reference value by name and extract a value by JSON path
  7. {{{FromEnv=ENV_KEY}}} - get a value from an environment variable
  8. {{{FromExp=fRes + 5 + fIndex}}} - get a value from the expression (see Predefined values)
  9. {{{FromInput=.}}} or {{{FromInput=json.path}}} - get a value from the input of the trigger or library
  10. {{{FromFile=./test_file.log}}} - get a value from a file by path. The content of the file can also contain placeholders
  11. {{{FromURL=http://localhost:8081}}} - get the response from a URL

Examples:

{{{FromExp="{{{FromEnv=TEST_VAL}}}" + "hello"}}}
Current time is: {PL} with token from TokenRef={{{RefName=TokenRef}}} and TokenObjectRef={{{RefName=TokenObjectRef token}}}
TokenRef={{{RefName=TokenRef}}} and TokenObjectRef={{{RefName=TokenObjectRef token}}} Object={{{value}}} {PL} Env={{{FromEnv=TEST_VAL}}} {INDEX} {HUMAN_INDEX}
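
To make the substitution idea concrete, here is a simplified sketch that handles only {PL}, {INDEX}, {HUMAN_INDEX}, {{{FromEnv=...}}}, and flat {{{key}}} lookups. It is not Fitter's template engine: expand is a hypothetical function, the environment is passed in as a lookup function to keep the example testable, and {HUMAN_INDEX} being the 1-based index is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// triple matches {{{...}}} placeholders and captures their content.
var triple = regexp.MustCompile(`\{\{\{([^{}]+)\}\}\}`)

// expand resolves a subset of the placeholders in tpl.
// fields holds values of the propagated object; env looks up environment variables.
func expand(tpl, value string, index int, fields map[string]string, env func(string) (string, bool)) string {
	out := triple.ReplaceAllStringFunc(tpl, func(m string) string {
		key := m[3 : len(m)-3]
		if strings.HasPrefix(key, "FromEnv=") {
			v, _ := env(strings.TrimPrefix(key, "FromEnv="))
			return v
		}
		if v, ok := fields[key]; ok {
			return v
		}
		return m // unsupported placeholders are left untouched in this sketch
	})
	return strings.NewReplacer(
		"{PL}", value,
		"{INDEX}", strconv.Itoa(index),
		"{HUMAN_INDEX}", strconv.Itoa(index+1), // assumption: 1-based index
	).Replace(out)
}

func main() {
	tpl := "Object={{{value}}} {PL} Env={{{FromEnv=TEST_VAL}}} {INDEX} {HUMAN_INDEX}"
	fmt.Println(expand(tpl, "now", 0, map[string]string{"value": "42"}, os.LookupEnv))
}
```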

References

A special map that is prefetched (before any processing) and can be used by a connector or a placeholder

Can be used for:

  1. Cache jwt token and use them in headers
  2. Cache values
  3. Etc

Reference

type Reference struct {
    *ModelField
    
    Expire uint32 `yaml:"expire" json:"expire"`
}
  • ModelField - embedded struct; the same fields can be used
  • Expire[sec] - how long the reference stays cached after fetching. Not set => cached forever. Set to 0 => re-fetched every time. Set to n > 0 => cached for n seconds
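
The three Expire cases can be written down as a small freshness check. This is a sketch of the documented rule only: isFresh is a hypothetical function, a nil pointer is used here to model "not set", and treating an age of exactly n seconds as expired is an assumption.

```go
package main

import "fmt"

// isFresh reports whether a cached reference can still be used.
// ageSeconds is the time since the value was fetched; expire models the
// Expire option, with nil meaning "not set".
func isFresh(ageSeconds uint32, expire *uint32) bool {
	if expire == nil {
		return true // not set: cached forever
	}
	if *expire == 0 {
		return false // zero: re-fetch every time
	}
	return ageSeconds < *expire // n > 0: cached for n seconds
}

func main() {
	ten, zero := uint32(10), uint32(0)
	fmt.Println(isFresh(5, &ten))   // true
	fmt.Println(isFresh(11, &ten))  // false
	fmt.Println(isFresh(0, &zero))  // false
	fmt.Println(isFresh(9999, nil)) // true
}
```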

For Fitter

type RefMap map[string]*Reference

type Config struct {
    // Other Config Fields

    Limits     *Limits `yaml:"limits" json:"limits"`
    References RefMap  `json:"references" yaml:"references"`
}

For Fitter Cli

type RefMap map[string]*Reference

type CliItem struct {
    // Other Config Fields

    Limits     *Limits `yaml:"limits" json:"limits"`
    References RefMap  `json:"references" yaml:"references"`
}

Example

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L2

{
  "references": {
    "TokenRef": {
      "expire": 10,
      "connector_config": {
        "response_type": "json",
        "static_config": {
          "value": "\"plain token\""
        }
      },
      "model": {
        "base_field": {
          "type": "string"
        }
      }
    },
    "TokenObjectRef": {
      "connector_config": {
        "response_type": "json",
        "static_config": {
          "value": "{\"token\":\"token from object\"}"
        }
      },
      "model": {
        "object_config": {
          "fields": {
            "token": {
              "base_field": {
                "type": "string",
                "path": "token"
              }
            }
          }
        }
      }
    }
  }
}

Limits

Limits help prevent DDoS-like load on target hosts and excessive memory usage

type Limits struct {
	HostRequestLimiter HostRequestLimiter `yaml:"host_request_limiter" json:"host_request_limiter"`
	ChromiumInstance   uint32             `yaml:"chromium_instance" json:"chromium_instance"`
	DockerContainers   uint32             `yaml:"docker_containers" json:"docker_containers"`
	PlaywrightInstance uint32             `yaml:"playwright_instance" json:"playwright_instance"`
}
  • HostRequestLimiter - map[string]int64 - limit per host name; the key is the host, the value is the number of parallel requests (used by the server connector)
  • ChromiumInstance - number of parallel Chromium instances
  • DockerContainers - number of parallel Docker containers
  • PlaywrightInstance - number of parallel Playwright instances

https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L2

{
  "limits": {
    "host_request_limiter": {
      "hacker-news.firebaseio.com": 5
    },
    "chromium_instance": 3,
    "docker_containers": 3,
    "playwright_instance": 3
  }
}
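
A per-host limit on parallel requests is commonly built from one counting semaphore per host. The sketch below uses buffered channels for that; it illustrates the concept and is not Fitter's limiter. The hostLimiter type and the peakConcurrency helper are hypothetical, and hosts without a configured limit are assumed to run unrestricted.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// hostLimiter holds one semaphore (buffered channel) per limited host.
type hostLimiter struct {
	slots map[string]chan struct{}
}

func newHostLimiter(limits map[string]int64) *hostLimiter {
	l := &hostLimiter{slots: make(map[string]chan struct{}, len(limits))}
	for host, n := range limits {
		if n > 0 {
			l.slots[host] = make(chan struct{}, n)
		}
	}
	return l
}

// do runs fn while holding a slot for host, blocking until one is free.
func (l *hostLimiter) do(host string, fn func()) {
	if ch, ok := l.slots[host]; ok {
		ch <- struct{}{}
		defer func() { <-ch }()
	}
	fn()
}

// peakConcurrency starts jobs goroutines against host and returns the
// highest number of them that were inside the limiter at the same time.
func peakConcurrency(l *hostLimiter, host string, jobs int) int64 {
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		cur, peak int64
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.do(host, func() {
				mu.Lock()
				cur++
				if cur > peak {
					peak = cur
				}
				mu.Unlock()
				time.Sleep(time.Millisecond) // simulated request
				mu.Lock()
				cur--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	l := newHostLimiter(map[string]int64{"hacker-news.firebaseio.com": 5})
	fmt.Println(peakConcurrency(l, "hacker-news.firebaseio.com", 50) <= 5) // true
}
```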

Roadmap

  1. Add browser scenario for preparing, after parsing
  2. Add scrolling support for scenario
  3. Add pagination support for scenario
  4. Add notification methods for Fitter: Webhook/Queue