Readme fixes (#65)
Co-authored-by: KH <>
kleineshertz authored Mar 8, 2024
1 parent 475a2f6 commit 0aa4d50
Showing 7 changed files with 26 additions and 19 deletions.
19 changes: 13 additions & 6 deletions doc/glossary.md
@@ -126,16 +126,23 @@ Same as [tag_criteria](#tag_criteria), but in a separate JSON file. This is the

## Go expressions

One-line Go snippets used in [script](#script) settings: field expressions, writer "having" expressions, lookup "filter" expressions. For the list of supported operations, see `Eval(exp ast.Expr)` implementation in [eval_ctx.go](../pkg/eval/eval_ctx.go). For the list of supported Go functions, see `EvalFunc(callExp *ast.CallExpr, funcName string, args []interface{})` implementation in [eval_ctx.go](../pkg/eval/eval_ctx.go)
One-line Go snippets used in [script](#script) settings:
- field expressions
- writer "having" expressions
- lookup "filter" expressions

At the moment, Capillaries supports only a very limited subset of the standard Go library. Additions are welcome. Keep in mind that Capillaries expression engine:
For the list of supported operations, see `Eval(exp ast.Expr)` implementation in [eval_ctx.go](../pkg/eval/eval_ctx.go).

For the list of supported Go functions, see `EvalFunc(callExp *ast.CallExpr, funcName string, args []interface{})` implementation in [eval_ctx.go](../pkg/eval/eval_ctx.go).

At the moment, Capillaries supports only a limited subset of the standard Go library. Additions are welcome. Keep in mind that the Capillaries expression engine:
- supports only primitive types (see [Capillaries data types](#supported-types))
- does not support class member function calls
- does not support statements or multi-line expressions
- supports aggregate functions used in [table_lookup_table](#table_lookup_table) nodes
For the list of supported functions, see EvalFunc [eval_ctx.go](../pkg/eval/eval_ctx.go)
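
To give a feel for what these snippets look like (the field names below are hypothetical, and the operators are assumed to be among the supported ones), a field or filter expression is just a one-line Go expression over primitive-typed fields:

```
r.order_id
r.price * 1.1
r.status == "shipped" && r.qty > 0
```

A writer "having" expression is the same kind of snippet, except it references writer fields, e.g. `w.total_value > 1000`.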

## Processor queue

RabbitMQ queue containing messages for a [processor](#processor). The name of the queue is given by the [handler_executable_type](binconfig.md#handler_executable_type) setting.

## DOT diagrams
@@ -259,7 +266,7 @@ Defines how file writer saves values to the target file (CSV, Parquet).

### Generic file writer column properties

`name`: column name to be used in [having](#w.having)
`name`: column name to be used in [having](scriptconfig.md#whaving)

`type`: one of the [supported types](#supported-types)

Expand Down Expand Up @@ -294,7 +301,7 @@ Parquet writer types:

## Index definition

Used in [w.indexes](#w.indexes). Syntax:
Used in [w.indexes](scriptconfig.md#windexes). Syntax:

```
[unique|non_unique](order_expression)
```

@@ -304,7 +311,7 @@ where order_expression is an [order expression](#order-expression).
A unique index enforces key uniqueness on the database level. Key uniqueness does not affect lookup behaviour.
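
As a purely illustrative example (the field name is made up, and `asc` is assumed to be a valid sort modifier), an index definition might read:

```
unique(order_id(asc))
```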

## Order expression
Used in [index definitions](#index-definition), [top.order](#w.top) and [dependency policy event_priority_order](#event_priority_order) settings. Syntax:
Used in [index definitions](#index-definition), [top/order](scriptconfig.md#wtop) and [dependency policy event_priority_order](#event_priority_order) settings. Syntax:
```
[<field_name>([case_modifier|sort_modifier,...]),...]
```
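
For instance, a hypothetical ordering by amount descending and then id ascending (again, the field names and the `desc`/`asc` modifier spellings are assumptions) would be:

```
total_value(desc),order_id(asc)
```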
8 changes: 4 additions & 4 deletions doc/qna.md
@@ -32,7 +32,7 @@ A. The number of nodes in the script and runs performed for a keyspace are virtu

Q. I can't see any code/example that works with NULLs. Are they supported?

A. There is no support for NULL values. To mitigate it, Capillaries offers support for custom default values. See `default_value` in [File reader column definition](glossary.md#file-reader-column-definition) and [Table reader column definition](glossary.md#table-reader-column-definition).
A. There is no support for NULL values. To mitigate it, Capillaries offers support for custom default values. See `default_value` in [Table writer field definition](glossary.md#table-writer-field-definition). Whenever an empty value is found in the source CSV or Parquet file, this `default_value` is written to the [table](glossary.md#table).
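
A minimal sketch of how that might look in a writer field definition (the key names below are assumptions made for illustration; check [Table writer field definition](glossary.md#table-writer-field-definition) for the exact schema):

```
"order_status": {
  "expression": "r.order_status",
  "type": "string",
  "default_value": "unknown"
}
```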

## Re-processing granularity

@@ -71,7 +71,7 @@ A. Start a run that dumps the table into files via [file writer](glossary.md#tab

Q. Is there a UI for Capillaries?

A. Yes. See [Capillaries UI](../ui/README.md) project, which is a simple web single-page application that shows the status of every [run](glossary.md#run) in every [keyspace](glossary.md#keyspace). UI requirements tend to be very business-specific, it's not an easy task to come up with a cookie-cutter UI framework that would be flexible enough. Dedicated solution developers are encouraged to develop their own UI for Capillaries workflows, using [Capillaries Webapi](glossary.md#webapi) and [Capillaries UI](../ui/README.md) as an example.
A. Yes. See the [Capillaries UI](../ui/README.md) project, which is a simple web single-page application that shows the status of every [run](glossary.md#run) in every [keyspace](glossary.md#keyspace). UI requirements tend to be very business-specific, so it's not an easy task to come up with a cookie-cutter UI framework that would be flexible enough. Dedicated solution developers are encouraged to develop their own UI for Capillaries workflows, using [Capillaries Webapi](glossary.md#webapi) as a back-end and [Capillaries UI](../ui/README.md) as an example.

Also please note that the [Toolbelt](glossary.md#toolbelt) can produce rudimentary visuals using the [DOT diagram language](glossary.md#dot-diagrams) - see the [Toolbelt](glossary.md#toolbelt) `validate_script` and `get_run_status_diagram` commands.

@@ -105,9 +105,9 @@ A. Here are some, in no particular order:

1. Performance enhancements, especially those related to the efficient use of Cassandra.

2. Read/write from/to other file formats, maybe databases. Update 2023: Apache Parquet support was added.
2. Read/write from/to other file formats, maybe databases. Update 2023: Apache Parquet support was added; see [Parquet reader](glossary.md#parquet-reader-column-properties) and [Parquet writer](glossary.md#parquet-specific-writer-column-properties).

3. Creating node configuration is a tedious job. Consider adding a toolbelt command that takes a CSV file as an input and generates JSON for a corresponding file_table/table_file node. Update 2023: done, see [proto_file_reader_creator test](../test/code/proto_file_reader_creator/README.md).
3. Creating node configuration is a tedious job. Consider adding a toolbelt command that takes a CSV or Parquet file as an input and generates JSON for a corresponding file_table/table_file node. Update 2023: done, see [proto_file_reader_creator test](../test/code/proto_file_reader_creator/README.md).

4. Is the lack of NULL values support a deal-breaker? Update March 2024: support for *_if aggregate functions was added; it should help mitigate the lack of NULL support.

4 changes: 2 additions & 2 deletions test/code/fannie_mae/README.md
@@ -26,8 +26,8 @@ and rendered in https://dreampuf.github.io/GraphvizOnline :

- [distinct_table](../../../doc/glossary.md#distinct_table) node type
- [file_table](../../../doc/glossary.md#file_table) read from multiple files
- [table_file](../../../doc/glossary.md#table_file) with top/limit/order
- [py_calc](../../../doc/glossary.md#py_calc-processor) calculations taking JSON as input and producing JSON
- [table_file](../../../doc/glossary.md#table_file) with [top/limit/order](../../../doc/scriptconfig.md#wtop)
- [table_custom_tfm_table](../../../doc/glossary.md#table_custom_tfm_table) custom processor [py_calc](../../../doc/glossary.md#py_calc-processor) calculations taking JSON as input and producing JSON
- [table_lookup_table](../../../doc/glossary.md#table_lookup_table) with parallelism, left outer grouped joins, string_agg() aggregate function
- some *_if aggregate functions
- single-run script execution
2 changes: 1 addition & 1 deletion test/code/lookup/README.md
@@ -27,7 +27,7 @@ and rendered in https://dreampuf.github.io/GraphvizOnline :

- [table_lookup_table](../../../doc/glossary.md#table_lookup_table) with parallelism (10 batches), all supported types of joins (inner and left outer, grouped and not)
- [file_table](../../../doc/glossary.md#file_table) read from single file
- [table_file](../../../doc/glossary.md#table_file) with top/limit/order
- [table_file](../../../doc/glossary.md#table_file) with [top/limit/order](../../../doc/scriptconfig.md#wtop)
- single-run (test_one_run.sh) and multi-run (test_two_runs.sh) script execution

Multi-run test simulates the scenario when an operator validates loaded order and order item data before proceeding with joining orders with order items.
2 changes: 1 addition & 1 deletion test/code/portfolio/README.md
@@ -64,7 +64,7 @@ See results in /tmp/capi_out/portfolio_quicktest.
- [file_table](../../../doc/glossary.md#file_table) read from file directly into JSON fields
- [table_lookup_table](../../../doc/glossary.md#table_lookup_table) with parallelism, left outer grouped joins, string_agg() aggregate function
- [py_calc](../../../doc/glossary.md#py_calc-processor) calculations taking JSON as input and producing JSON
- [table_file](../../../doc/glossary.md#table_file) with top/order to produce ordered performance data matrix
- [table_file](../../../doc/glossary.md#table_file) with [top/order](../../../doc/scriptconfig.md#wtop) to produce ordered performance data matrix

## How to test

8 changes: 4 additions & 4 deletions test/code/py_calc/README.md
@@ -18,11 +18,11 @@ and rendered in https://dreampuf.github.io/GraphvizOnline :

## What's tested:

- table_custom_tfm_table custom processor (py_calc) with writer using values from both reader (for example, r.shipping_limit_date) and custom processor (for example, p.taxed_value); please note: p.* datatype, like decimal2 of p.taxed_value, is used by writer only, do not expect this datatype when using this field in your Python code
[file_table](../../../doc/glossary.md#file_table)file_table reading from multiple files
- [table_file](../../../doc/glossary.md#table_file) with top/limit/order
- [table_custom_tfm_table](../../../doc/glossary.md#table_custom_tfm_table) custom processor [py_calc](../../../doc/glossary.md#py_calc-processor) with writer using values from both the reader (for example, r.shipping_limit_date) and the custom processor (for example, p.taxed_value); please note: the p.* datatype, like decimal2 of p.taxed_value, is used by the writer only; do not expect this datatype when using this field in your Python code
- [file_table](../../../doc/glossary.md#file_table) reading from multiple files
- [table_file](../../../doc/glossary.md#table_file) with [top/limit/order](../../../doc/scriptconfig.md#wtop)
- [table_file](../../../doc/glossary.md#table_file) using file-per-batch configuration (see {batch_idx} parameter)
- table_table processor that, using Capillaries Go funtions and arithmetic operations, implements a subset (no weekday math) of calculations provided by Python processor
- [table_table](../../../doc/glossary.md#table_table) processor that, using Capillaries Go functions and arithmetic operations, implements a subset (no weekday math) of the calculations provided by the py_calc processor above

## How to test

2 changes: 1 addition & 1 deletion test/code/tag_and_denormalize/README.md
@@ -17,7 +17,7 @@ and rendered in https://dreampuf.github.io/GraphvizOnline :
- [file_table](../../../doc/glossary.md#file_table) read from single file
- [tag_and_denormalize](../../../doc/glossary.md#tag_and_denormalize-processor) custom processor: denormalizes products table by checking tag criteria and producing a new data row for each matching tag
- [table_lookup_table](../../../doc/glossary.md#table_lookup_table) with parallelism (10 batches), left outer join with grouping
- [table_file](../../../doc/glossary.md#table_file)table_file with top/limit/order
- [table_file](../../../doc/glossary.md#table_file) with [top/limit/order](../../../doc/scriptconfig.md#wtop)
- single-run (test_one_run.sh) and multi-run (test_two_runs.sh) script execution

Multi-run test simulates the scenario when an operator validates tagged products (see /data/out/tagged_products_for_operator_review.csv) before proceeding with calculating totals.