Readme fixes (#65)
Co-authored-by: KH <>
kleineshertz authored Mar 8, 2024
1 parent 475a2f6 commit 0aa4d50
Showing 7 changed files with 26 additions and 19 deletions.
19 changes: 13 additions & 6 deletions doc/glossary.md
@@ -126,16 +126,23 @@ Same as [tag_criteria](#tag_criteria), but in a separate JSON file. This is the

## Go expressions

One-line Go snippets used in [script](#script) settings: field expressions, writer "having" expressions, lookup "filter" expressions. For the list of supported operations, see `Eval(exp ast.Expr)` implementation in [eval_ctx.go](../pkg/eval/eval_ctx.go). For the list of supported Go functions, see `EvalFunc(callExp *ast.CallExpr, funcName string, args []interface{})` implementation in [eval_ctx.go](../pkg/eval/eval_ctx.go)
One-line Go snippets used in [script](#script) settings:
- field expressions
- writer "having" expressions
- lookup "filter" expressions

At the moment, Capillaries supports only a very limited subset of the standard Go library. Additions are welcome. Keep in mind that Capillaries expression engine:
For the list of supported operations, see `Eval(exp ast.Expr)` implementation in [eval_ctx.go](../pkg/eval/eval_ctx.go).

For the list of supported Go functions, see `EvalFunc(callExp *ast.CallExpr, funcName string, args []interface{})` implementation in [eval_ctx.go](../pkg/eval/eval_ctx.go).

At the moment, Capillaries supports only a limited subset of the standard Go library. Additions are welcome. Keep in mind that the Capillaries expression engine:
- supports only primitive types (see [Capillaries data types](#supported-types))
- does not support class member function calls
- does not support statements or multi-line expressions
- supports aggregate functions used in [table_lookup_table](#table_lookup_table) nodes
For the list of supported functions, see EvalFunc [eval_ctx.go](../pkg/eval/eval_ctx.go)
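
To give a feel for what these snippets look like (the field names below are hypothetical, and the operators are assumed to be among the supported ones), a field or filter expression is just a one-line Go expression over primitive-typed fields:

```
r.order_id
r.price * 1.1
r.status == "shipped" && r.qty > 0
```

A writer "having" expression is the same kind of snippet, except it references writer fields, e.g. `w.total_value > 1000`.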

## Processor queue

RabbitMQ queue containing messages for a [processor](#processor). The name of the queue is given by the [handler_executable_type](binconfig.md#handler_executable_type) setting.

## DOT diagrams
@@ -259,7 +266,7 @@ Defines how file writer saves values to the target file (CSV, Parquet).

### Generic file writer column properties

`name`: column name to be used in [having](#w.having)
`name`: column name to be used in [having](scriptconfig.md#whaving)

`type`: one of the [supported types](#supported-types)

Expand Down Expand Up @@ -294,7 +301,7 @@ Parquet writer types:

## Index definition

Used in [w.indexes](#w.indexes). Syntax:
Used in [w.indexes](scriptconfig.md#windexes). Syntax:

```
[unique|non_unique](order_expression)
```

@@ -304,7 +311,7 @@ where order_expression is an [order expression](#order-expression).
A unique index enforces key uniqueness on the database level. Key uniqueness does not affect lookup behaviour.
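
As a purely illustrative example (the field name is made up, and `asc` is assumed to be a valid sort modifier), an index definition might read:

```
unique(order_id(asc))
```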

## Order expression
Used in [index definitions](#index-definition), [top.order](#w.top) and [dependency policy event_priority_order](#event_priority_order) settings. Syntax:
Used in [index definitions](#index-definition), [top/order](scriptconfig.md#wtop) and [dependency policy event_priority_order](#event_priority_order) settings. Syntax:
```
[<field_name>([case_modifier|sort_modifier,...]),...]
```
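
For instance, a hypothetical ordering by amount descending and then id ascending (again, the field names and the `desc`/`asc` modifier spellings are assumptions) would be:

```
total_value(desc),order_id(asc)
```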
8 changes: 4 additions & 4 deletions doc/qna.md
@@ -32,7 +32,7 @@ A. The number of nodes in the script and runs performed for a keyspace are virtu

Q. I can't see any code/example that works with NULLs. Are they supported?

A. There is no support for NULL values. To mitigate it, Capillaries offers support for custom default values. See `default_value` in [File reader column definition](glossary.md#file-reader-column-definition) and [Table reader column definition](glossary.md#table-reader-column-definition).
A. There is no support for NULL values. To mitigate it, Capillaries offers support for custom default values. See `default_value` in [Table writer field definition](glossary.md#table-writer-field-definition). Whenever an empty value is found in the source CSV or Parquet file, this `default_value` is written to the [table](glossary.md#table).
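
A minimal sketch of how that might look in a writer field definition (the key names below are assumptions made for illustration; check [Table writer field definition](glossary.md#table-writer-field-definition) for the exact schema):

```
"order_status": {
  "expression": "r.order_status",
  "type": "string",
  "default_value": "unknown"
}
```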

## Re-processing granularity

@@ -71,7 +71,7 @@ A. Start a run that dumps the table into files via [file writer](glossary.md#tab

Q. Is there a UI for Capillaries?

A. Yes. See [Capillaries UI](../ui/README.md) project, which is a simple web single-page application that shows the status of every [run](glossary.md#run) in every [keyspace](glossary.md#keyspace). UI requirements tend to be very business-specific, it's not an easy task to come up with a cookie-cutter UI framework that would be flexible enough. Dedicated solution developers are encouraged to develop their own UI for Capillaries workflows, using [Capillaries Webapi](glossary.md#webapi) and [Capillaries UI](../ui/README.md) as an example.
A. Yes. See the [Capillaries UI](../ui/README.md) project, which is a simple web single-page application that shows the status of every [run](glossary.md#run) in every [keyspace](glossary.md#keyspace). UI requirements tend to be very business-specific, so it's not an easy task to come up with a cookie-cutter UI framework that would be flexible enough. Dedicated solution developers are encouraged to develop their own UI for Capillaries workflows, using [Capillaries Webapi](glossary.md#webapi) as a back-end and [Capillaries UI](../ui/README.md) as an example.

Also please note that the [Toolbelt](glossary.md#toolbelt) can produce rudimentary visuals using the [DOT diagram language](glossary.md#dot-diagrams) - see the [Toolbelt](glossary.md#toolbelt) `validate_script` and `get_run_status_diagram` commands.

@@ -105,9 +105,9 @@ A. Here are some, in no particular order:

1. Performance enhancements, especially those related to the efficient use of Cassandra.

2. Read/write from/to other file formats, maybe databases. Update 2023: Apache Parquet support was added.
2. Read/write from/to other file formats, maybe databases. Update 2023: Apache Parquet support was added; see [Parquet reader](glossary.md#parquet-reader-column-properties) and [Parquet writer](glossary.md#parquet-specific-writer-column-properties).

3. Creating node configuration is a tedious job. Consider adding a toolbelt command that takes a CSV file as an input and generates JSON for a corresponding file_table/table_file node. Update 2023: done, see [proto_file_reader_creator test](../test/code/proto_file_reader_creator/README.md).
3. Creating node configuration is a tedious job. Consider adding a toolbelt command that takes a CSV or Parquet file as an input and generates JSON for a corresponding file_table/table_file node. Update 2023: done, see [proto_file_reader_creator test](../test/code/proto_file_reader_creator/README.md).

4. Is the lack of NULL values support a deal-breaker? Update March 2024: support for *_if aggregate functions was added; it should help mitigate the lack of NULL support.

4 changes: 2 additions & 2 deletions test/code/fannie_mae/README.md
@@ -26,8 +26,8 @@ and rendered in https://dreampuf.github.io/GraphvizOnline :

- [distinct_table](../../../doc/glossary.md#distinct_table) node type
- [file_table](../../../doc/glossary.md#file_table) read from multiple files
- [table_file](../../../doc/glossary.md#table_file) with top/limit/order
- [py_calc](../../../doc/glossary.md#py_calc-processor) calculations taking JSON as input and producing JSON
- [table_file](../../../doc/glossary.md#table_file) with [top/limit/order](../../../doc/scriptconfig.md#wtop)
- [table_custom_tfm_table](../../../doc/glossary.md#table_custom_tfm_table) custom processor [py_calc](../../../doc/glossary.md#py_calc-processor) calculations taking JSON as input and producing JSON
- [table_lookup_table](../../../doc/glossary.md#table_lookup_table) with parallelism, left outer grouped joins, string_agg() aggregate function
- some *_if aggregate functions
- single-run script execution
2 changes: 1 addition & 1 deletion test/code/lookup/README.md
@@ -27,7 +27,7 @@ and rendered in https://dreampuf.github.io/GraphvizOnline :

- [table_lookup_table](../../../doc/glossary.md#table_lookup_table) with parallelism (10 batches), all supported types of joins (inner and left outer, grouped and not)
- [file_table](../../../doc/glossary.md#file_table) read from single file
- [table_file](../../../doc/glossary.md#table_file) with top/limit/order
- [table_file](../../../doc/glossary.md#table_file) with [top/limit/order](../../../doc/scriptconfig.md#wtop)
- single-run (test_one_run.sh) and multi-run (test_two_runs.sh) script execution

Multi-run test simulates the scenario when an operator validates loaded order and order item data before proceeding with joining orders with order items.
2 changes: 1 addition & 1 deletion test/code/portfolio/README.md
@@ -64,7 +64,7 @@ See results in /tmp/capi_out/portfolio_quicktest.
- [file_table](../../../doc/glossary.md#file_table) read from file directly into JSON fields
- [table_lookup_table](../../../doc/glossary.md#table_lookup_table) with parallelism, left outer grouped joins, string_agg() aggregate function
- [py_calc](../../../doc/glossary.md#py_calc-processor) calculations taking JSON as input and producing JSON
- [table_file](../../../doc/glossary.md#table_file) with top/order to produce ordered performance data matrix
- [table_file](../../../doc/glossary.md#table_file) with [top/order](../../../doc/scriptconfig.md#wtop) to produce ordered performance data matrix

## How to test

8 changes: 4 additions & 4 deletions test/code/py_calc/README.md
@@ -18,11 +18,11 @@ and rendered in https://dreampuf.github.io/GraphvizOnline :

## What's tested:

- table_custom_tfm_table custom processor (py_calc) with writer using values from both reader (for example, r.shipping_limit_date) and custom processor (for example, p.taxed_value); please note: p.* datatype, like decimal2 of p.taxed_value, is used by writer only, do not expect this datatype when using this field in your Python code
[file_table](../../../doc/glossary.md#file_table)file_table reading from multiple files
- [table_file](../../../doc/glossary.md#table_file) with top/limit/order
- [table_custom_tfm_table](../../../doc/glossary.md#table_custom_tfm_table) custom processor [py_calc](../../../doc/glossary.md#py_calc-processor) with writer using values from both the reader (for example, r.shipping_limit_date) and the custom processor (for example, p.taxed_value); please note: the p.* datatype, like decimal2 of p.taxed_value, is used by the writer only; do not expect this datatype when using this field in your Python code
- [file_table](../../../doc/glossary.md#file_table) reading from multiple files
- [table_file](../../../doc/glossary.md#table_file) with [top/limit/order](../../../doc/scriptconfig.md#wtop)
- [table_file](../../../doc/glossary.md#table_file) using file-per-batch configuration (see {batch_idx} parameter)
- table_table processor that, using Capillaries Go funtions and arithmetic operations, implements a subset (no weekday math) of calculations provided by Python processor
- [table_table](../../../doc/glossary.md#table_table) processor that, using Capillaries Go functions and arithmetic operations, implements a subset (no weekday math) of the calculations provided by the py_calc processor above

## How to test

2 changes: 1 addition & 1 deletion test/code/tag_and_denormalize/README.md
@@ -17,7 +17,7 @@ and rendered in https://dreampuf.github.io/GraphvizOnline :
- [file_table](../../../doc/glossary.md#file_table) read from single file
- [tag_and_denormalize](../../../doc/glossary.md#tag_and_denormalize-processor) custom processor: denormalizes products table by checking tag criteria and producing a new data row for each matching tag
- [table_lookup_table](../../../doc/glossary.md#table_lookup_table) with parallelism (10 batches), left outer join with grouping
- [table_file](../../../doc/glossary.md#table_file)table_file with top/limit/order
- [table_file](../../../doc/glossary.md#table_file) with [top/limit/order](../../../doc/scriptconfig.md#wtop)
- single-run (test_one_run.sh) and multi-run (test_two_runs.sh) script execution

Multi-run test simulates the scenario when an operator validates tagged products (see /data/out/tagged_products_for_operator_review.csv) before proceeding with calculating totals.