Filter Expression support #440

martinsumner · 2024-05-17T18:41:14Z

Allow indexes to be evaluated into a Key/Value map using an eval pipeline, and then filtered using a filter expression

This is v6 of the perf_SUITE tests. The test adds a complex index entry to every object, and then adds a new test phase to test regex queries. Three profiles have been added so that the full, mini and profiling versions of perf_SUITE can be run without editing the file itself, e.g. `./rebar3 as perf_mini do ct --suite=test/end_to_end/perf_SUITE`. When testing as `perf_prof`, summarised versions of the eprof results are now printed to screen. The volume of keys within the full test suite has been reduced so that test run times are not excessively increased by the new features.

The re2 library, though not offering the complete functionality of pcre, is known to be much faster in many scenarios. Switch to re2 in preparation for the OTP switch to re2 in OTP28

Assume to make the job of gs easier - name makes a massive difference to load time in OTP 24.

Also try to improve test stability by increasing the pause

Identify and log out memory usage by test phase

Allow for a capturing regular expression to be passed with a filter function that will filter based on the captured output. Also allows a specific captured term to be returned as the returned term (rather than the whole index term)

First draft of allowing an externally provided comparison expression to be validated, parsed and evaluated

Required as re2 can only output binary not string in captures.

Allow for capture with either delimited index terms, or with regex capture, and then filtering of captured attributes using a logical expression. The xref check `locals_not_used` is now ignored due to issues with auto-generated module.

Allow for the generation of projected attributes from an index term

Allow for additional functions in eval. Change the leveled query API to generalise application of eval/filter expressions.

Remove `capture` option - as all can be achieved via eval/filter.

The filter should expect to compare BETWEEN low AND high; supporting high AND low may be confusing. The filter should allow an identifier to appear in any position in a comparator, i.e. `string BETWEEN identifier AND identifier` or `integer > identifier`. This makes checking some validity constraints harder at parse time (i.e. that High >= Low), but makes the result more compatible. For ease of implementation, IN can only be used for a comparison between a string and a list of strings (with the identifier representing either part).
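The comparator rules above can be sketched as follows. This is an illustration in Python rather than the Erlang used in `leveled_filter`, and it makes several assumptions that are not stated in this PR: the `Ident` class, the function names, the inclusive treatment of both BETWEEN bounds, and the choice to return false for missing or mismatched attributes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ident:
    """Hypothetical marker for an identifier naming an evaluated attribute."""
    name: str


def resolve(operand, attrs):
    """Return a literal unchanged, or look an identifier up in the attribute map."""
    if isinstance(operand, Ident):
        return attrs.get(operand.name)  # None when the attribute is absent
    return operand


def between(subject, low, high, attrs):
    """subject BETWEEN low AND high, with the low bound written first."""
    s, lo, hi = (resolve(x, attrs) for x in (subject, low, high))
    if s is None or lo is None or hi is None:
        return False
    if not (type(s) is type(lo) is type(hi)):
        return False  # mixed types never match in this sketch
    # Bounds given as identifiers cannot be ordered at parse time, so a
    # reversed range is only seen here, where it simply matches nothing.
    return lo <= s <= hi  # inclusive bounds are an assumption


def is_in(subject, candidates, attrs):
    """String IN list-of-strings; either side may be an identifier."""
    s = resolve(subject, attrs)
    c = resolve(candidates, attrs)
    if not isinstance(s, str) or not isinstance(c, (list, tuple)):
        return False
    return s in c


attrs = {"dob": "19800101", "lo": "19700101", "hi": "19891231",
         "names": ["smith", "jones"]}
between(Ident("dob"), "19700101", "19891231", attrs)   # True
between("19800101", Ident("hi"), Ident("lo"), attrs)   # False: reversed range
is_in("smith", Ident("names"), attrs)                  # True
```

Because either side of a comparator may be an identifier, the ordering of the bounds can only be checked once the attribute map is available, which is the trade-off the commit message describes.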

now passes 10k tests

Differentiate between positive/negative integers and non-negative integers. Stop parsing of unusable split/index values.

Works on index terms of type unicode_binary()

rebar.config

src/leveled_codec.erl

src/leveled_eval.erl

ThomasArts · 2024-05-23T14:57:59Z

src/leveled_eval.erl

+        " | map($birthday, <=, ((\"0430\", 2024)), 2023, $yoc)"
+        " | subtract($yoc, $yob, $age)"
+        " | add($age, 1, $age_next)"
+        " | to_string($age, $age)"


Didn't realize one can overload variables to have different types... probably not the nicest thing to use in an example.

ThomasArts · 2024-05-23T15:02:55Z

src/leveled_evallexer.xrl

+%% Lexer for eval expressions
+
+Definitions.
+WhiteSpace  = ([\t\f\v\r\n\s]+)


When reading the documentation, I would say \s subsumes the others.

src/leveled_evallexer.xrl

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

…n' into mas-d31-mas.i1433-filterexpression

This reverts commit 1b320c6.

Update to the lexers, to require strings to be non-empty, which rules out a raft of potential failure scenarios during evaluation. Handle unexpected types in eval functions by not creating the attribute rather than throwing an exception. Add a new ends_with function, that behaves as begins_with in reverse.
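A minimal sketch of the "skip the attribute instead of throwing" behaviour, written in Python for illustration only (the real eval functions are Erlang). The function names `subtract`, `add` and `to_string` are taken from the reviewed example string earlier in this conversation; the `$name` string convention, the helper names and the exact type rules here are assumptions.

```python
def _name(ident):
    """Strip the leading "$" from a hypothetical "$name" identifier."""
    return ident[1:]


def _int(attrs, operand):
    """Resolve "$name" identifiers via the map; accept only real integers."""
    if isinstance(operand, str) and operand.startswith("$"):
        operand = attrs.get(_name(operand))
    return operand if type(operand) is int else None


def subtract(attrs, x, y, out):
    a, b = _int(attrs, x), _int(attrs, y)
    if a is None or b is None:
        return attrs  # unexpected type: leave the map untouched
    return {**attrs, _name(out): a - b}


def add(attrs, x, y, out):
    a, b = _int(attrs, x), _int(attrs, y)
    if a is None or b is None:
        return attrs  # unexpected type: leave the map untouched
    return {**attrs, _name(out): a + b}


def to_string(attrs, x, out):
    a = _int(attrs, x)
    if a is None:
        return attrs
    return {**attrs, _name(out): str(a)}


attrs = {"yoc": 2024, "yob": 1980}
attrs = subtract(attrs, "$yoc", "$yob", "$age")   # adds age = 44
attrs = add(attrs, "$age", 1, "$age_next")        # adds age_next = 45
attrs = to_string(attrs, "$age", "$age")          # age is now the string "44"
attrs = add(attrs, "$age", 1, "$again")           # age is a string: nothing added
```

The last step shows the point of the change: once `age` has been overwritten with a string, a later arithmetic stage produces no new attribute and raises no exception, so the rest of the pipeline still runs.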

Allow for return_keys of logical combinations of queries

Avoid confusion over meaning of NOT

Don't rely on ordering of inputs, use a numeric identifier as a key

src/leveled_codec.erl

SubKey of integer() used - falls foul of stronger checking

Using the binary match/split to break up binary strings is faster than using string split, as pre-compiled patterns can be used (saving 0.16 microsec per application). Compiled patterns cannot be passed across nodes, so pre-compiling them in the parser makes the function node-specific. So instead, compile on demand. If, as expected, the process is to perform this check many times, at least have it compile only once. Examining the eprof profile also indicated significant time in lists:sublist, so the process of updating the map has been simplified. Empty attributes will not now be added, and in the case of regex, attributes are only added if all are produced as expected.
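The "compile on demand, but only once" idea and the attribute rules described above can be sketched like this. It is a Python analogue for illustration, not the Erlang implementation: memoised `re.compile` stands in for a cached compiled binary pattern, and the function names, the key lists and the reading of "produced as expected" (the regex matches and every group captures a non-empty value) are assumptions.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def compiled_delimiter(delim: bytes):
    """Compile the delimiter on first use, then reuse it within this process."""
    return re.compile(re.escape(delim))


@lru_cache(maxsize=None)
def compiled_regex(pattern: bytes):
    """Compile a capturing regex on first use, then reuse it."""
    return re.compile(pattern)


def split_into(attrs, term, delim, keys):
    """Split a delimited index term into named attributes, skipping empty ones."""
    parts = compiled_delimiter(delim).split(term)
    new = dict(attrs)
    for key, value in zip(keys, parts):
        if value:  # empty attributes are not added
            new[key] = value
    return new


def regex_into(attrs, term, pattern, keys):
    """Add captured attributes only if every expected capture was produced."""
    m = compiled_regex(pattern).search(term)
    if m is None:
        return attrs
    groups = m.groups()
    if len(groups) != len(keys) or any(not g for g in groups):
        return attrs  # all-or-nothing: a partial capture adds no attributes
    return {**attrs, **dict(zip(keys, groups))}


split_into({}, b"SMITH|19800101||LS1", b"|",
           ["family", "dob", "given", "postcode"])
# {'family': b'SMITH', 'dob': b'19800101', 'postcode': b'LS1'}
```

Keeping the compiled form in a per-process cache rather than embedding it in the parsed expression means the parsed expression itself stays plain data.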

martinsumner and others added 21 commits April 12, 2024 16:11

Switch to re2 library

4950700

The re2 library, though not offering the complete functionality of pcre, is known to be much faster in many scenarios. Switch to re2 in preparation for the OTP switch to re2 in OTP28

Load chunk in spawned processes

be6cdc7

Assume to make the job of gs easier - name makes a massive difference to load time in OTP 24.

Correctly account for pause

60a4f78

Also try to improve test stability by increasing the pause

Merge branch 'mas-d31-i433perfSUITE' into mas-d31-i433re2

ebc79f4

Support both re2 and pcre

3b98077

Add microstate accounting to profile

c4f428f

Add memory tracking during test phases

56b2e1f

Identify and log out memory usage by test phase

Merge branch 'mas-d31-i433perfSUITE' into mas-d31-i433re2

31f2aa4

Use macros instead (#437)

611b8ac

Don't print memory to screen in standard ct test

ce5db70

Merge branch 'mas-d31-i433perfSUITE' into mas-d31-i433re2

8fe29eb

Initial support for capture in regex

287e8be

Allow for a capturing regular expression to be passed with a filter function that will filter based on the captured output. Also allows a specific captured term to be returned as the returned term (rather than the whole index term)

Create leveled_filter.erl

f54d75c

First draft of allowing an externally provided comparison expression to be validated, parsed and evaluated

Make binary comparisons in leveled_filter

4a4a7ca

Required as re2 can only output binary not string in captures.

Add Filter Expression support

2b61a13

Allow for capture with either delimited index terms, or with regex capture, and then filtering of captured attributes using a logical expression. The xref check `locals_not_used` is now ignored due to issues with auto-generated module.

Initial lexer/parser for eval pipeline

bb7c889

Allow for the generation of projected attributes from an index term

Update and extend the eval expression

80ce3c7

Allow for additional functions in eval. Change the leveled query API to generalise application of eval/filter expressions.

Extend testing

1563a7c

Add regex eval function to pipeline

59c556a

Remove `capture` option - as all can be achieved via eval/filter.

Remove re2

2680bf2

martinsumner mentioned this pull request May 17, 2024

Mas d31 i433re2capture #438

Closed

martinsumner and others added 8 commits May 20, 2024 18:56

First version of QuickCheck properties for setop lang and filter lang

1e22841

Update filterlang eqc property

852e06a

Updated quickcheck for filter language

6a24913

now passes 10k tests

improvements to implementation

bb8f33d

Also improve eval lexer

a1ae05c

Be more specific re integer types

90c8d7b

Differentiate between positive/negative integers and non-negative integers. Stop parsing of unusable split/index values.

Basic unicode testing

fcd195f

Works on index terms of type unicode_binary()

runtime type errors in eval

1b320c6

ThomasArts reviewed May 23, 2024

rebar.config (outdated, resolved)

ThomasArts reviewed May 23, 2024

src/leveled_codec.erl (resolved)

ThomasArts reviewed May 23, 2024

src/leveled_eval.erl (outdated, resolved)

ThomasArts reviewed May 23, 2024

src/leveled_evallexer.xrl (outdated, resolved)

martinsumner and others added 10 commits May 23, 2024 16:35

Update src/leveled_evallexer.xrl

9353f11

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

Update src/leveled_eval.erl

afee352

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

Merge remote-tracking branch 'quviq/mas-d31-mas.i1433-filterexpression' into mas-d31-mas.i1433-filterexpression

787c424

Revert "runtime type errors in eval"

9984943

This reverts commit 1b320c6.

Add support for combination queries on same snapshot point

bf69dc5

Allow for return_keys of logical combinations of queries

Setop parser to use only set operation names

9c7ff44

Avoid confusion over meaning of NOT

Proposed changes to handle indexed sets

044537a

quickcheck properties for setop language

3afa892

Update setop to use maps as input

c7dd6e7

Don't rely on ordering of inputs, use a numeric identifier as a key

ThomasArts approved these changes May 28, 2024

src/leveled_codec.erl (resolved)

martinsumner added 2 commits September 27, 2024 13:14

Merge branch 'develop-3.4' into mas-d31-mas.i1433-filterexpression

1f97828

Remove duplications following merge

8075de5

martinsumner changed the base branch from develop-3.1 to develop-3.4 September 27, 2024 12:58

ThomasArts approved these changes Nov 12, 2024

martinsumner added 8 commits November 15, 2024 15:41

Merge branch 'develop-3.4' into mas-d31-mas.i1433-filterexpression

8adb50a

Additional changes to binary keys in test

0f7ed91

SubKeys must be binary()/null

bf4c345

SubKey of integer() used - falls foul of stronger checking

Merge branch 'develop-3.4' into mas-d31-mas.i1433-filterexpression

c340ac3

Merge branch 'develop-3.4' into mas-d31-mas.i1433-filterexpression

7ba4957

Add error output when generating function from unparsable input

648d42c

Fix types

8625536
