
Filter Expression support #440

Open. Wants to merge 48 commits into base branch develop-3.4.


martinsumner (Owner)

Allow indexes to be evaluated into a Key/Value map using an eval pipeline, and then filtered using a filter expression
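As a rough illustration of the eval-then-filter idea, the sketch below splits each index term on a delimiter into a key/value map and keeps only the terms whose map passes a predicate. It is written in Python rather than Erlang. The function names (`eval_delim`, `query`), the `|`-delimited term layout and the attribute names are all hypothetical, and a Python lambda stands in for a parsed filter expression; this is not the leveled query API.

```python
from typing import Callable, Dict, List

Attributes = Dict[str, str]


def eval_delim(term: str, delimiter: str, names: List[str]) -> Attributes:
    """Hypothetical eval step: split an index term on a delimiter and
    bind each part to a name, giving a key/value map of attributes."""
    parts = term.split(delimiter)
    # zip stops at the shorter sequence, so a term with too few parts
    # simply yields fewer attributes instead of raising an error
    return dict(zip(names, parts))


def query(terms: List[str], delimiter: str, names: List[str],
          keep: Callable[[Attributes], bool]) -> List[str]:
    """Return the index terms whose evaluated attribute map passes the filter."""
    return [t for t in terms if keep(eval_delim(t, delimiter, names))]


terms = ["19750302|SHEFFIELD|M", "19820911|LEEDS|F", "19690101|SHEFFIELD|F"]
matches = query(terms, "|", ["dob", "town", "gender"],
                lambda a: a.get("town") == "SHEFFIELD" and a.get("gender") == "F")
print(matches)  # ['19690101|SHEFFIELD|F']
```

Separating the two stages means the filter only ever sees named attributes, so it does not need to know how the index term was laid out.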

martinsumner and others added 21 commits April 12, 2024 16:11
This is v6 of the perf_SUITE tests.  The test adds a complex index entry to every object, and then adds a new test phase to test regex queries.

Three profiles have been added so that the full, mini and profiling versions of perf_SUITE can be run without editing the file itself, e.g.:

`./rebar3 as perf_mini do ct --suite=test/end_to_end/perf_SUITE`

When testing as `perf_prof`, summarised versions of the eprof results are now printed to the screen.

The volume of keys within the full test suite has been reduced, so that test run times are not excessively increased by the new features.
The re2 library, though not offering the complete functionality of pcre, is known to be much faster in many scenarios.

Switch to re2 in preparation for the OTP switch to re2 in OTP28
Assume to make the job of gs easier - name makes a massive difference to load time in OTP 24.
Also try to improve test stability by increasing the pause.
Identify and log out memory usage by test phase
Allow for a capturing regular expression to be passed with a filter function that will filter based on the captured output.

Also allows a specific captured term to be returned as the returned term (rather than the whole index term)
First draft of allowing an externally provided comparison expression to be validated, parsed and evaluated
Required because re2 can only output binaries, not strings, in captures.
Allow for capture with either delimited index terms, or with regex capture, and then filtering of captured attributes using a logical expression.
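A minimal sketch of the capture-then-filter flow, assuming captures are named regex groups: each matching term yields a map of captured attributes, the map is tested by a filter, and optionally a single captured attribute is returned in place of the whole term. Python's backtracking `re` module stands in for re2 here (both accept the `(?P<name>...)` group syntax), and `capture`, `capture_filter` and the term layout are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional

Attributes = Dict[str, str]


def capture(term: str, regex: "re.Pattern[str]") -> Optional[Attributes]:
    """Bind the regex's named groups to attribute names; None if no match."""
    match = regex.match(term)
    return match.groupdict() if match else None


def capture_filter(terms: List[str], regex: "re.Pattern[str]",
                   keep: Callable[[Attributes], bool],
                   return_attr: Optional[str] = None) -> List[str]:
    results = []
    for term in terms:
        attrs = capture(term, regex)
        if attrs is None or not keep(attrs):
            continue  # unmatched and filtered-out terms are both skipped
        # optionally return one captured attribute instead of the whole term
        results.append(attrs[return_attr] if return_attr else term)
    return results


pattern = re.compile(r"(?P<dob>[0-9]{8})\|(?P<town>[A-Z]+)")
terms = ["19750302|SHEFFIELD|M", "19820911|LEEDS|F", "not-an-index-term"]
towns = capture_filter(terms, pattern, lambda a: a["dob"] < "19800101", "town")
print(towns)  # ['SHEFFIELD']
```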

The xref check `locals_not_used` is now ignored due to issues with the auto-generated module.
Allow for the generation of projected attributes from an index term
Allow for additional functions in eval.  Change the leveled query API to generalise application of eval/filter expressions.
Remove the `capture` option, as everything it did can be achieved via eval/filter.
martinsumner and others added 8 commits May 20, 2024 18:56
The filter should expect to compare BETWEEN low AND high; supporting high AND low as well may be confusing.

The filter should allow an identifier to be in any position in a comparison, e.g.:

string BETWEEN identifier AND identifier

integer > identifier

This makes checking some validity constraints harder at parse time (e.g. that High >= Low), but makes the result more compatible.

For ease of implementation, IN can only be used for a comparison between a string and a list of strings (with the identifier representing either part).
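The comparison rules above can be sketched as follows, assuming attribute values are Python `str` or `int` and that an identifier is represented by a hypothetical `Ident` wrapper. Any operand of BETWEEN or `>` may be an identifier, so Low <= High can only be known at evaluation time, and operands of mismatched types simply fail to match. The function names are illustrative, not leveled's.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Ident:
    """A reference to an attribute, written $name in the filter syntax."""
    name: str


def resolve(operand: Any, attrs: Dict[str, Any]) -> Any:
    """Look up identifiers in the attribute map; literals pass through."""
    return attrs.get(operand.name) if isinstance(operand, Ident) else operand


def same_type(*values: Any) -> bool:
    return len({type(v) for v in values}) == 1


def between(value: Any, low: Any, high: Any, attrs: Dict[str, Any]) -> bool:
    """value BETWEEN low AND high, where any operand may be an identifier.
    An inverted range (low > high) just matches nothing."""
    v, lo, hi = (resolve(x, attrs) for x in (value, low, high))
    if None in (v, lo, hi) or not same_type(v, lo, hi):
        return False
    return lo <= v <= hi


def greater_than(left: Any, right: Any, attrs: Dict[str, Any]) -> bool:
    l, r = resolve(left, attrs), resolve(right, attrs)
    return l is not None and r is not None and same_type(l, r) and l > r


def in_list(value: Any, options: Any, attrs: Dict[str, Any]) -> bool:
    """IN only compares a string against a list of strings."""
    v, opts = resolve(value, attrs), resolve(options, attrs)
    return (isinstance(v, str) and isinstance(opts, list)
            and all(isinstance(o, str) for o in opts) and v in opts)


attrs = {"dob": "19750302", "from": "19700101", "to": "19791231",
         "age": 49, "town": "LEEDS"}
print(between(Ident("dob"), "19700101", "19791231", attrs))    # True
print(between("19750302", Ident("from"), Ident("to"), attrs))  # True
print(greater_than(50, Ident("age"), attrs))                   # True
print(in_list(Ident("town"), ["LEEDS", "YORK"], attrs))        # True
```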
Differentiate between positive/negative integers and non-negative integers.  Stop parsing of unusable split/index values.
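One way to draw this distinction in a lexer, sketched in Python with hypothetical token names: unsigned digit strings become `non_neg_integer` tokens and negative literals become `integer` tokens, so a grammar rule for a split/index position can accept only the former and reject an unusable value at parse time. Which rules in leveled's grammar take which token type is an assumption here.

```python
import re
from typing import Tuple

NON_NEG_INT = re.compile(r"[0-9]+")
NEG_INT = re.compile(r"-[0-9]+")


def lex_integer(text: str) -> Tuple[str, int]:
    """Classify an integer literal as one of two token types."""
    if NON_NEG_INT.fullmatch(text):
        return ("non_neg_integer", int(text))
    if NEG_INT.fullmatch(text):
        return ("integer", int(text))
    raise ValueError(f"not an integer literal: {text!r}")


def parse_position(text: str) -> int:
    """Hypothetical rule for a split/index position: only non_neg_integer
    is accepted, so an unusable value fails when the expression is
    parsed rather than when it is evaluated."""
    kind, value = lex_integer(text)
    if kind != "non_neg_integer":
        raise ValueError(f"position must be non-negative, got {value}")
    return value


def parse_operand(text: str) -> int:
    """Hypothetical rule where any integer is acceptable (either token type)."""
    return lex_integer(text)[1]
```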
Works on index terms of type unicode_binary()
rebar.config (outdated, resolved)
src/leveled_eval.erl (outdated, resolved)
" | map($birthday, <=, ((\"0430\", 2024)), 2023, $yoc)"
" | subtract($yoc, $yob, $age)"
" | add($age, 1, $age_next)"
" | to_string($age, $age)"
Contributor

Didn't realize one can overload variables to have different types... probably not the nicest thing to use in an example.
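To make the example pipeline above concrete, here is a hedged Python reading of it. The semantics of `map` (compare `$birthday` against each pair with the given comparator, take the first match's output, otherwise use the default) are inferred from the example rather than taken from leveled's documentation. `$yob` is assumed to be an integer attribute produced by an earlier step, and all function names are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Attrs = Dict[str, Any]

COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "=": operator.eq,
}


def map_fn(attrs: Attrs, src: str, cmp: str,
           pairs: List[Tuple[Any, Any]], default: Any, dest: str) -> Attrs:
    """First pair whose test value matches $src supplies the output."""
    if src not in attrs:
        return attrs
    for test_value, output in pairs:
        if COMPARATORS[cmp](attrs[src], test_value):
            return {**attrs, dest: output}
    return {**attrs, dest: default}


def subtract(attrs: Attrs, a: str, b: str, dest: str) -> Attrs:
    if isinstance(attrs.get(a), int) and isinstance(attrs.get(b), int):
        return {**attrs, dest: attrs[a] - attrs[b]}
    return attrs  # wrong or missing types: attribute is not created


def add(attrs: Attrs, a: str, n: int, dest: str) -> Attrs:
    if isinstance(attrs.get(a), int):
        return {**attrs, dest: attrs[a] + n}
    return attrs


def to_string(attrs: Attrs, a: str, dest: str) -> Attrs:
    return {**attrs, dest: str(attrs[a])} if a in attrs else attrs


def pipeline(attrs: Attrs) -> Attrs:
    attrs = map_fn(attrs, "birthday", "<=", [("0430", 2024)], 2023, "yoc")
    attrs = subtract(attrs, "yoc", "yob", "age")
    attrs = add(attrs, "age", 1, "age_next")
    return to_string(attrs, "age", "age")
```

With `birthday = "0302"` and `yob = 1975` this yields `yoc = 2024`, `age = "49"` and `age_next = 50`. Because the last step writes a string back into `$age`, any later integer step reading `$age` would no longer find an integer.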

%% Lexer for eval expressions

Definitions.
WhiteSpace = ([\t\f\v\r\n\s]+)
Contributor

When reading the documentation, I would say \s subsumes the others.

martinsumner and others added 10 commits May 23, 2024 16:35
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
Update to the lexers to require strings to be non-empty, which rules out a raft of potential failure scenarios during evaluation.

Handle unexpected types in eval functions by not creating the attribute, rather than throwing an exception.

Add a new ends_with function, that behaves as begins_with in reverse.
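A sketch of the two behaviours described above, assuming attributes are held in a Python dict: an eval function that meets an unexpected type leaves the map unchanged instead of raising, and `ends_with` mirrors `begins_with` by matching a suffix rather than a prefix. The function names and signatures are hypothetical. Requiring non-empty string literals also rules out cases such as an empty suffix, which would match every string.

```python
from typing import Any, Dict

Attrs = Dict[str, Any]


def eval_add(attrs: Attrs, name: str, amount: int, dest: str) -> Attrs:
    """Create $dest only if $name is an integer; otherwise skip quietly."""
    value = attrs.get(name)
    if not isinstance(value, int) or isinstance(value, bool):
        return attrs
    return {**attrs, dest: value + amount}


def begins_with(attrs: Attrs, name: str, prefix: str) -> bool:
    value = attrs.get(name)
    return isinstance(value, str) and value.startswith(prefix)


def ends_with(attrs: Attrs, name: str, suffix: str) -> bool:
    """begins_with in reverse: test the end of the attribute value."""
    value = attrs.get(name)
    return isinstance(value, str) and value.endswith(suffix)


attrs = {"age": "49", "town": "SHEFFIELD"}
print(eval_add(attrs, "age", 1, "age_next"))  # {'age': '49', 'town': 'SHEFFIELD'}
print(begins_with(attrs, "town", "SHEF"), ends_with(attrs, "town", "FIELD"))  # True True
```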
Allow for return_keys of logical combinations of queries
Avoid confusion over meaning of NOT
Don't rely on ordering of inputs, use a numeric identifier as a key
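Combining query results by key can be sketched with plain set operations. Each sub-query's keys are stored under a numeric identifier rather than by position in a list, and an expression tree refers to those identifiers. Offering NOT only as a binary `and_not` (set difference) is one way to avoid asking what a standalone NOT would be relative to; whether this matches the PR's resolution is an assumption, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

Expr = Union[int, Tuple[str, "Expr", "Expr"]]


def combine(expr: Expr, results: Dict[int, Set[str]]) -> Set[str]:
    """Evaluate a logical combination of sub-query results by identifier."""
    if isinstance(expr, int):
        return results[expr]  # looked up by id, so input order is irrelevant
    op, left, right = expr
    lhs, rhs = combine(left, results), combine(right, results)
    if op == "and":
        return lhs & rhs
    if op == "or":
        return lhs | rhs
    if op == "and_not":
        return lhs - rhs  # keys in lhs that are absent from rhs
    raise ValueError(f"unknown operator: {op}")


results = {1: {"K1", "K2", "K3"}, 2: {"K2", "K3", "K4"}, 3: {"K3"}}
print(sorted(combine(("and_not", ("and", 1, 2), 3), results)))  # ['K2']
```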
src/leveled_codec.erl (resolved)
martinsumner changed the base branch from develop-3.1 to develop-3.4 on September 27, 2024, 12:58.
2 participants