Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String operators and builtin functions #38

Merged
merged 6 commits into from
Apr 13, 2019

Conversation

bittrance
Copy link
Contributor

I'm planning a project which will have both rules (i.e. boolean expressions) and transformations (i.e. string expressions) in the same configuration entry, so it makes sense to use a library with support for both to get a unified syntax for these expressions. Evalexpr almost fills this description, but would need a bit more string support. Given that these functions are pretty standard, I thought it makes sense to contribute them upstream rather than keep them locally.

This PR introduces common operations for strings:

Identifier Argument Amount Description
downcase 1 Returns lower-case version of string
len 1 Return the character length of string argument
match 2 Returns true if first string argument matches regex in second
replace 3 Returns string with matches replaced by third argument
trim 1 Strips whitespace from start and end of string
upcase 1 Returns upper-case version of string

This PR adds a dependence on the regex crate. This should probably be optional, but I'm thinking it is so likely that a string user will want to use regexes that it might make more sense to try to make all string functions optional?

@bittrance
Copy link
Contributor Author

Having thought a bit more, the question seems to boil down to one of type separarion. One strategy would be to support implicit type conversion (e.g. "foo" + 1). The other strategy would be to be very strict, perhaps by implementing a separate string concatenation operator (e.g. "foo" . "bar") and provide explicit type conversion functions (e.g. format("%s", 4.3)). Which would you prefer?

@ISibboI
Copy link
Owner

ISibboI commented Apr 8, 2019

Hey bittrance,

thank you for your interest in this crate and for your contribution. I have a few remarks:

  1. I want to introduce namespaces into this crate. This way, variables and functions can have more of a context. So I propose to put most of the functions you implemented into the str namespace. For now, this would just mean to add the prefix str:: to their names.

  2. I would like to keep the names of builtin functions as close to the Rust names as possible. I propose the following names:

    • str::to_uppercase
    • len - This can actually be reused for different types, so it makes sense to keep it in the root namespace
    • str::matches
    • str::replace
    • str::trim
    • str::to_lowercase
  3. I would make the dependency to the regex crate optional, because one of the selling-arguments of this crate is having no required dependencies. We should introduce the feature regex_support for the str::matches function. The other string functions that don't require regex should be non-optional in my opinion, because they don't depend on regex.

  4. The crate already does implicit type conversion from int to float in all relevant operations, if not all arguments are int. So I think overloading the + operator to concatenate strings and implicitly convert to string if only one argument is a string is fine and fits well into the crate.

That's it so far. I will also take a look at the code.
Thank you very much for this pull request!

Copy link
Owner

@ISibboI ISibboI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the code and I think it looks good. There are some minor things here and there.

And I see you did not implement the auto-conversion yet. I would accept the pull-request without, that's fine. But then we should definetly make an issue for the auto-conversion

src/error/display.rs Outdated Show resolved Hide resolved
src/error/mod.rs Outdated Show resolved Hide resolved
src/function/builtin.rs Outdated Show resolved Hide resolved
src/function/builtin.rs Outdated Show resolved Hide resolved
src/error/mod.rs Outdated Show resolved Hide resolved
@ISibboI
Copy link
Owner

ISibboI commented Apr 8, 2019

(I just started a new job and moved to a new location, so my work on this is slowed down a bit. But as soon as I get internet at home, I will be able to care for this crate properly again.)

@ISibboI ISibboI added enhancement New feature or request feature:regex Related to the optional regex feature of this crate labels Apr 8, 2019
@ISibboI ISibboI added this to the 4.2.0 milestone Apr 8, 2019
@bittrance
Copy link
Contributor Author

I understand your plan to have namespaces and it would make sense for some use cases, particularly where the developer doesn't have control over the runtime list of available functions, but I'm not sure it will make much sense to my users. If I use evalexpr as a light-weight configuration templating language, I know that the type of the field is supposed to be a string and the list of functions will be well-known. A somewhat simplified configuration example from my project:

resource {
  name = replace_matches(system_name, "prod-us1-(.*)", "$1") + "-" + to_lowercase(service_name)
  filter = tag == "production" && level(status == "NORMAL", "5 min")
}

Prefixing the functions with str:: (or perhaps regex::) will just be verbose. Namespacing will also mean that the locally defined level function (as in levelling period) will either have to have a separate namespace (perhaps agg(regaiton):: or (the project name) barsa::) or reside in the top namespace, but either way the user will have to know where to find it.

I could add functions into namespaces as per your description and introduce re-exporting functions so that my project can make the string functions available in the root namespace as well. Does that make sense?

Regarding type conversions, I have come to realize that this should rather be discussed in terms of type safety (not just separation). In particular, I think that I would like to be able to validate that the resulting type of an expression is correct when I load it. This is a separate issue from string functions, so I will do some research and open a separate issue to explain my thinking.

Regarding regex method naming, note https://docs.rs/regex/1.1.5/regex/struct.Regex.html#using-the-stdstrpattern-methods-with-regex . Some digging points to rust-lang/rust#27721 (#![feature(pattern)]) and rust-lang/rust#56345 (#![feature(needle)] and friends) so it will be there eventually, I suppose. I'm fine with implementing regex::{matches,replace} or str::regex_{matches,replace} if you prefer to "track" stable Rust in this regard.

The rest of the comments are straight-forward changes that I will incorporate when I update the code pending resolution of these discussions.

@ISibboI
Copy link
Owner

ISibboI commented Apr 12, 2019

Hey bittrance,

I understand your concern about namespaces being too verbose for your use case. I think we can easily resolve that by adding something like an include_all_namespaces flag to the Context. If it is set, having an expression x would evaluate to the value of x, or abc::x or some other namespace.
In case of conflicts, an error should be returned.

For the sake of moving this forward, we could just merge the pull request without the namespacing changes, so you name all your methods without str:: prefixes, and I introduce that later on.
This leads to another breaking change, but I think that's fine.

About type conversions, I think we can also merge this pull request without the auto-conversion feature into strings. Can all be added separately. I think we should not overload this pull request.

For the regex naming, I think there should be a differentiation between calling for example the matches method with a regex, or with a simple string pattern. Since string patterns might also contain control-characters of regexes, which might then be confusing to the user. I think I would rather have separate methods for regex/non-regex operations. So for now I guess regex_{matches,replace}.

@bittrance
Copy link
Contributor Author

I went with your original suggestion to namespace with str:: as it is going to be a while until this becomes a problem for me and I'm confident we can find a good solution in a separate PR.

I hid the parts dependent on regex behind a feature flagn (excluding the error variant and its support which are not dependent on anything from the actual crate). Not sure feature flagging individual match arms is best practice, but I think that problem should be solved in the more general context of namespaces and modularized builtins.

I force-pushed the branch in order to make the regex dependency optional in all commits so as to not break future bisection.

@ISibboI
Copy link
Owner

ISibboI commented Apr 13, 2019

Okay that sounds good.

For feature flagging individual match arms: I'm fine with that. Right now there are so little builtins that this list is easy to comprehend.

@ISibboI ISibboI merged commit 2f7d1c2 into ISibboI:master Apr 13, 2019
@ISibboI
Copy link
Owner

ISibboI commented Apr 13, 2019

Thank you for your contribution! You seem to write really good code.

ISibboI added a commit that referenced this pull request Apr 13, 2019
@ISibboI ISibboI self-assigned this May 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature:regex Related to the optional regex feature of this crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants