Add a new selector type: regex selector #519

Open
He-Pin opened this issue Apr 30, 2024 · 14 comments

Comments

@He-Pin

He-Pin commented Apr 30, 2024

We have some fields in an object/array that are generated by the backend, with names like header_$id. I would like to select them with a regex.

@gregsdennis
Collaborator

Hey there @He-Pin. We actually have some support for that in the RFC. There are two regex functions, match() and search().

match() is implicitly anchored and will match on the full string.

search() is unanchored and will match on substrings.

Both use a flavor of regex called I-Regexp (RFC 9485), which was developed to be an interoperable subset of the most commonly used regex engines.
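
To make the anchoring difference concrete, here is a minimal sketch in Java. It assumes java.util.regex as a stand-in for an I-Regexp engine (it is a superset, not the same thing), and the class and method names are hypothetical, not part of any JSONPath library.

```java
import java.util.regex.Pattern;

final class MatchVsSearch {
    // Anchored: the whole input must match, like match() in the RFC.
    static boolean match(String input, String regex) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Unanchored: a match anywhere in the input is enough, like search() in the RFC.
    static boolean search(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(match("header_1", "header_[0-9]+"));      // true
        System.out.println(match("x_header_1_y", "header_[0-9]+"));  // false
        System.out.println(search("x_header_1_y", "header_[0-9]+")); // true
    }
}
```

The pattern uses [0-9] rather than \d to stay within the kind of syntax a restricted regex subset is likely to accept.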

@He-Pin
Author

He-Pin commented Apr 30, 2024

I checked that, but it doesn't seem to fit our use case.

{
  "data": {
    "header_1": {
      "a": "1",
      "b": "2",
      "body": "{\"c\":\"3\"}"
    },
    "header_2": {
      "a": "1",
      "b": "2",
      "body": "{\"c\":\"3\"}"
    }
  }
}

Background: we want to select some JSON fields for translation, and we tried the Java JsonPath implementation. With the JSON above, we want to select the fields header_1 and header_2 first. Is that supported by the current RFC?

I was using $.data[?(@.keys() =~ /header_\d+/i)] but it doesn't work. So now I'm writing an implementation based on the RFC, with this extended grammar:

regexSelector: /string-literal/

Then I can write $[data][/header_\d+/]

As you can see, the main point is that we are selecting on the property names of the object's children, not on their values.
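
A minimal sketch of that idea, selecting members by name rather than by value. It assumes a plain java.util.Map as a stand-in for a parsed JSON object; the class name, method name, and the extra "footer" member are hypothetical and only there for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class KeyRegexSelect {
    // Returns the members of `object` whose names fully match `pattern`,
    // preserving the iteration order of the input map.
    static Map<String, Object> selectByKey(Map<String, Object> object, Pattern pattern) {
        Map<String, Object> selected = new LinkedHashMap<>();
        for (Map.Entry<String, Object> member : object.entrySet()) {
            if (pattern.matcher(member.getKey()).matches()) {
                selected.put(member.getKey(), member.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("header_1", Map.of("a", "1", "b", "2"));
        data.put("header_2", Map.of("a", "1", "b", "2"));
        data.put("footer", Map.of("a", "9")); // hypothetical non-matching member
        Pattern p = Pattern.compile("header_\\d+", Pattern.CASE_INSENSITIVE);
        System.out.println(selectByKey(data, p).keySet()); // [header_1, header_2]
    }
}
```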

@gregsdennis
Collaborator

Oh, you want the property names to be matched, not the values.

That, I think, is likely to be covered by #516, which is the piece you're missing. Once you can access the property names, you should be able to pass them into the functions.

@He-Pin
Author

He-Pin commented Apr 30, 2024

I think we need a pointer to the child property name, so maybe key() rather than keys().

@He-Pin
Author

He-Pin commented May 8, 2024

@gregsdennis as with #109, I have implemented this with a new selector, RegExpSelector, which works on the names of an ObjectNode's properties.

@gregsdennis
Collaborator

@He-Pin it's great that you've been able to implement it, but be aware that because it's not a standard behavior, it's not interoperable.

We'll leave this open as an idea for a possible JSON Path v2, but there's no such discussion at the moment. Continuing to push this idea in the short term isn't going to make that happen any faster.

@He-Pin
Author

He-Pin commented May 9, 2024

Understood. It's for an internal need, so that should be fine.

@gregsdennis
Collaborator

Another aspect of adding a regex selector is that there's no way to specify what kind of matching you want, which is why we have the match() and search() functions rather than a simple =~ operator.

@He-Pin
Author

He-Pin commented May 10, 2024

Yes, as it's a valid name too. But the name selector is written inside quotes, as '$name', while the regex selector is written inside slashes, as /$regex/.

@He-Pin
Author

He-Pin commented Dec 17, 2024

An update on this: we are currently using:

  /**
   * `/ $regexExp / $flags`
   */
  private def regex[_: P]: P[Unit] = P("/" ~/ nonSlashOrEscapedSlash ~ ("/" ~ CharIn("idmsuUx").rep()))

  `regexp-selector` | `name-selector` | `wildcard-selector` | `slice-selector` | `index-selector` | `filter-selector`

I think one advantage of the regexp-selector is that it is more lightweight than the search function: it does not require us to go through the filter-expression evaluator, but it still covers 80% of cases.

And there are real-world needs for this, see json-path/JsonPath#949
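
For readers who want to see what the `/ $regexExp / $flags` form could turn into at runtime, here is a hedged sketch in Java. It is not the parser from the comment above: the class and method names are hypothetical, and it assumes the flag letters carry the same meaning as Java's embedded flag letters in java.util.regex.Pattern.

```java
import java.util.regex.Pattern;

final class RegexLiteral {
    // Maps one flag letter to the java.util.regex.Pattern constant with the
    // same embedded-flag letter (assumed meaning of "idmsuUx").
    static int flagBit(char c) {
        switch (c) {
            case 'i': return Pattern.CASE_INSENSITIVE;
            case 'd': return Pattern.UNIX_LINES;
            case 'm': return Pattern.MULTILINE;
            case 's': return Pattern.DOTALL;
            case 'u': return Pattern.UNICODE_CASE;
            case 'U': return Pattern.UNICODE_CHARACTER_CLASS;
            case 'x': return Pattern.COMMENTS;
            default: throw new IllegalArgumentException("unknown flag: " + c);
        }
    }

    // Parses "/body/flags". A backslash escapes the next character,
    // so "\/" inside the body does not terminate it.
    static Pattern parse(String literal) {
        if (literal.length() < 2 || literal.charAt(0) != '/') {
            throw new IllegalArgumentException("expected /regex/flags: " + literal);
        }
        int end = -1;
        for (int i = 1; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '/') {
                end = i;
                break;
            }
        }
        if (end < 0) {
            throw new IllegalArgumentException("unterminated regex: " + literal);
        }
        int flags = 0;
        for (char c : literal.substring(end + 1).toCharArray()) {
            flags |= flagBit(c);
        }
        // The body is passed through unchanged; Java treats "\/" as a literal slash.
        return Pattern.compile(literal.substring(1, end), flags);
    }

    public static void main(String[] args) {
        Pattern p = parse("/header_\\d+/i");
        System.out.println(p.matcher("HEADER_12").matches()); // true
    }
}
```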

@gregsdennis
Collaborator

gregsdennis commented Dec 17, 2024

Edit: yes I see the difference. The regex needs to apply to the key, not the value.


there are real-world needs for this

That issue is not indicative of a "need". The spec offers a solution. Yes, it's more verbose, but it also more explicitly expresses the intent of the path, which means it's more interoperable (the same path will evaluate consistently across implementations).

@gregsdennis
Collaborator

I think this is a possibility for a potential JSON Path 2.

@He-Pin
Author

He-Pin commented Dec 17, 2024

Yes, our current implementation is:

    private void evaluateRegExpSelector(final Node match,
                                        final Pattern pattern,
                                        final boolean isLastSegment,
                                        final boolean isDescendant,
                                        final Consumer<Node> resultNodeCollector) {
        final var node = match.currentNodeValue();
        if (node instanceof ObjectNode objectNode) {
            for (Map.Entry<String, JsonNode> member : objectNode.properties()) {
                final String key = member.getKey();
                if (pattern.matcher(key).matches()) {
                    final var value = member.getValue();
                    final var location = match.location().append(key);
                    final var newNode = newNode(objectNode, value, key, location, isLastSegment, isDescendant);
                    resultNodeCollector.accept(newNode);
                }
            }
            increaseComplexity(objectNode.size());
        }
    }

Here we test the regex against each child's property name with pattern.matcher(key).matches(), which requires the whole name to match.

@gregsdennis
Collaborator

As I had mentioned before, a choice will need to be made between match and search semantics. Or maybe a syntax that allows the user to specify which they want.
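
One way to picture that choice is a selector that takes the matching mode as an explicit parameter. This is only a sketch: the Mode enum, the class name, and the sample keys are hypothetical, and no syntax for choosing the mode in a path is proposed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyMatchMode {
    enum Mode { MATCH, SEARCH }

    // Keeps the keys accepted by `pattern` under the requested semantics:
    // MATCH requires the whole key to match, SEARCH accepts any substring match.
    static List<String> selectKeys(Iterable<String> keys, Pattern pattern, Mode mode) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            Matcher m = pattern.matcher(key);
            boolean hit = (mode == Mode.MATCH) ? m.matches() : m.find();
            if (hit) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("header_1", "old_header_2", "header_3_bak");
        Pattern p = Pattern.compile("header_\\d+");
        System.out.println(selectKeys(keys, p, Mode.MATCH));  // [header_1]
        System.out.println(selectKeys(keys, p, Mode.SEARCH)); // [header_1, old_header_2, header_3_bak]
    }
}
```

The same pattern selects one key or three depending on the mode, which is why a bare /regex/ selector leaves the result open to interpretation.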


2 participants