Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17477: Support custom aggregates in plugin code #2742

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

timatbw
Copy link
Contributor

@timatbw timatbw commented Oct 4, 2024

https://issues.apache.org/jira/browse/SOLR-17477

Description

Better support for writing custom aggregates in 3rd party plugin code, outside of Solr.

Solution

Opened up visibility of certain classes that are necessary to write custom aggregates, and allow them to register by type so that they are supported just the same as field/query/range/heatmap etc

Tests

Adjusted existing test that confirms a custom aggregate can be written outside the solr facet package

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@mkhludnev
Copy link
Member

I like it. Thanks @timatbw. I'm able to merge, if there's an approve from the second 👀.

REGISTERED_TYPES.put("func", (p, k, a) -> p.parseStat(k, a));
}

public static void registerParseHandler(String type, ParseHandler parseHandler) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the https://issues.apache.org/jira/browse/SOLR-17477 comment w.r.t. how this is called!

Copy link
Member

@mkhludnev mkhludnev Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where to discuss it. But registering anything in static initializer is asking for trouble, imho. I'd rather hookup a component via solrconfig.xml (or a plugin??), which just calls registerParserHandler() for custom handlers.
Anyway it's a matter of taste, and should block this PR, unless someone propose to make FP.registerParserHandler protected. I tried, it was so pity, don't even show it to anyone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's perhaps not the best way of hooking things up, especially as the whole process is already kicked off from solrconfig.xml where you declare a valueSourceParser. I was relying on the point at which that custom parser object is instantiated to run a static initialiser to also register it for json parsing too. But maybe it's safer to invert that and have the Solr code register the custom parser if it implements json parsing. I'll have another look and think about the alternatives.

My main reason for suggesting custom code uses a static block to register is when you have many hundreds of SolrCores in your CoreContainer and each one would be instantiated and call the register method, when really you might only need it done once. But now thinking if you run a mixed workload with different solrconfig and they're not homogenous, you don't want static registering maybe (each SolrCore is distinct)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the 'standard set' of registered types in the static block above probably is appropriate as those are built-in and common to all SolrCores.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REGISTERED_TYPES are shared across cores. It means:

  • one core might inject a custom parser, it appears across all cores. If it hooks up some large state it remains in heap after core unloading (it might be called as a leak)
  • registerPH might be called with one of the standard name bringing some surprises to other cores.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes all good points, I'm going to change this to remove the statics and make it per-core which probably fits better with how it's hooked in too

@cpoerschke cpoerschke requested a review from mkhludnev October 4, 2024 16:04
Copy link
Member

@mkhludnev mkhludnev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please proceed with access modifiers

@@ -138,21 +139,30 @@ public Object parseFacetOrStat(String key, Object o) throws SyntaxError {
return parseFacetOrStat(key, type, args);
}

public interface ParseHandler {
Object doParse(FacetParser<?> parent, String key, Object args) throws SyntaxError;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, is there an other exemplar of such a naming convention like doSometh ? why not just parse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular strong feelings on this, I tend to use doSomething when it's an inner helper method called from an outer/wrapper method that is the something(..) but in this case it's not really directly called. In practice I think people would use lambda anyway and never actually write this method name. But happy to change it to parse!

@@ -138,21 +139,30 @@ public Object parseFacetOrStat(String key, Object o) throws SyntaxError {
return parseFacetOrStat(key, type, args);
}

public interface ParseHandler {
Object doParse(FacetParser<?> parent, String key, Object args) throws SyntaxError;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, return type is a little bit tricky FacetRequest|AggValueSource I suppose it deserves to be documented via Javadoc, unless we can declare it explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already how it is, the calling code allows the parser to return either a FacetRequest or a AggValueSource hence that Object type was needed here too. It's not ideal, and perhaps in practice most people will want to produce a AggValueSource only. Creating new subclasses of FacetRequest outside the package is harder, and I've not yet looked at how feasible it is to do that from a plugin (it's not the focus of this issue). But it would be interesting if eventually you can define custom types of faceting too.

@mkhludnev
Copy link
Member

oh gosh.. I figure it out how to update PR as a maintainer. The problem was I did it by https auth from Idea, but it seems it requested token only for my account and can't push into someone else's even if Maintainers are allowed to edit this pull request. is turn on. Old good ssh did a thing.

@timatbw
Copy link
Contributor Author

timatbw commented Oct 28, 2024

I've just realised the registration part of this work is not necessary! All that's really needed is the public modifier changes. I'm going to simplify the PR down to just that bit.

The reason the registration isn't needed is because you can already achieve the desired results using the existing mechanism type:func which lets you access the structured json args via local params. So a ValueSourceParser you might have in your plugin code can be configured this way by pulling values from the local params accessible in the FunctionQParser object, instead of positional arguments. All you do is declare your json facet like so

{ mycalculation:{ type:func, func:myaggregate, myfield:something, mylimit:2 } }

then all those key-value params are available inside your implementation of ValueSourceParser

    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        if (fp.getLocalParams() != null) {
            String fieldName = fp.getLocalParams().get("myfield");
            Integer limit = fp.getLocalParams().getInt("mylimit");
            return new MyAgg(fieldName, limit);
        } else {
            return new MyAgg(fp.parseArg(), fp.parseInt());
        }
    }

@timatbw
Copy link
Contributor Author

timatbw commented Nov 1, 2024

I'm going to (force) push my local branch to this PR again, which will overwrite/replace the commits you've added on top @mkhludnev just so you know. But I think the previous approach we were working on isn't necessary so I'm simplifying this PR a lot

@timatbw timatbw force-pushed the solr-17477-custom-aggs branch from 0e2d148 to 0502e60 Compare November 1, 2024 12:15
@mkhludnev
Copy link
Member

mkhludnev commented Nov 5, 2024

you can already achieve the desired results using the existing mechanism type:func

I'm made a simple experiment. this time, you won't drop my commit Mind the file location, I wish to build in a code snippet into RefGuide! Meanwhile, strawman approach with funcs does not let custom parser to read a nested map (ok it let it read,but only stringified). Also, custom parser has no access to a parent facet parser, it might be necessary for parsing nested facets, but maybe FacetParser might be instantiated right there, haven't check it yet.
mkhludnev@85870c4#diff-01a34eb1905d3f728ae9cb3d433945541b5a69fbe02e3f8da21ccd38ebe859edR148
The q no. 1 are we ok with the scope limited by type:func hook, in this case, are facet parser changes necessary?

@timatbw
Copy link
Contributor Author

timatbw commented Nov 5, 2024

Good points, yes I did notice that the parent parser would not be available via the simple localParams approach, but I didn't realise that SolrParams only support String values not arbitrary nested objects (that's a shame).

For my use-cases, I'm OK with just the String-typed map and no need for the parent parser, but I guess a more general solution could support those as well.

The solution I had been working on before I found type:func was to add a 2nd AggValueSource parse(FacetParser parent, Object args) method to ValueSourceParser (with a default returning null), and then in FacetParser after the switch cases, do a getSolrRequest().getCore().getValueSourceParser(type) and if there's one, call the new method and use that as the return value. That ended up being fairly minimal changes.

@mkhludnev
Copy link
Member

I'm not sure it may work, since ValueSourceParser is invoked by FuncQParser with fixed interface.

@mkhludnev
Copy link
Member

Here's my idea:

  • What's already done is fine and really needed, let's just add a refguide section with a good example and go ahead with piggyback on AggValueSourceParser.

Then, in another PR we may choose:

  • plan B.1: introduce a new entity: FacetParser in solrconfig.xml and /config api. That's doable, unless vetoed.
  • plan B.2: introduce factory methods in FacetModule instead of constructors' calls for all parsers and facets. And maybe even extract them into a super factory for FacetModule. Then demo how users may to override the standard FacetModule by own descendant extending a parsing routine. It let us to stick with existing extensions/config system.

@timatbw, wdyt?

@timatbw
Copy link
Contributor Author

timatbw commented Nov 12, 2024

Yes that sounds good, as I feel we might be trying to expand the scope of this change too much for 1 PR. I agree, making the public modifiers change is needed, then we can document how to do a typical simple use-case of a custom aggregate in plugin code using only String values from json. That's enough for this PR. I'll work on that next, although I don't know much about the RefGuide as I've not written for it before, will look.

And yes, next steps of work for another PR would be to add another method to ValueSourceParser and call it from FacetParser, so we fully support parent facets and non-String complex json config. I can produce a 2nd PR with my proposed diff for that change.

And further steps are your Plan B, with complete ability to override or replace the FacetModule mechanism, and define new kinds of faceting totally different to the Facet Field/Range/Query etc.

@mkhludnev
Copy link
Member

you can already achieve the desired results using the existing mechanism type:func

Wait a sec. I read here https://solr.apache.org/guide/solr/latest/query-guide/json-facet-api.html#stat-facet-functions

Unlike all the facets discussed so far, Aggregation functions (also called facet functions, analytic functions, or metrics) do not partition data into buckets. Instead, they calculate something over all the documents in the domain.

Can the proposed approach with type:func yield multiple buckets? What we want to achieve here?

@timatbw
Copy link
Contributor Author

timatbw commented Nov 15, 2024

Correct, the work in this Jira+PR is only about custom metrics i.e. calculating a value for each bucket. It's not about creating custom faceting types for making buckets in the first place. That would be interesting too, but is quite a different bit of work because the FacetModule is not currently set up to do that (a lot more classes would need to be open for extension).

@mkhludnev
Copy link
Member

creating custom faceting types for making buckets in the first place. That would be interesting too, but is quite a different bit of work because the FacetModule is not currently set up to do that (a lot more classes would need to be open for extension).

Just got some fun of it #2865 I don't recommend anyone to look at it. It's so scary.

@mkhludnev
Copy link
Member

I don't know much about the RefGuide as I've not written for it before, will look.

made a short clue mkhludnev@c1c1a3e

@mkhludnev
Copy link
Member

PR is only about custom metrics i.e. calculating a value for each bucket. It's not about creating custom faceting types for making buckets in the first place.

does it mean FacetParser change is not necessary?

@timatbw
Copy link
Contributor Author

timatbw commented Nov 21, 2024

does it mean FacetParser change is not necessary?

The PR now only changes that class to make it public but that was probably only needed in the original changes around registering parsers, so yes it may not need to be changed at all now.

@mkhludnev
Copy link
Member

creating custom faceting types for making buckets in the first place. That would be interesting too, but is quite a different bit of work because the FacetModule is not currently set up to do that (a lot more classes would need to be open for extension).

link to the relevant discussion https://lists.apache.org/thread/bk15vv82xtgjslvpp5ff3nctop7qcy6j

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants