Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16825: Generate v2 API SolrRequest bindings #1793

Merged
merged 60 commits into from
Aug 8, 2023

Conversation

gerlowskija
Copy link
Contributor

https://issues.apache.org/jira/browse/SOLR-16825

Description

SolrJ contains many SolrRequest implementations (which correspond to individual Solr APIs), but keeping them at parity and in sync with API changes made to solr-core has proved difficult and time consuming. Despite our best efforts, SolrJ lags 'core' on many APIs and lacks parity entirely on others. And that's not even considering Solr's v2 API surface, which SolrJ doesn't even attempt to cover.

We can use code-generation to solve this problem. SOLR-16346 gave us the ability to generate an "OpenAPI Spec" for Solr - a comprehensive, machine-readable description of our v2 APIs. We should start leveraging this spec, using it with one of the many OpenAPI-compatible code generators out there, to produce SolrRequest implementations for us "for free".

Solution

The solution presented in this PR integrates with the OpenAPI Generator tooling to generate SolrRequest and SolrResponse pairs for each v2 API. Some notes on the implementation:

  1. Our OpenAPI Spec ("OAS") has historically been produced by "solr-core", since that's where the API code itself lived. This was a problem for generating SolrJ code unfortunately: "core" depends on "solrj", so the OpenAPI spec wasn't even available until after the solrj build had already finished! To address this, this PR proposes we move our OpenAPI integration to a new gradle module called 'solrj-model'. This new gradle module holds interfaces and request/response POJOs needed to represent our APIs and generate an OpenAPI spec.
  2. 'solrj' depends on the new 'solrj-model' dependency and uses a new task, openApiGenerate to produce the generated SolrJ classes. SolrJ's compileJava task now "depends on" openApiGenerate, so this should all integrate pretty seamlessly with any IDE that loads Solr as a gradle project.
  3. Currently the generated SolrJ classes use jackson-databind to serialize requests to Solr and deserialize responses. This avoids the NamedList-inspection prevalent elsewhere in SolrJ. I know we've shied away from a direct Jackson dependency in the past, but I think the idea is worth considering given the huge amount Jackson would be getting us "for free" with this approach. If there aren't strong objections, I'll plan on shading the Jackson dependency so users wouldn't have to worry about version conflicts with their own code.
  4. This PR only adds a few v2 API interfaces to 'solrj-model', as a way to control scope and exhibit the approach without bloating the PR. Once the main idea has been proven, subsequent PRs can add in more APIs.
  5. The generated code is NOT committed to version control, as that appears to be an anti-pattern. (At least, from the poking around I did online).

Tests

Existing tests continue to pass.

I don't think we should add new tests for the generate code. (Openapi-generator does support this as an option, but the tests that it produces don't provide tons of value IMO.). Rather, I think we'd be better served modifying existing solr-solrj and solr-core tests to randomize between the v1 and v2 SolrRequest implementations, where both are available.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

This commit moves several "model" classes (SolrJerseyResponse,
ErrorInfo, etc.) to the new "solrj-model" module.  With these in place,
this commit also extracts an interface from a sample API chosen mostly
at random (DeleteAliasAPI -> IDeleteAlias).  We should be able to
generate our OpenAPI spec just from this interface and the corresponding
models.
Mostly involves gradle changes, but also involves a few Java changes as
OpenAPI metadata annotations need moved off of JerseyApplications to
elsewhere.
Now that solrj-model is producing an OpenAPI spec, solrj can rely on
this as an "artifact" and use it to generate Java code that would live
as a part of SolrJ.

The trickiest part of this change is making sure that solrj-model
properly advertises openapi.json as an 'artifact', and ensuring that
solrj knows to generate Java code from it prior to compilation.
By default, openapi-generator produces a complete client, including
integration with HTTP libraries, utility classes, documentation, etc.

We want very little of this for our purposes, since we're just looking
to auto-generate SolrRequest implementations for now.  The
'globalProperties' setting allows us to lock down the files that the
generator produces, which really lowers the noise in the generated code.
We need to make changes to the template for it to look the way we need
it to.  This commit tweaks the generation to rely on a template override
directory, 'src/resources/java-template'.

This will probably need to move eventually - I don't want this included
in the solrj jar after all.  But it's fine for a POC.

This commit doesn't actually make any changes to the api.mustache file
from the default that ships in the generator.  But it sets the stage for
us to make changes in subsequent commits.

In terms of the specific changes, there are two directions we can go in:

  1. Generate request objects that match the current SolrRequest
     contract.  This requires the fewest changes to SolrJ as it exists,
     but requires our generated code to work with NamedList even though
     we have strongly typed classes at our disposal.
  2. Generate request objects that fulfill a SolrRequest contract
     modified in some way to take advantage of the strongly typed
     classes we have available to us.  IMO this would be cleaner
     overall, but would probably require changes to SolrClient, because
     of how the SolrRequest contract is used by SolrClient internally to
     make requests.

I'll start by going the first route and seeing how that turns out.
Modifying the generation template might be a little easier if we have
two data points (i.e. APIs) to look at.  To that end, this commit moves
models and an interface for DeleteCollectionAPI to solrj-model.
Core shouldn't depend on swagger-annotations longterm, but since this is
a POC and we've only moved 2 APIs over to solrj-model, there are still
several APIs in 'core' that use these annotations and need the dep for
now.
The method annotations on DeleteCollectionAPI were preventing us from
inheriting the annotations from IDeleteCollection, causing the API to
not be picked up by Jersey at runtime!

Removing the annotations from DCAPI fixes this problem.
With this commit we're able to generate SolrRequest implementations for
each API in our OAS (OpenAPI Spec).  They work, though there are still
some limitations:

  - no support yet for POST/PUT requests that have request-body
    parameters.  Only 'path' and 'query' parameters are supported
    currently.
  - no real SolrResponse generation, currently reusing
    SimpleSolrResponse.
  - no Javadocs on generated SolrRequest implementations.
There's room for improvement as always, but it's better than nothing.
Not sure this approach will hold up beyond the POC, given past pushback
to having SolrJ dependent on Jackson, but if nothing else it'll serve as
a stepping stone for now.
This commit adds two things:

  1. Generation of request and response POJO classes by
     openapi-generator.  In an ideal world we'd reuse the POJOs that
     already exist in solrj-model, but I ran into a few barriers when I
     tried to have generated SolrRequest classes use the things that
     already existed in solrj-model.  Hopefully that's resolved later
     and we can aboid generating POJO classes entirely, but I've added
     generation of them as a temporary workaround.
  2. Now that we have model classes available to us, this commit also
     modifies the generated SolrRequest implementations to return a
     strongly-typed SolrResponse that relies on Jackson for parsing.
     Again, if there's strong pushback on using Jackson in SolrJ then
     we'll need to come up with replacement code for this.  But it's a
     solid approach for now.

Next up, and last for this POC, is the ability to handle request bodies
in our generated SolrRequest objects.
Moving an API that takes a request body over to the 'models' module
is necessary to start templating out how our generated requests assemble
request bodies.
SolrJ supports API calls containing entities (i.e. request bodies) via
the SolrRequest.getContentWriter method.  This commit adds templating so
that:
  1. The values present accepted in request bodies can now be provided
     and stored by each SolrRequest implementation.
  2. The generated SolrRequest implementations have a getContentWriter
     method which returns a special ContentWriter that serializes the
     request body using Jackson.

Obviously we're relying on Jackson again here, but for a POC this is
probably fine until there's feedback from the community one way or
another on the "Jackson" issue.

I've mostly tested this out with simple request bodies via JSON, but
will attempt more complex request bodies soon.
@gerlowskija
Copy link
Contributor Author

./gradlew clean solr:solrj:openApiGenerate generates the v2 SolrRequest and SolrResponse objects and puts them in solr/solrj/build/generated.

That said, there's really no reason to run that command explicitly. It's a dependency-task for compileJava so it'll get triggered by an IDE or other build.

gerlowskija and others added 5 commits July 19, 2023 08:43
Modifies the allow/block lists in spotless.gradle to ensure that the
existing 'tidy' task covers the Java code generated by
solr:solrj:openApiGenerate.

Also adds a "finalization" hook to 'openApiGenerate' so that tidy is
run on the generated code straightaway.  (Currently this is done in a
not-quite-ideal way, as we also tidy up non-generated SolrJ code.
That's something I'm hoping to limit/remove in a subsequent commit.)
This commit creates spotless tasks specific to the generated files in
SolrJ, so that we can tidy those up immediately following generation
without impacting other files in SolrJ.

(Previously the broader 'tidy' task was invoked following
file-generation, causing _all_ files in SolrJ to be reformatted.)
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive work Jason!

I'm not sure I love a new module for this but I guess it's okay. If it's strictly required by SolrJ then perhaps we can combine the classes into the SolrJ JAR?

  • Might "Api" be a better name for this module? "Model" seems abstract.
  • Putting "I" before the interfaces is something I normally kinda hate but if it's only a practice to be used here then I can accept it but not like it. Maybe "Api" prefix or suffix or none at all?
  • Jackson: dependency -- I can see there is a huge maintenance win here so I'm good with making it required.

Would this come to Solr 9 albeit in an opt-in module users would try (not required by SolrJ), then embrace it in Solr 10?

generateModelTests = false
}

tasks.getByName("copySpec").dependsOn configurations.getByName("openApiSpecFile")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this in a dynamic/reflection-like way; why not explicitly do this at the point that copySpec is declared?

Comment on lines 88 to 91
// TODO Most likely temporary, due to historical pushback on adding Jackson deps to solrj, but it does _hugely_
// simplify the parsing code and there are new compelling reasons, given the claims around CBOR performance relative
// to JSON, javabin, etc.
implementation 'com.fasterxml.jackson.core:jackson-databind' // Came from a testImplementation line below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was certainly pushing back in years past but I can see there is a huge code maintenance win here so I'm in favor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

In terms of how to handle this, do you think it'd be better to shade the Jackson dep (increasing solrj's size but avoiding the jackson-dep-conflict issue), or include Jackson as a "normal" (i.e. unshaded) dependency?

solr/solrj/src/resources/java-template/pojo.mustache Outdated Show resolved Hide resolved
The main package is now 'o.a.s.api.', with two subpackages: 'model'
(holding both request and response POJOs), and 'endpoint' (holding the
API interfaces).

This is not just a cosmetic change, but also something that helps
address problems I was having importing certain request/response POJOs
in the API class.  The input to the mustache template tells us which
model classes are in use, but doesn't provide the fully-qualified name
(i.e. the package name is missing).  Putting all models in a single
package makes it easier to create a template that can import these files
from the (unqualified) class name alone.
Prior to this commit, we were using openapi-generator to generate model
classes that largely duplicated the POJOs in 'solrj-model'.  This commit
disables this unnecessary generation, and tweaks the SolrRequest
template to use the 'solrj-model' classes instead.
@HoustonPutman
Copy link
Contributor

Would this come to Solr 9 albeit in an opt-in module users would try (not required by SolrJ), then embrace it in Solr 10?

So the part used by users, being the actual clients that is generated code, will be packaged in SolrJ core (not a separate solrj module), so it can't really be opt-out. SolrJ core also doesn't need to depend on this solrj-model or solrj-api module, so that doesn't need to be opt-in or opt-out, only solr-core (and possibly Solr Modules) would depend on this new SolrJ module.

Might "Api" be a better name for this module? "Model" seems abstract.

I think I agree here.

Putting "I" before the interfaces is something I normally kinda hate but if it's only a practice to be used here then I can accept it but not like it. Maybe "Api" prefix or suffix or none at all?

I also really don't like the "I" thing. though, I'm not really sure what would be better. Maybe if we have an automated gradle check that ensures that the class starts with "I". I just want to make sure everything stays consistent.

@dsmiley
Copy link
Contributor

dsmiley commented Jul 28, 2023

If I spend some time figuring out how to iterate on this PR to keep SolrJ free of this stuff (particularly Jackson) by solving the "multi-step code generation" or whatever to keep "api" separate (and with no inward dependency from SolrJ) -- would that be appreciated? If it's too ugly / awkward to maintain, we don't have to keep it. I think there's value in keeping SolrJ more focused on being a useful client for query & indexing that doesn't require Jackson... and having an "api" module on the side for doing all the other (often "admin") requests into Solr.

@epugh
Copy link
Contributor

epugh commented Jul 28, 2023

What about having a SolrJ-Slim and SolrJ, the slim being the query/update interface.. Similar to our two Docker images?

@gerlowskija
Copy link
Contributor Author

I think there's value in keeping SolrJ more focused on being a useful client for query & indexing that doesn't require Jackson... and having an "api" module on the side for doing all the other (often "admin") requests into Solr.

I guess I just don't see a ton of user value in making that distinction. Is this all just to save users the pain of a Jackson version conflict, or the few extra MB it'd take to shade Jackson in SolrJ?

The way I see it, there aren't tons and tons of users out there who query and index, but never need to create a collection. So whatever division we make to isolate the Jackson dep will be meaningless to a majority of our users.

And the few users who can take advantage of such a distinction, probably shouldn't anyway. Queries, etc. will probably be faster if they have Jackson available to use CBOR, the new transfer format Solr partially supports in 9.3. Depending how much you trust the perf tests done here), CBOR appears to be Solr's most efficient transfer format - a serious upgrade over javabin that anyone concerned about latency is going to want to try.

But that's just my take 🤷

If I spend some time figuring out how to iterate on this PR to keep SolrJ free of this stuff (particularly Jackson) by solving the "multi-step code generation" or whatever to keep "api" separate (and with no inward dependency from SolrJ) -- would that be appreciated?

For sure! If you find a reasonably elegant way around the "multi-step code generation" thing, or find a way to avoid the extra module, I'm all for that. I've been trying to bait folks into double-checking my sketchy gradle work throughout 😛 (Thanks to Houston for all the help so far!)

As I said, I'm personally skeptical of the value prop in creating a new user-facing module just for the purpose of segregating our Jackson surface area. But if there's solid consensus around that, and you're willing to look into it, I'm open to being the minority there as well.

Prior to this commit, both 'core' and 'api' were using the
'org.apache.solr.api' package.  This commit changes the core package to
be 'api.framework'.  (I think this resolves the problems around
rendering Javadoc pages?)
Moving the response POJOs into the 'api' submodule means that they can
no longer declare themselves as MapWriter implementors.  This means that
non-Jackson codecs don't know how to serialize/deserialize these types
for (e.g.) javabin responses.

There were a couple options for solving this:
  1. Move JacksonReflectMapWriter, MapWriter, and related code into
     'api'.
  2. Create a new marker interface that serves essentially the same
     purpose as JacksonReflectMapWriter, but living in 'api'.  Update
     JavaBinCodec etc. to handle this new interface.
  3. Flip flop once again on generating model classes.  The generated
     versions (since they'd only be compiled by SolrJ would have access
     to the necessary interface types).

Not sure this is the best approach, but this commit takes the easiest
option, number 2.
This second pass avoids some of the unfortunate code duplication of the
first pass attempt yesterday.  We're still using a marker interface in
'api' to identify the POJO classes that can be serialized using
reflection, but then we use a new class, DelegateMapWriter, to
serialize them in JavaBinCodec, TextWriter, etc. instead of duplicating
the Utils.reflectWrite call snippet.

Hopefully ends up a little cleaner and more comprehensive.
This reverts commit 1ad9fbc.

The `o.a.solr.api` package in 'core' has several classes that are likely
extension points for Solr package/plugin users (e.g. `Api`).  Rather
than renaming this package, we should find a better name for the version
of the package in the 'api' submodule.
Changes the base package in 'api' to be `o.a.s.client.api` to avoid
conflicting with the `o.a.s.api` package in 'core' and causing split
package issues.
@dsmiley
Copy link
Contributor

dsmiley commented Aug 1, 2023

I think I'm not going to attempt what I proposed.
At least the use of Jackson is optional. Which is to say, if there is a client that is just doing index/query then it need not include our new "api" module either. We still have our current hand-written APIs for typical administrative things too. I could imagine 10.0 embracing the new API & Jackson but what I said (about Jackson & "api" being kind of optional) is for 9.x. I don't think we should bother shading Jackson unless we get input to the contrary. If we don't use tons of Jackson's surface area, might as well just expose the dependency and let the user choose the version they want, probably a future version that doesn't even exist right now.

@gerlowskija
Copy link
Contributor Author

OK, thanks for closing the loop @dsmiley. If we're OK with things as they stand, I guess I'll start cleaning this up and aim to commit sometime next week then.

Once this PR lands, I'll have some follow-up work to extract API interfaces and POJOs over to the 'api' module. As more moves over to 'api', we'll have more generated SolrRequest implementations to look at, and I imagine that'll spin out into some improvements or fixes to the template as well.

@gerlowskija gerlowskija merged commit 326753c into apache:main Aug 8, 2023
2 of 3 checks passed
@gerlowskija gerlowskija deleted the SOLR-16825-java-binding-poc branch August 8, 2023 18:14
@HoustonPutman
Copy link
Contributor

So cool to have this in!! Thanks @gerlowskija

gerlowskija added a commit that referenced this pull request Aug 18, 2023
Prior to this commit, SolrJ had no real coverage of v2 APIs beyond the
generic V2Request class.  This commit introduces the first request-
specific v2 SolrRequest implementations to SolrJ.

Rather than duplicating details about the API here in SolrJ so that both
the server-side API definition and the client objects need maintained
by hand, this commit generates the v2 SolrRequest classes from the
OpenAPI specification (which in turn is generated from the server-side
API definition).  This ensures that there's only one "source of truth"
that needs maintained for each API.

To achieve this, this commit introduces a new gradle module, 'api', to
hold interfaces (representing Solr's APIs) and POJOs (representing the
API's inputs and outputs).  This module is all that is needed to generate
Solr's OpenAPI spec, and for SolrJ in turn to generate SolrRequest
implementations from that spec.

This commit restricts v2 SolrRequest generation to a few example APIs
to prove the approach.  Additional APIs will be added in subsequent
commits.

---------

Co-authored-by: Houston Putman <houston@apache.org>
@dsmiley
Copy link
Contributor

dsmiley commented Apr 27, 2024

Quoting me:

At least the use of Jackson is optional. Which is to say, if there is a client that is just doing index/query then it need not include our new "api" module either

I have found this to be false. Locally I deployed SolrJ to a service and forgot the "api" JAR, and there was a ClassNotFound on ReflectWritable, which is sad because it's just a marker interface.

Comment on lines +144 to +155
/**
* Setup Spotless (Code formatting) for the generated java files
*/
def generatedExt = new JavaExtension(spotless)
project.spotlessJavaSetup.execute(generatedExt)
generatedExt.target(generatedFiles)
def generatedSpotlessTask = generatedExt.createIndependentApplyTask("generatedSpotless")
generatedSpotlessTask.group("build")
generatedSpotlessTask.description("Apply formatting for generated code")

tasks.openApiGenerate.finalizedBy generatedSpotlessTask

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerlowskija this part of build.gradle is kind of sad. An ideal Gradle file is declarative yet this code is as imperative as can be. It need not be this way! BTW I played with ChatGPT briefly to convert this to a declarative style and it wasn't perfect but it was useful.

There is something missing here:
generatedSpotlessTask.dependsOn(':checkJdkInternalsExportedToGradle')

The point of that task is to get a helpful error about a build configuration issue. Without that, you get mysterious errors about Google Java Format not being able to access certain modules of Java that have not been exported. Perhaps there's a way for that to be applied in a general way so it doesn't need to be remembered; not sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An ideal Gradle file is declarative yet this code is as imperative as can be.

So this is imperative code that is creating declarative tasks. Which is something very well supported in Gradle.
If you want to share the chatGPT suggestions, I'm happy to take a look, but this was very hard to get working well in the first place, so I'd be surprised if chatGPT gives us something that works.

There is something missing here:
generatedSpotlessTask.dependsOn(':checkJdkInternalsExportedToGradle')

Yes, but ideally we just fix spotless.gradle to not require that a spottless task be named a certain way. we can instead see all the tasks that are of type SpotlessApply.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed overall. I'm just concerned with build complexity and want to make recommendations to improve / simplify it, to include using a declarative style when possible. Assuming the createIndependentApplyTask piece is critical, nothing much to do here :-/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants