-
Notifications
You must be signed in to change notification settings - Fork 662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-16825: Generate v2 API SolrRequest bindings #1793
SOLR-16825: Generate v2 API SolrRequest bindings #1793
Conversation
This commit moves several "model" classes (SolrJerseyResponse, ErrorInfo, etc.) to the new "solrj-model" module. With these in place, this commit also extracts an interface from a sample API chosen mostly at random (DeleteAliasAPI -> IDeleteAlias). We should be able to generate our OpenAPI spec just from this interface and the corresponding models.
Mostly involves gradle changes, but also involves a few Java changes as OpenAPI metadata annotations need moved off of JerseyApplications to elsewhere.
Now that solrj-model is producing an OpenAPI spec, solrj can rely on this as an "artifact" and use it to generate Java code that would live as a part of SolrJ. The trickiest part of this change is making sure that solrj-model properly advertises openapi.json as an 'artifact', and ensuring that solrj knows to generate Java code from it prior to compilation.
By default, openapi-generator produces a complete client, including integration with HTTP libraries, utility classes, documentation, etc. We want very little of this for our purposes, since we're just looking to auto-generate SolrRequest implementations for now. The 'globalProperties' setting allows us to lock down the files that the generator produces, which really lowers the noise in the generated code.
We need to make changes to the template for it to look the way we need it to. This commit tweaks the generation to rely on a template override directory, 'src/resources/java-template'. This will probably need to move eventually - I don't want this included in the solrj jar after all. But it's fine for a POC. This commit doesn't actually make any changes to the api.mustache file from the default that ships in the generator. But it sets the stage for us to make changes in subsequent commits. In terms of the specific changes, there are two directions we can go in: 1. Generate request objects that match the current SolrRequest contract. This requires the fewest changes to SolrJ as it exists, but requires our generated code to work with NamedList even though we have strongly typed classes at our disposal. 2. Generate request objects that fulfill a SolrRequest contract modified in some way to take advantage of the strongly typed classes we have available to us. IMO this would be cleaner overall, but would probably require changes to SolrClient, because of how the SolrRequest contract is used by SolrClient internally to make requests. I'll start by going the first route and seeing how that turns out.
Modifying the generation template might be a little easier if we have two data points (i.e. APIs) to look at. To that end, this commit moves models and an interface for DeleteCollectionAPI to solrj-model.
Core shouldn't depend on swagger-annotations longterm, but since this is a POC and we've only moved 2 APIs over to solrj-model, there are still several APIs in 'core' that use these annotations and need the dep for now.
The method annotations on DeleteCollectionAPI were preventing us from inheriting the annotations from IDeleteCollection, causing the API to not be picked up by Jersey at runtime! Removing the annotations from DCAPI fixes this problem.
With this commit we're able to generate SolrRequest implementations for each API in our OAS (OpenAPI Spec). They work, though there are still some limitations: - no support yet for POST/PUT requests that have request-body parameters. Only 'path' and 'query' parameters are supported currently. - no real SolrResponse generation, currently reusing SimpleSolrResponse. - no Javadocs on generated SolrRequest implementations.
There's room for improvement as always, but it's better than nothing.
Not sure this approach will hold up beyond the POC, given past pushback to having SolrJ dependent on Jackson, but if nothing else it'll serve as a stepping stone for now.
This commit adds two things: 1. Generation of request and response POJO classes by openapi-generator. In an ideal world we'd reuse the POJOs that already exist in solrj-model, but I ran into a few barriers when I tried to have generated SolrRequest classes use the things that already existed in solrj-model. Hopefully that's resolved later and we can aboid generating POJO classes entirely, but I've added generation of them as a temporary workaround. 2. Now that we have model classes available to us, this commit also modifies the generated SolrRequest implementations to return a strongly-typed SolrResponse that relies on Jackson for parsing. Again, if there's strong pushback on using Jackson in SolrJ then we'll need to come up with replacement code for this. But it's a solid approach for now. Next up, and last for this POC, is the ability to handle request bodies in our generated SolrRequest objects.
Moving an API that takes a request body over to the 'models' module is necessary to start templating out how our generated requests assemble request bodies.
SolrJ supports API calls containing entities (i.e. request bodies) via the SolrRequest.getContentWriter method. This commit adds templating so that: 1. The values present accepted in request bodies can now be provided and stored by each SolrRequest implementation. 2. The generated SolrRequest implementations have a getContentWriter method which returns a special ContentWriter that serializes the request body using Jackson. Obviously we're relying on Jackson again here, but for a POC this is probably fine until there's feedback from the community one way or another on the "Jackson" issue. I've mostly tested this out with simple request bodies via JSON, but will attempt more complex request bodies soon.
That said, there's really no reason to run that command explicitly. It's a dependency-task for |
Modifies the allow/block lists in spotless.gradle to ensure that the existing 'tidy' task covers the Java code generated by solr:solrj:openApiGenerate. Also adds a "finalization" hook to 'openApiGenerate' so that tidy is run on the generated code straightaway. (Currently this is done in a not-quite-ideal way, as we also tidy up non-generated SolrJ code. That's something I'm hoping to limit/remove in a subsequent commit.)
This commit creates spotless tasks specific to the generated files in SolrJ, so that we can tidy those up immediately following generation without impacting other files in SolrJ. (Previously the broader 'tidy' task was invoked following file-generation, causing _all_ files in SolrJ to be reformatted.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive work Jason!
I'm not sure I love a new module for this but I guess it's okay. If it's strictly required by SolrJ then perhaps we can combine the classes into the SolrJ JAR?
- Might "Api" be a better name for this module? "Model" seems abstract.
- Putting "I" before the interfaces is something I normally kinda hate but if it's only a practice to be used here then I can accept it but not like it. Maybe "Api" prefix or suffix or none at all?
- Jackson: dependency -- I can see there is a huge maintenance win here so I'm good with making it required.
Would this come to Solr 9 albeit in an opt-in module users would try (not required by SolrJ), then embrace it in Solr 10?
solr/solrj/build.gradle
Outdated
generateModelTests = false | ||
} | ||
|
||
tasks.getByName("copySpec").dependsOn configurations.getByName("openApiSpecFile") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do this in a dynamic/reflection-like way; why not explicitly do this at the point that copySpec is declared?
solr/solrj/build.gradle
Outdated
// TODO Most likely temporary, due to historical pushback on adding Jackson deps to solrj, but it does _hugely_ | ||
// simplify the parsing code and there are new compelling reasons, given the claims around CBOR performance relative | ||
// to JSON, javabin, etc. | ||
implementation 'com.fasterxml.jackson.core:jackson-databind' // Came from a testImplementation line below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was certainly pushing back in years past but I can see there is a huge code maintenance win here so I'm in favor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
In terms of how to handle this, do you think it'd be better to shade the Jackson dep (increasing solrj's size but avoiding the jackson-dep-conflict issue), or include Jackson as a "normal" (i.e. unshaded) dependency?
solr/solrj/src/java/org/apache/solr/client/solrj/JacksonContentWriter.java
Outdated
Show resolved
Hide resolved
solr/solrj/src/java/org/apache/solr/common/util/ModelConstants.java
Outdated
Show resolved
Hide resolved
The main package is now 'o.a.s.api.', with two subpackages: 'model' (holding both request and response POJOs), and 'endpoint' (holding the API interfaces). This is not just a cosmetic change, but also something that helps address problems I was having importing certain request/response POJOs in the API class. The input to the mustache template tells us which model classes are in use, but doesn't provide the fully-qualified name (i.e. the package name is missing). Putting all models in a single package makes it easier to create a template that can import these files from the (unqualified) class name alone.
Prior to this commit, we were using openapi-generator to generate model classes that largely duplicated the POJOs in 'solrj-model'. This commit disables this unnecessary generation, and tweaks the SolrRequest template to use the 'solrj-model' classes instead.
So the part used by users, being the actual clients that is generated code, will be packaged in SolrJ core (not a separate solrj module), so it can't really be opt-out. SolrJ core also doesn't need to depend on this
I think I agree here.
I also really don't like the "I" thing. though, I'm not really sure what would be better. Maybe if we have an automated gradle check that ensures that the class starts with "I". I just want to make sure everything stays consistent. |
If I spend some time figuring out how to iterate on this PR to keep SolrJ free of this stuff (particularly Jackson) by solving the "multi-step code generation" or whatever to keep "api" separate (and with no inward dependency from SolrJ) -- would that be appreciated? If it's too ugly / awkward to maintain, we don't have to keep it. I think there's value in keeping SolrJ more focused on being a useful client for query & indexing that doesn't require Jackson... and having an "api" module on the side for doing all the other (often "admin") requests into Solr. |
What about having a SolrJ-Slim and SolrJ, the slim being the query/update interface.. Similar to our two Docker images? |
I guess I just don't see a ton of user value in making that distinction. Is this all just to save users the pain of a Jackson version conflict, or the few extra MB it'd take to shade Jackson in SolrJ? The way I see it, there aren't tons and tons of users out there who query and index, but never need to create a collection. So whatever division we make to isolate the Jackson dep will be meaningless to a majority of our users. And the few users who can take advantage of such a distinction, probably shouldn't anyway. Queries, etc. will probably be faster if they have Jackson available to use CBOR, the new transfer format Solr partially supports in 9.3. Depending how much you trust the perf tests done here), CBOR appears to be Solr's most efficient transfer format - a serious upgrade over javabin that anyone concerned about latency is going to want to try. But that's just my take 🤷
For sure! If you find a reasonably elegant way around the "multi-step code generation" thing, or find a way to avoid the extra module, I'm all for that. I've been trying to bait folks into double-checking my sketchy gradle work throughout 😛 (Thanks to Houston for all the help so far!) As I said, I'm personally skeptical of the value prop in creating a new user-facing module just for the purpose of segregating our Jackson surface area. But if there's solid consensus around that, and you're willing to look into it, I'm open to being the minority there as well. |
This reverts commit 22f7ebf.
Prior to this commit, both 'core' and 'api' were using the 'org.apache.solr.api' package. This commit changes the core package to be 'api.framework'. (I think this resolves the problems around rendering Javadoc pages?)
Moving the response POJOs into the 'api' submodule means that they can no longer declare themselves as MapWriter implementors. This means that non-Jackson codecs don't know how to serialize/deserialize these types for (e.g.) javabin responses. There were a couple options for solving this: 1. Move JacksonReflectMapWriter, MapWriter, and related code into 'api'. 2. Create a new marker interface that serves essentially the same purpose as JacksonReflectMapWriter, but living in 'api'. Update JavaBinCodec etc. to handle this new interface. 3. Flip flop once again on generating model classes. The generated versions (since they'd only be compiled by SolrJ would have access to the necessary interface types). Not sure this is the best approach, but this commit takes the easiest option, number 2.
This reverts commit 6a2dd43.
…vabin"" This reverts commit 16002db.
This second pass avoids some of the unfortunate code duplication of the first pass attempt yesterday. We're still using a marker interface in 'api' to identify the POJO classes that can be serialized using reflection, but then we use a new class, DelegateMapWriter, to serialize them in JavaBinCodec, TextWriter, etc. instead of duplicating the Utils.reflectWrite call snippet. Hopefully ends up a little cleaner and more comprehensive.
This reverts commit 1ad9fbc. The `o.a.solr.api` package in 'core' has several classes that are likely extension points for Solr package/plugin users (e.g. `Api`). Rather than renaming this package, we should find a better name for the version of the package in the 'api' submodule.
Changes the base package in 'api' to be `o.a.s.client.api` to avoid conflicting with the `o.a.s.api` package in 'core' and causing split package issues.
I think I'm not going to attempt what I proposed. |
OK, thanks for closing the loop @dsmiley. If we're OK with things as they stand, I guess I'll start cleaning this up and aim to commit sometime next week then. Once this PR lands, I'll have some follow-up work to extract API interfaces and POJOs over to the 'api' module. As more moves over to 'api', we'll have more generated SolrRequest implementations to look at, and I imagine that'll spin out into some improvements or fixes to the template as well. |
So cool to have this in!! Thanks @gerlowskija |
Prior to this commit, SolrJ had no real coverage of v2 APIs beyond the generic V2Request class. This commit introduces the first request- specific v2 SolrRequest implementations to SolrJ. Rather than duplicating details about the API here in SolrJ so that both the server-side API definition and the client objects need maintained by hand, this commit generates the v2 SolrRequest classes from the OpenAPI specification (which in turn is generated from the server-side API definition). This ensures that there's only one "source of truth" that needs maintained for each API. To achieve this, this commit introduces a new gradle module, 'api', to hold interfaces (representing Solr's APIs) and POJOs (representing the API's inputs and outputs). This module is all that is needed to generate Solr's OpenAPI spec, and for SolrJ in turn to generate SolrRequest implementations from that spec. This commit restricts v2 SolrRequest generation to a few example APIs to prove the approach. Additional APIs will be added in subsequent commits. --------- Co-authored-by: Houston Putman <houston@apache.org>
Quoting me:
I have found this to be false. Locally I deployed SolrJ to a service and forgot the "api" JAR, and there was a ClassNotFound on ReflectWritable, which is sad because it's just a marker interface. |
/** | ||
* Setup Spotless (Code formatting) for the generated java files | ||
*/ | ||
def generatedExt = new JavaExtension(spotless) | ||
project.spotlessJavaSetup.execute(generatedExt) | ||
generatedExt.target(generatedFiles) | ||
def generatedSpotlessTask = generatedExt.createIndependentApplyTask("generatedSpotless") | ||
generatedSpotlessTask.group("build") | ||
generatedSpotlessTask.description("Apply formatting for generated code") | ||
|
||
tasks.openApiGenerate.finalizedBy generatedSpotlessTask | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gerlowskija this part of build.gradle is kind of sad. An ideal Gradle file is declarative yet this code is as imperative as can be. It need not be this way! BTW I played with ChatGPT briefly to convert this to a declarative style and it wasn't perfect but it was useful.
There is something missing here:
generatedSpotlessTask.dependsOn(':checkJdkInternalsExportedToGradle')
The point of that task is to get a helpful error about a build configuration issue. Without that, you get mysterious errors about Google Java Format not being able to access certain modules of Java that have not been exported. Perhaps there's a way for that to be applied in a general way so it doesn't need to be remembered; not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An ideal Gradle file is declarative yet this code is as imperative as can be.
So this is imperative code that is creating declarative tasks. Which is something very well supported in Gradle.
If you want to share the chatGPT suggestions, I'm happy to take a look, but this was very hard to get working well in the first place, so I'd be surprised if chatGPT gives us something that works.
There is something missing here:
generatedSpotlessTask.dependsOn(':checkJdkInternalsExportedToGradle')
Yes, but ideally we just fix spotless.gradle
to not require that a spottless task be named a certain way. we can instead see all the tasks that are of type SpotlessApply
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah agreed overall. I'm just concerned with build complexity and want to make recommendations to improve / simplify it, to include using a declarative style when possible. Assuming the createIndependentApplyTask
piece is critical, nothing much to do here :-/.
https://issues.apache.org/jira/browse/SOLR-16825
Description
SolrJ contains many SolrRequest implementations (which correspond to individual Solr APIs), but keeping them at parity and in sync with API changes made to solr-core has proved difficult and time consuming. Despite our best efforts, SolrJ lags 'core' on many APIs and lacks parity entirely on others. And that's not even considering Solr's v2 API surface, which SolrJ doesn't even attempt to cover.
We can use code-generation to solve this problem. SOLR-16346 gave us the ability to generate an "OpenAPI Spec" for Solr - a comprehensive, machine-readable description of our v2 APIs. We should start leveraging this spec, using it with one of the many OpenAPI-compatible code generators out there, to produce
SolrRequest
implementations for us "for free".Solution
The solution presented in this PR integrates with the OpenAPI Generator tooling to generate
SolrRequest
andSolrResponse
pairs for each v2 API. Some notes on the implementation:openApiGenerate
to produce the generated SolrJ classes. SolrJ'scompileJava
task now "depends on"openApiGenerate
, so this should all integrate pretty seamlessly with any IDE that loads Solr as a gradle project.Tests
Existing tests continue to pass.
I don't think we should add new tests for the generate code. (Openapi-generator does support this as an option, but the tests that it produces don't provide tons of value IMO.). Rather, I think we'd be better served modifying existing solr-solrj and solr-core tests to randomize between the v1 and v2 SolrRequest implementations, where both are available.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.