Query Language to Jakarta Data Proposal [Vote] #458
Comments
Excellent, this is definitely going in the direction I have been hoping! I have some feedback on detail, but I'll keep that to myself for now, so as not to poison the well while others look over what you've proposed. Looking forward to discussing it on the call tomorrow.
If we are going to add a standardized query language in version 1.0, I would recommend that it be defined as a subset of JPQL from Jakarta Persistence (whichever subset is sufficiently common with NoSQL). JPQL is a well-established standard and it would make sense to reuse it; otherwise, I think it would be too much to get right in version 1.0. Whatever we define, we will need to remain compatible with going forward, to avoid breaking changes. I don't think that would be a concern with a JPQL subset, and it looks like that would mostly align with the capability that you have in your example BNF.
I'm fine with that; I just proposed something to start with. If it makes sense, we can go further with it.
OK, just to give some specific feedback on the proposal here, after our discussion today:
So, @njr-11 suggested starting from a cut-down version of JPQL. Here's what that might look like. Note that I have diverged from JPA in at least two respects:
This is going to look a bit scary, because the JPQL grammar is defined in a very verbose way, mixing in typing rules with parsing rules.
Note that this grammar does not include lexical rules.
I agree!
Note also that:
That would be nice if your proposal for identification variables makes it into Jakarta Persistence 3.2. Do you know when they will decide on it? Otherwise, we would need to choose between including identification variables for now (and making them optional later to coincide with Jakarta Persistence) and waiting on the query language until the next version to line up with Jakarta Persistence.
At first I thought that was a mistake, but it does make sense that it can be inferred. It will depend on being able to remove identification variables. If it turns out we cannot, then I think it would be too much to ask Jakarta Data providers to figure out the identification variable from the remainder of the
Overall this looks great and headed in the right direction. I spotted a few minor things, probably typos:
It doesn't make any difference to me whether to include
I think we should assume it won't be in JPA 3.2, and that we will have to wait for JPA 4. I mean, I guess we could push for it, i.e. nag @lukasj about it...
I guess I don't really see it in quite those terms. It's a sort of "trivial" transformation to add an identification variable at the start of each path, given how simplistic the above grammar is. That's something that's even pretty straightforward to do at annotation-processing time. But on the other hand, note that Hibernate already supports this, so I know very well that I won't have to do this transformation. [So I guess I'm biased.] Of course, we can put the identification variable back into the grammar and make it optional later, but it's pretty redundant here unless we're going to allow multiple entities in the
Yes, for sure that's the easiest way. It's certainly how I'm going to do it. But IMO, it's a non-goal at this stage to provide for interop between Jakarta Data providers and Jakarta Persistence providers. So the repository implementation is not really limited to calling the JPA provider via JPA-standard APIs. So, for example, our implementation of Jakarta Data is going to support Hibernate, but probably not EclipseLink, at least not initially. And Hibernate has already accepted this syntax for years. If someone wants to make an EclipseLink-based Jakarta Data provider, then they can build support for identification variable-free queries directly into EclipseLink, assuming it doesn't already tolerate that, without waiting for permission from the JPA spec. So my contention is that it's OK if the JPA spec trails a bit behind the implementations in this respect.
Note, FTR, that I do waaaaaay more magical things than this in our annotation processor: I not only parse JPQL, but I completely type-check the JPQL against the entity types in the compilation unit and spit out appropriate compilation errors.
Heh, no, actually, that's the standard concatenation operator. Yeah, the syntax used in the JPA spec is a bit ambiguous.
For the record, the relevant issue is: jakartaee/persistence#452
Nice - I was comparing with the BNF in the Jakarta Persistence 3.1 spec, which didn't have
So I wanted to keep aggregate functions out of this for this release, but I now realize that there's precisely one aggregate function we are going to need, and it's
This is a hard requirement in order to be able to fix the following abomination in
long countBy();
When I first saw this, I thought it was a mistake. And then I realized what was really going on: it's because it's a magical method name query. 😒 So once we have a proper query language, it can be redefined as:
@Query("select count(*)")
long countAll();
Similarly, "exists" queries can be written in terms of
I note, for the record, that even for the very first, most primitive repository, magical method name queries already lead to unnatural naming. We gotta get away from that stuff.
In terms of regular (i.e. non-aggregate) functions, the ones I would propose are:
Those, together with
Finally, I wonder what further simplifications we could reasonably make to the grammar I posted above, just to make it as easy as possible to implement for a first release:
On the other hand, I just realized that even though I stripped
* Note: the spec currently mentions that magical method name queries can have
Here's an update with:
I have not restored
So the following is a well-defined and strict superset of what is currently possible with method name queries.
Even if this grammar looks a bit verbose, it's really very straightforward and super-easy to implement.
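As a rough illustration of why a grammar with this layered shape is easy to implement, here is a minimal recursive-descent sketch. It is not the grammar above: it assumes a hypothetical toy subset (`and`, `or`, `not`, parentheses, and binary comparisons between paths, `:named` parameters, numbers, and single-quoted strings without escapes), and `ConditionParser` is an illustrative name. Each rule becomes one method, and operator precedence falls out of which rule calls which; the result is simply the input, fully parenthesized.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy recursive-descent parser for a hypothetical subset of JDQL conditions:
 *   condition  := andCond ("or" andCond)*
 *   andCond    := notCond ("and" notCond)*
 *   notCond    := "not" notCond | "(" condition ")" | comparison
 *   comparison := operand ("=" | "<>" | "<" | "<=" | ">" | ">=") operand
 */
public class ConditionParser {
    private static final List<String> COMPARISONS = List.of("=", "<>", "<", "<=", ">", ">=");
    private final List<String> tokens;
    private int pos = 0;

    private ConditionParser(List<String> tokens) { this.tokens = tokens; }

    /** Parses a condition and returns it fully parenthesized. */
    public static String parse(String input) {
        ConditionParser parser = new ConditionParser(tokenize(input));
        String result = parser.condition();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return result;
    }

    // Tokens: paths/identifiers, :params, numbers, 'strings', parentheses, comparison operators.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (c == '\'') {
                int close = s.indexOf('\'', i + 1);
                if (close < 0) throw new IllegalArgumentException("Unterminated string literal");
                i = close + 1;
            } else if (c == '(' || c == ')' || c == '=') {
                i++;
            } else if (c == '<' || c == '>') {          // <, <=, <>, >, >=
                i++;
                if (i < s.length() && (s.charAt(i) == '=' || (c == '<' && s.charAt(i) == '>'))) i++;
            } else if (Character.isLetterOrDigit(c) || c == ':' || c == '_') {
                i++;
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i))
                        || s.charAt(i) == '.' || s.charAt(i) == '_')) i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
            out.add(s.substring(start, i));
        }
        return out;
    }

    // Consumes the next token if it matches (keywords are case-insensitive).
    private boolean accept(String token) {
        if (pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(token)) { pos++; return true; }
        return false;
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    private String condition() {
        String left = andCondition();
        while (accept("or")) left = "(" + left + " or " + andCondition() + ")";
        return left;
    }

    private String andCondition() {
        String left = notCondition();
        while (accept("and")) left = "(" + left + " and " + notCondition() + ")";
        return left;
    }

    private String notCondition() {
        if (accept("not")) return "(not " + notCondition() + ")";
        if (accept("(")) {
            String inner = condition();
            if (!accept(")")) throw new IllegalArgumentException("Expected ')'");
            return inner;
        }
        return comparison();
    }

    private String comparison() {
        String left = next();
        String op = next();
        if (!COMPARISONS.contains(op)) throw new IllegalArgumentException("Expected comparison, got: " + op);
        return "(" + left + " " + op + " " + next() + ")";
    }

    public static void main(String[] args) {
        // prints (((price > 10) and (not (title = :t))) or (author.name = 'Sam'))
        System.out.println(parse("price > 10 and not title = :t or author.name = 'Sam'"));
    }
}
```

A real implementation would build a syntax tree rather than a string and cover the full grammar, but the structure stays the same: one method per rule.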
A word about the
Still, we probably would want to have a way to chop up strings. JPQL has a
The above is one way to fix the
To clarify, don't interpret my prior comment as an argument against a query language. I just wanted to point out one of the other possible approaches. It would be nice to eventually end up with both.
Yeah, I've toyed with that idea a bit myself. The reasons I'm not really a fan of it are that:
Of course, on 2, YMMV. Note that there is one case where
That brings up another, even simpler option. Just remove
That would be great.
Rough draft of a section on expressions:
Expressions
An expression is a sequence of tokens to which a Java type can be assigned, and which evaluates to a well-defined value when the query is executed. In JDQL, expressions may be categorized as:
A string, integer, or decimal literal is assigned the type it would be assigned in Java. So, for example,
When executed, a literal expression evaluates to its literal value. The special values
A parameter expression, with syntax given by
@Query("where title like :titlePattern")
List<Book> booksMatchingTitle(String titlePattern);
When executed, a parameter expression evaluates to the argument supplied to the parameter of the repository method.
An enum literal expression is a Java identifier, with syntax specified by
When executed, an enum literal expression evaluates to the named member of the Java
A path expression is a period-separated list of Java identifiers, with syntax specified by
The type of the whole path expression is the type of the last element of the list. For example,
When executed, a path expression is evaluated in the context of a given record of the queried entity type, and evaluates to the value of the entity field for the given record.
A function call is the name of a JDQL function, followed by a parenthesized list of argument expressions, with syntax given by
When any argument expression of any function call evaluates to a null value, the whole function call evaluates to null.
The syntax of an operator expression is given by the
The concatenation operator
The numeric operators
NOTE: As an exception, when the operands of
The four numeric operators may also be applied to an operand of wrapper type, for example, to
The four numeric operators may also be applied to operands of type
The type assigned to an operator expression depends on the types of its operand expressions, which need not be identical. The rules for numeric promotion are given in section 4.7 of the Jakarta Persistence specification version 3.2:
A numeric operator expression is evaluated according to the native semantics of the database. In translating an operator expression to the native query language of the database, a Jakarta Data provider is encouraged, but not required, to apply reasonable transformations so that evaluation of the expression more closely mimics the semantics of the Java language.
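The null rule for function calls can be captured by a wrapper that checks the arguments before invoking any function implementation. The sketch below only illustrates the stated semantics in memory; it does not show how a provider would translate a call into the database's native query language. The function table (`upper`, `length`, `concat`) and the `FunctionCalls` class are hypothetical, chosen for illustration, and are not the list of functions proposed for JDQL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Sketch of the rule that a function call with any null argument evaluates to null. */
public class FunctionCalls {
    // Hypothetical function table for illustration only; not the JDQL function list.
    static final Map<String, Function<List<Object>, Object>> FUNCTIONS = Map.of(
            "upper", args -> ((String) args.get(0)).toUpperCase(),
            "length", args -> ((String) args.get(0)).length(),
            "concat", args -> (String) args.get(0) + (String) args.get(1));

    /** Evaluates a call, returning null without invoking the function if any argument is null. */
    static Object call(String name, Object... args) {
        Function<List<Object>, Object> function = FUNCTIONS.get(name);
        if (function == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        // A bare null passed as the only vararg arrives as a null array; treat it as a null argument.
        if (args == null || Arrays.stream(args).anyMatch(Objects::isNull)) {
            return null;
        }
        return function.apply(Arrays.asList(args));
    }

    public static void main(String[] args) {
        System.out.println(call("upper", "jakarta"));         // JAKARTA
        System.out.println(call("concat", "Jakarta ", null)); // null
    }
}
```

Note the contrast with Java's own string concatenation, where `"Jakarta " + null` yields `"Jakarta null"`; under the rule above, a call with a null argument yields null instead.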
Would it make sense to take JPQL out of the persistence spec and split it into 2+ parts ("core", extensions for RDBMS, extensions for NoSQL, ...) instead? That would allow consistency and interoperability of the QL across Jakarta specs, as well as open an option to having more independent implementations of the parser itself (...and some of them possibly in non-Java languages).
Yes, it would totally make sense. I would love to see that, and I would love to work on it. BUT:
On the other hand, with those caveats stated, the stuff we write down now could be used as a starting point for such a "substantial rewrite". That is to say, the JDQL spec we produce here could eventually be the "core" part of a new Jakarta Query spec. Of course, we need to make sure the two languages don't diverge. But I'm very confident that this is achievable. And @lukasj, it would be awesome if you could keep an eye on what is going on here.
Strawman for conditional expressions, where I have taken care not to require an implementation based on ternary logic. Is that the right approach?
Conditional expressions
A conditional expression is a sequence of tokens which specifies a condition which, for a given record, might be satisfied or unsatisfied. Unlike the scalar expressions defined in the previous section, a conditional expression is not considered to have a well-defined type.
NOTE: JPQL defines the result of a conditional expression in terms of ternary logic. JDQL does not specify that a conditional expression evaluates to a well-defined value, only the effect of the conditional expression when it is used as a restriction. The "value" of a conditional expression is not considered observable by the application program.
Conditional expressions may be categorized as:
The syntax for conditional expressions is given by the
A
An
A
Or, if the
A
Within the pattern,
The equality and inequality operators are
NOTE: Portability is maximized when Jakarta Data providers interpret equality and inequality operators in a manner consistent with the implementation of
NOTE: For string values, a database might have a different collation algorithm from Java's. In evaluating an inequality involving string operands, an implementation of JDQL is not required to emulate Java collation.
The logical operators are
This specification leaves undefined the interpretation of the
CAUTION: A compliant implementation of JDQL might feature SQL/JPQL-style ternary logic, where
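To make the CAUTION concrete, here is a small model of SQL-style three-valued (Kleene) logic, in which any comparison involving null is UNKNOWN and a restriction selects only records for which the condition is TRUE. The `TernaryLogic` class, its `Truth` enum, and the `equal` helper are hypothetical names for this sketch. Under ternary logic, a record whose field is null is selected by neither `x = 1` nor `not (x = 1)`, whereas a two-valued reading would select it for one of the two, which is why the two interpretations can return different results.

```java
import java.util.Objects;

/** Minimal model of SQL/JPQL-style three-valued logic. */
public class TernaryLogic {
    enum Truth {
        TRUE, FALSE, UNKNOWN;

        Truth not() {
            return switch (this) {
                case TRUE -> FALSE;
                case FALSE -> TRUE;
                case UNKNOWN -> UNKNOWN;
            };
        }

        Truth and(Truth other) {
            if (this == FALSE || other == FALSE) return FALSE;
            if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
            return TRUE;
        }

        Truth or(Truth other) {
            if (this == TRUE || other == TRUE) return TRUE;
            if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
            return FALSE;
        }
    }

    /** SQL-style equality: comparing anything with null yields UNKNOWN. */
    static Truth equal(Object a, Object b) {
        if (a == null || b == null) return Truth.UNKNOWN;
        return a.equals(b) ? Truth.TRUE : Truth.FALSE;
    }

    /** A restriction selects a record only when the condition is TRUE. */
    static boolean selected(Truth condition) {
        return condition == Truth.TRUE;
    }

    public static void main(String[] args) {
        Integer x = null; // the field value of some record
        System.out.println(selected(equal(x, 1)));       // false
        System.out.println(selected(equal(x, 1).not())); // false
        // Two-valued Java semantics select the record for exactly one of the two:
        System.out.println(Objects.equals(x, 1));        // false
        System.out.println(!Objects.equals(x, 1));       // true
    }
}
```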
This is an excellent idea. I agree that given the timeframe, the only achievable approach for EE 11 will be to put the subset query language in Jakarta Data, but as long as we are careful to ensure it is a subset of JPQL, it should be possible to move it to a Jakarta Query spec for both specs to use in EE 12 without breaking compatibility.
I love the idea; let's see the others.
The above issue comments with draft specification text seem to represent a level of capability that makes sense for Jakarta Data to include in a subset of JPQL. I think the biggest concern will be ensuring all the details fully line up with Jakarta Persistence (hopefully we can get participants from Jakarta Persistence to help review/confirm that as well when we are further along) without defining anything incompatible. In general, I would say these look great and are very well written.
We're going to need information on which parts of the expression language are not required on specific kinds of datastore technology. A lot of that information is in section 4.6.2, I suppose. But I'm a bit surprised by the extent of the limitations listed there. For example: is it really true that
With this last bit, I believe the language is close to fully specified. (Though I believe I still need to add some more info on typing rules.)
Clauses
Each JDQL statement is built from a sequence of clauses. The beginning of a clause is identified by a keyword:
There is a logical ordering of clauses, reflecting the order in which their effect must be computed by the datastore:
The interpretation and effect of each clause in this list is influenced by clauses occurring earlier in the list, but not by clauses occurring later in the list.
NOTE: The syntax of the
Statements
Finally, there are three kinds of statement:
The clauses which can appear in a statement are given by the grammar for each kind of statement.
Also, if we really are supporting multiple items in the select list, we need to specify how they are returned. JPQL says they're returned as an array of type
record Summary(String isbn, String title, String author) {}
@Query("select isbn, title, author.name from Book")
List<Summary> summaries();
In general, a document database can be implemented in the same way as a wide-column one; however, we don't guarantee that all the fields will be included. For example, Cassandra supports it only on the key or indexed fields. Amazon DynamoDB, as far as I know, is a document database that does not have the in clause. In those cases, I would go to town an
I have collected my proposals here: https://github.com/jakartaee/data/pull/520/files.
Supporting multiple items in the select list is nice for writing more efficient queries that don't need to fetch the entire content from the database.
record Summary(String isbn, String title, String author) {}
@Query("select new org.eclipse.example.Summary(b.isbn, b.title, b.author.name) from Book b")
List<Summary> summaries();
@njr-11 I actually hate the
Remember that this syntax was from JPA 1.0, from before Java had generics, and has not really made a lot of sense since we introduced
But here, the argument that we don't need it is even stronger, since we can always safely infer the return type of the query by looking at the return type of the repository method. So I think it's reasonable to say that a repository method can just automatically repackage the
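Assuming the provider receives each result row for a multi-item select list as an `Object[]` with one element per select item, in select-list order (which is how JPQL returns such results), repackaging it into the record declared as the repository method's return type is mechanical. Here is a reflective sketch using the standard `java.lang.reflect.RecordComponent` API (Java 16+); `RowMapper` and `toRecord` are hypothetical names, and the `Summary` record mirrors the example earlier in the thread.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.Arrays;
import java.util.List;

/** Sketch: repackages Object[] rows (one element per select item) into a record. */
public class RowMapper {
    record Summary(String isbn, String title, String author) {}

    static <R extends Record> R toRecord(Class<R> type, Object[] row) {
        RecordComponent[] components = type.getRecordComponents();
        if (components.length != row.length) {
            throw new IllegalArgumentException("Select list has " + row.length + " items, but "
                    + type.getSimpleName() + " has " + components.length + " components");
        }
        // The canonical constructor's parameter types are the component types, in declaration order.
        Class<?>[] parameterTypes = Arrays.stream(components)
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        try {
            Constructor<R> constructor = type.getDeclaredConstructor(parameterTypes);
            return constructor.newInstance(row);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot construct " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Placeholder rows, shaped as for: select isbn, title, author.name from Book
        List<Object[]> rows = List.of(
                new Object[] {"isbn-1", "First Title", "First Author"},
                new Object[] {"isbn-2", "Second Title", "Second Author"});
        List<Summary> summaries = rows.stream()
                .map(row -> toRecord(Summary.class, row))
                .toList();
        summaries.forEach(System.out::println);
    }
}
```

A provider, or an annotation processor, would more likely resolve the constructor once per repository method rather than once per row, and would need a rule for type mismatches between select items and record components, which this sketch simply leaves to the reflective call to reject.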
Excellent, it would be great to see the more concise syntax go into JPQL. In that case, I think we should omit this from our version 1.0 and plan to add it once JPQL adds the more concise form.
That's fine by me.
FTR, my proposal was just merged by @lukasj, and this is now a legal JPQL query:
FROM Order
WHERE customer.lastname = 'Smith'
AND customer.firstname = 'John'
and is equivalent to:
SELECT this
FROM Order AS this
WHERE this.customer.lastname = 'Smith'
AND this.customer.firstname = 'John'
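The equivalence above is an instance of the "trivial" transformation mentioned earlier in the thread: qualify every path with an identification variable named `this`. Here is a deliberately naive sketch of it, for illustration only. It assumes a query of the exact form `FROM Entity WHERE condition`, no existing identification variable, and that every bare identifier in the condition other than a few keywords starts a path. The `IdentificationVariables` class and its keyword list are hypothetical, and a real implementation would work on the parse tree instead (function names and enum literals, for example, are not handled here).

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive sketch: makes the implicit "this" identification variable explicit. */
public class IdentificationVariables {
    // Words that must not be treated as the start of a path (an assumed, incomplete list).
    private static final Set<String> KEYWORDS =
            Set.of("and", "or", "not", "like", "between", "in", "is", "null", "true", "false");

    // A 'string literal', a :param, a numeric literal, or an identifier/dotted path.
    private static final Pattern TOKEN =
            Pattern.compile("'[^']*'|:\\w+|\\d[\\w.]*|[A-Za-z_]\\w*(?:\\.[A-Za-z_]\\w*)*");

    private static final Pattern FROM_WHERE =
            Pattern.compile("(?is)\\s*FROM\\s+(\\w+)\\s+WHERE\\s+(.*)");

    /** Prefixes each bare path with "this."; literals, parameters, and keywords are left alone. */
    static String qualifyPaths(String condition) {
        Matcher m = TOKEN.matcher(condition);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String token = m.group();
            char first = token.charAt(0);
            boolean isPath = (Character.isLetter(first) || first == '_')
                    && !KEYWORDS.contains(token.toLowerCase(Locale.ROOT));
            m.appendReplacement(out, Matcher.quoteReplacement(isPath ? "this." + token : token));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Rewrites "FROM Entity WHERE condition" with an explicit select item and variable. */
    static String addIdentificationVariable(String query) {
        Matcher m = FROM_WHERE.matcher(query);
        if (!m.matches()) {
            throw new IllegalArgumentException("Only 'FROM Entity WHERE ...' is handled: " + query);
        }
        return "SELECT this FROM " + m.group(1) + " AS this WHERE " + qualifyPaths(m.group(2));
    }

    public static void main(String[] args) {
        // prints SELECT this FROM Order AS this WHERE this.customer.lastname = 'Smith' AND this.customer.firstname = 'John'
        System.out.println(addIdentificationVariable(
                "FROM Order WHERE customer.lastname = 'Smith' AND customer.firstname = 'John'"));
    }
}
```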
Done!! 🎉🥳🎈🎉
Reopening: we need to write TCK tests for this.
Signed-off-by: Nathan Rauh <nathan.rauh@us.ibm.com>
As a ...
I need to be able to ...
The Jakarta Data project aims to introduce a unified Query Language specification to facilitate seamless data querying across various databases, primarily focusing on SQL and NoSQL. The recent delay in the release of Jakarta EE presents an opportunity to include more features in the specification before the official release.
Which enables me to ...
The primary goal of this proposal is to initiate a discussion and a vote regarding the inclusion of the query language in the Jakarta Data 1.0 release.
Key Features:
Query Language Specification: The Jakarta Data project will introduce a Query Language specification tailored to work across different database systems, focusing on supporting SQL and NoSQL databases.
Supported Operations: Initially, the specification will support essential query operations such as selecting, deleting, inserting, and updating data. Subsequent versions may incorporate additional query functionalities.
Compatibility: A significant challenge is ensuring compatibility between SQL and NoSQL databases. For instance, while certain operations like joins may not be universally supported across all databases, efforts will be made to accommodate such operations optionally or provide alternative approaches.
Annotation Support: The project will include annotation support, such as the @Query annotation, allowing developers to specify whether to use native queries from the provider or the Jakarta Data Query Language.
Additional information
Scope for Discussion:
Inclusion in Jakarta Data 1.0: Discuss whether it is appropriate to include the query language in the Jakarta Data 1.0 release, considering its potential benefits and impact on the ecosystem.
Compatibility and Extensibility: Evaluate the proposed Query Language specification's compatibility with different database systems and its extensibility for future enhancements.
BNF to Start: