Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom aggregate method translation #23524

Closed
wants to merge 1 commit into from
Closed

Custom aggregate method translation #23524

wants to merge 1 commit into from

Conversation

roji
Copy link
Member

@roji roji commented Nov 30, 2020

This starts work on translating custom GroupBy aggregate methods (#22957). For now, the current hard-wired logic for GroupBy aggregate operators (Min/Max/Sum...) has been refactored out of RelationalSqlTranslatingExpressionVisitor to a new RelationalGroupingTranslatingExpressionVisitor, which could be extended by providers.

@smitpatel if you approve of the general direction, I'll continue and look into making GroupingExpression more suitable for usage by providers etc.

(note this is a draft, it's too early for nits/docs)

@roji roji requested a review from smitpatel November 30, 2020 10:39
@roji roji changed the title Custom GroupBy aggregate method translation Custom aggregate method translation Nov 30, 2020
@roji roji marked this pull request as draft November 30, 2020 12:36
@smitpatel
Copy link
Member

What is the purpose of this change? Can you please elaborate on following topics as I am not able to understand how those issues are solved in this PR.

  • What is the purpose of second visitor? What is it doing additionally which requires another visitor? The same visitation was happening in SqlTranslator already and I believe that unless something needs to happen in separate scope nested visitor are not necessary for any case.
  • How does this change provide extensibility to providers or plugins? Any extensions which can be done on new visitor should already be possible by deriving from SqlTranslator today also.

I don't see any value of the new visitor, it feels more like refactoring plus there is negative cost associated with it since SqlTranslator can perform the same task today only. Right now this PR seems to going Into the unknown

@roji
Copy link
Member Author

roji commented Nov 30, 2020

The main point is that currently all GroupBy aggregate logic is very hard to override (it's all logic embedded inside SqlTranslatingEV.VisitMethod). With this visitor, it's easy for a provider to just inherit from it, override VisitMethodCall and add the methods they want.

Note also that this allow overriding the behavior for the built-in operators (Max/Sum...) by splitting them out to their own methods, but SqlTranslatingEV already has TranslateMax/TranslateSum which only do the minimum work of generating the SqlFunctionExpression etc. So assuming we want overridability for GroupBy aggregate logic of built-in operators, I'm not sure what those methods would be named.

Finally, it's true that it's possible to achieve the same by extracting the logic out to a separate method on SqlTranslatingEV instead of to a different visitor - in that sense this isn't strictly necessary, and does contain an element of refactoring. Though that's not necessarily a bad thing - SqlTranslatingEV is pretty massive as-is, and it didn't seem like a bad idea to move GroupBy aggregate logic out to a separate component.

Do you see a disadvantage in this approach?

@smitpatel
Copy link
Member

Do you see a disadvantage in this approach?

Yes, it does not provide any additional extensibility and it adds more decision points.
Overriding VisitMethodCall on SqlTranslator is no different than overriding it in different visitor.

Further, ability to override handling of Select/Where/Distinct (or any other operator) applied on Grouping is not required and not sure what different behavior it could achieve and it is orthogonal to support for additional aggregate operators. It will most likely end up being YAGNI.

@roji
Copy link
Member Author

roji commented Nov 30, 2020

  • SqlTranslator.VisitMethodCall is currently 424 lines long, and definitely seems like a good candidate for a bit of refactoring. SqlTranslator as a whole is over 1600 lines long.
  • There are various grouping-specific methods which would most probably need to be exposed to providers (GetExpressionForAggregation, RemapLambda). If we add these to the surface of SqlTranslator that surface gets polluted (as opposed to adding them to the surface of a translator dedicated to grouping). It's just a matter of separating concerns in different components.
  • Re Sum/Max and others, it's mostly about not needlessly making things internal and non-overridable - why not split into different overridable methods if there's no cost.

But if you feel that strongly against this refactoring - which really seems quite straightforward and without any disadvantages - I can go do something else.

@smitpatel
Copy link
Member

  • I don't agree that number of lines in a method makes it a candidate for refactoring. It should not be relevant at all. Perhaps cyclomatic complexity be a better judge for refactoring to make it easier to unit test. Though this is query pipeline and we really cannot unit test this method so we need to have test which traverse each path regardless of refactoring so the number of lines doesn't really create any issue here.
  • Additionally, we want providers to override this method but call the base to do base processing (like how VisitBinary does in SqlServerSqlTranslatingExpressionVisitor). We don't want provider to copy paste body of this method in their codebase. If there are any specific parts inside this method which they may need to override and need access, we should make protected method for those explicitly. Adding more protected methods don't really create many decision points. QuerySqlGenerator.VisitSelect is prime example of it. Either you override whole method or just the part where you want to make change.
  • Number of lines in a file is largely irrelevant. We create classes based on the function it is performing with keeping perf in mind. (we don't want a new EV for each small thing as traversing EV takes time, examples are in Preprocessor). If number of lines in a file is bothering you too much then there are partial classes, your file lines would be smaller. Note this is different in tests since we can always split tests further into more granular components since they are disconnected from each other. The same is not true for source code.
  • I agree that few grouping specific methods need to be exposed to providers. I don't think adding protected methods is pollution. We are already deriving from ExpressionVisitor. There is enough pollution of Visit* methods. Deriving class should always look into overriding methods which achieves the behavior they desire. Other methods should be largely irrelevant.
  • The concern of translating a LINQ subtree to equivalent SqlExpression is handled by SqlTranslatingEV. Grouping is just part of it and not necessarily a separate concern. Notice that subquery which translates to single scalar result also processed inside SqlTranslatingEV only. There is no hard line on splitting this any further.
  • Re sum/max I agree that there could be a different refactoring and may be in different overridable methods. An extra ExpressionVisitor just for that is not warranted due to additional cost in maintenance and performance.

this refactoring - which really seems quite straightforward and without any disadvantages

This refactoring indeed have disadvantages at this point. Mainly it does not solve the issue of extensibility at all which is the main reason to make any changes in this area. If we add this additional visitor right now then it would be additional public surface. When we really try to solve issue of custom aggregate method translation then there could be 3 possibilities

  1. That solution fits in this refactoring and everything works fine.
  2. That solution does not fit in this refactoring and we need refactor differently and this is throw-away work.
  3. Even worse - the solution does not fit in this refactoring and rather than trying to find optimal solution to the problem we try to come up with a solution which fits in this.

To me in 67% of cases this refactoring is not useful at least until we have solved the main issue first. I strongly feel that doing this refactoring without solving the issue of custom aggregate methods basically puts us in a corner in terms of adding support for them in future and hinders our creativity in terms of arriving at ideal solution for it.

cc: @ajcvickers @AndriySvyryd

@roji
Copy link
Member Author

roji commented Nov 30, 2020

I don't agree that number of lines in a method makes it a candidate for refactoring.

As a semi-outsider to this part of the codebase, I disagree. I found it quite hard to find my way.

We don't want provider to copy paste body of this method in their codebase.

Agreed.

If there are any specific parts inside this method which they may need to override and need access, we should make protected method for those explicitly. Adding more protected methods don't really create many decision points.

We really are discussing whether the code should be refactored into a protected method in the same class (SqlTranslator), or out into a protected method in another class (a new visitor) - it really is mainly a matter of factoring the code grouping functions that belong together in coherent types. I don't understand what decision points mean here - if a method is on the same visitor or on another doesn't seem to matter to me.

we don't want a new EV for each small thing as traversing EV takes time, examples are in Preprocessor

I don't see the perf aspect here, but maybe I'm missing something. Whether we call Visit on SqlTranslator on Visit on GroupingTranslator doesn't make any difference.

There is enough pollution of Visit* methods.

Inheriting/implementing the standard ExpressionVisitor is one thing; adding new stuff that's specific to GroupBy aggregate translation is different (GetExpressionForAggregation, RemapLambda). Now any provider writer who's interacting with SqlTranslator sees them (e.g. Intellisense) and must understand if they need to deal with them, whether they're interested in custom aggregates or not. Trust me, as a provider writer the query pipeline can be quite confusing as it is.

Mainly it does not solve the issue of extensibility at all which is the main reason to make any changes in this area

As I wrote above, this is first step before tackling the main parts around extensibility. The fact that the refactoring itself doesn't solve the issue doesn't mean it's bad.


However... let me save us all time and ASCII characters and leave you to do this work, assuming you feel it's important. You obviously have a very clear idea of how you want the codebase to look like, and I'd rather not waste time on this.

@roji roji closed this Nov 30, 2020
@smitpatel
Copy link
Member

You have right to disagree though I don't feel your points are changing anything fundamental in what I have written above.

@ajcvickers ajcvickers reopened this Dec 8, 2020
@ajcvickers
Copy link
Member

@roji It seems to me that one way to measure the value here is to compare what the provider code looks like with and without this change. Is it possible to see what the provider PR would look like in each case?

@ajcvickers
Copy link
Member

Following up off-line.

@ajcvickers ajcvickers closed this Dec 8, 2020
@smitpatel smitpatel deleted the CustomGroupBy branch January 7, 2021 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants