Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

console: want better observability for queries executed with multiple plans #67519

Open
andreimatei opened this issue Jul 13, 2021 · 4 comments
Assignees
Labels
A-sql-observability Related to observability of the SQL layer C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability

Comments

@andreimatei
Copy link
Contributor

andreimatei commented Jul 13, 2021

An outage was recently caused by a query sometimes getting a bad plan. Figuring that out was painful. We would have really benefited from the Statements page in the Console yelling this at us. Even after we have correctly identified the misbehaving query, figuring out that, on a minority of occasions, this query executes with a different plan was a pain. One way or another, I'd like to get a hint about in the Statements page about which queries are using multiple plans and then, at the level of a single statement, I'd like to see all the plans that were used recently. Right now I think I see only the... last plan that was used by the node that's rendering that page (is this right)?

There's perhaps a bigger issue here; I'm hesitant to open a dedicated issue to not get cursed. Do we really want multiple plans per (prepared) statement? I assume these different plans are generated based on individual parameter values. In this incident, things went wrong cause one of the plans was really bad; I believe there's other relevant discussions elsewhere about whether we should ever prefer unbounded scans instead of bounded ones in cases where the optimizer thinks that the unbounded scan will be small. That discussion aside, I'm generally more and more of the opinion that predictability matters more than other optimizations and I feel like, in the case of prepared statements, getting different plans is usually a particularly surprising violation of predictability. I think in the majority of cases, I'd rather have a single plan per statement, and perhaps have to opt-in per statement into the current dynamic behavior.
I'm also imagining that, if we just picked one plan per prepared statement, we'd cut on the need to re-plan (or re-optimize?) on every execution. I don't know how much that cost, but I know that we still use "too much" CPU per statement execution (not necessarily because of the optimizer; I don't know who's to blame).

cc @jordanlewis @rytaft @RaduBerinde @nvanbenschoten

Jira issue: CRDB-8586

Epic CRDB-32139

@andreimatei andreimatei added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-postmortem Originated from a Postmortem action item. T-sql-observability labels Jul 13, 2021
@jordanlewis
Copy link
Member

This will be greatly improved by @Azhng's work on #64743. Also see similar issue #67500, cc @kevin-v-ngo.

@rytaft
Copy link
Collaborator

rytaft commented Jul 13, 2021

Thanks for opening this, @andreimatei. Definitely agree with the need for better visibility into multiple plans.

... Do we really want multiple plans per (prepared) statement? ...

I think this is a great point, and something we've discussed at various times in the past. @RaduBerinde recently added support for a placeholder fast path, which performs full optimization at prepare time for very simple queries that only have one reasonable plan option (so there is no chance of choosing a suboptimal plan). However, we do not yet support fully optimizing queries at prepare time in general. I think this is something we should be able to support with an opt-in setting, and it will probably be a necessary step in our long term goal of supporting plan management. It's a large undertaking, though. cc @vy-ton to consider including in the roadmap for the next release cycle.

@RaduBerinde
Copy link
Member

Do we really want multiple plans per (prepared) statement?

Without parameter values we can't use histograms; we implemented histograms pretty early on because it became clear they were necessary for getting good plans in many real scenarios. So what you are proposing has a real risk of making many things worse. It's worth pointing out that even when a prepared query gets the same plan every time in practice, that can be because there is a pattern to the parameter values used - we might not be able to choose that plan without specific parameter values. In your scenario for example, maybe the "really bad plan" would be the plan we'd most reasonably pick without any values..

There is also a real downside to allowing the method of issuing a query to affect the quality of the plans. In many cases the app writer has no direct control over (or even knowledge of) how queries are executed. For example JDBC automatically prepares a statement that has been used more than X times - so imagine everything works great the first X times but then the plan suddenly becomes worse. As a developer you may not (and IMO should not) be aware of all these technical details along the way, so trying to figure this issue out would be a real problem. Even for us it's hard to know when a certain driver uses the "simple protocol" or does prepare + execute-once.

Then there's the problem of changing data - there are workloads where tables keep changing and growing and in many cases plans legitimately need to change. A long-running app that prepared its queries last week would be at a real disadvantage. But then if a connection needs to be reestablished for whatever reason (eg node failure) - and thus statements need to be re-prepared - suddenly we get updated plans. Imagine trying to figure that one out without knowing all the ins and outs of prepared statements.

We currently have the very clean guarantee that you get the same plan no matter how the query is issued, when or if it was prepared, or what is or isn't in our caches. This is something we can relax in the name of efficiency (to reduce planning time), but it's still something we should strive for as much as possible.

Predictability is an important ask, but I hope I've convinced you that it's not right to tie it to the low-level method of issuing a query. I see it as a plan management feature where you can "lock in" a plan for a given statement fingerprint. And maybe there's an easy (or even automatic) mechanism to lock in all plans for an app.

@kevin-v-ngo kevin-v-ngo self-assigned this Jul 19, 2021
@kevin-v-ngo
Copy link

From the observability side, we are tracking being able to detect plan changes through this issue: #67500. Moving this off SQL Observability triage board.

For query plan optimization during prepare time and plan management, I'll defer to @vy-ton

@maryliag maryliag added A-sql-observability Related to observability of the SQL layer and removed O-postmortem Originated from a Postmortem action item. labels Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-observability Related to observability of the SQL layer C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability
Projects
None yet
Development

No branches or pull requests

6 participants