console: want better observability for queries executed with multiple plans #67519

andreimatei · 2021-07-13T00:49:14Z

An outage was recently caused by a query sometimes getting a bad plan. Figuring that out was painful. We would have really benefited from the Statements page in the Console yelling this at us. Even after we have correctly identified the misbehaving query, figuring out that, on a minority of occasions, this query executes with a different plan was a pain. One way or another, I'd like to get a hint about in the Statements page about which queries are using multiple plans and then, at the level of a single statement, I'd like to see all the plans that were used recently. Right now I think I see only the... last plan that was used by the node that's rendering that page (is this right)?

There's perhaps a bigger issue here; I'm hesitant to open a dedicated issue to not get cursed. Do we really want multiple plans per (prepared) statement? I assume these different plans are generated based on individual parameter values. In this incident, things went wrong cause one of the plans was really bad; I believe there's other relevant discussions elsewhere about whether we should ever prefer unbounded scans instead of bounded ones in cases where the optimizer thinks that the unbounded scan will be small. That discussion aside, I'm generally more and more of the opinion that predictability matters more than other optimizations and I feel like, in the case of prepared statements, getting different plans is usually a particularly surprising violation of predictability. I think in the majority of cases, I'd rather have a single plan per statement, and perhaps have to opt-in per statement into the current dynamic behavior.
I'm also imagining that, if we just picked one plan per prepared statement, we'd cut on the need to re-plan (or re-optimize?) on every execution. I don't know how much that cost, but I know that we still use "too much" CPU per statement execution (not necessarily because of the optimizer; I don't know who's to blame).

cc @jordanlewis @rytaft @RaduBerinde @nvanbenschoten

Jira issue: CRDB-8586

Epic CRDB-32139

jordanlewis · 2021-07-13T01:05:29Z

This will be greatly improved by @Azhng's work on #64743. Also see similar issue #67500, cc @kevin-v-ngo.

rytaft · 2021-07-13T11:59:15Z

Thanks for opening this, @andreimatei. Definitely agree with the need for better visibility into multiple plans.

... Do we really want multiple plans per (prepared) statement? ...

I think this is a great point, and something we've discussed at various times in the past. @RaduBerinde recently added support for a placeholder fast path, which performs full optimization at prepare time for very simple queries that only have one reasonable plan option (so there is no chance of choosing a suboptimal plan). However, we do not yet support fully optimizing queries at prepare time in general. I think this is something we should be able to support with an opt-in setting, and it will probably be a necessary step in our long term goal of supporting plan management. It's a large undertaking, though. cc @vy-ton to consider including in the roadmap for the next release cycle.

RaduBerinde · 2021-07-13T17:34:29Z

Do we really want multiple plans per (prepared) statement?

Without parameter values we can't use histograms; we implemented histograms pretty early on because it became clear they were necessary for getting good plans in many real scenarios. So what you are proposing has a real risk of making many things worse. It's worth pointing out that even when a prepared query gets the same plan every time in practice, that can be because there is a pattern to the parameter values used - we might not be able to choose that plan without specific parameter values. In your scenario for example, maybe the "really bad plan" would be the plan we'd most reasonably pick without any values..

There is also a real downside to allowing the method of issuing a query to affect the quality of the plans. In many cases the app writer has no direct control over (or even knowledge of) how queries are executed. For example JDBC automatically prepares a statement that has been used more than X times - so imagine everything works great the first X times but then the plan suddenly becomes worse. As a developer you may not (and IMO should not) be aware of all these technical details along the way, so trying to figure this issue out would be a real problem. Even for us it's hard to know when a certain driver uses the "simple protocol" or does prepare + execute-once.

Then there's the problem of changing data - there are workloads where tables keep changing and growing and in many cases plans legitimately need to change. A long-running app that prepared its queries last week would be at a real disadvantage. But then if a connection needs to be reestablished for whatever reason (eg node failure) - and thus statements need to be re-prepared - suddenly we get updated plans. Imagine trying to figure that one out without knowing all the ins and outs of prepared statements.

We currently have the very clean guarantee that you get the same plan no matter how the query is issued, when or if it was prepared, or what is or isn't in our caches. This is something we can relax in the name of efficiency (to reduce planning time), but it's still something we should strive for as much as possible.

Predictability is an important ask, but I hope I've convinced you that it's not right to tie it to the low-level method of issuing a query. I see it as a plan management feature where you can "lock in" a plan for a given statement fingerprint. And maybe there's an easy (or even automatic) mechanism to lock in all plans for an app.

kevin-v-ngo · 2021-07-19T16:12:31Z

From the observability side, we are tracking being able to detect plan changes through this issue: #67500. Moving this off SQL Observability triage board.

For query plan optimization during prepare time and plan management, I'll defer to @vy-ton

andreimatei added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-postmortem Originated from a Postmortem action item. T-sql-observability labels Jul 13, 2021

kevin-v-ngo self-assigned this Jul 19, 2021

maryliag added A-sql-observability Related to observability of the SQL layer and removed O-postmortem Originated from a Postmortem action item. labels Mar 2, 2023

exalate-issue-sync bot added T-observability and removed T-sql-observability labels Mar 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

console: want better observability for queries executed with multiple plans #67519

console: want better observability for queries executed with multiple plans #67519

andreimatei commented Jul 13, 2021 •

edited by exalate-issue-sync bot

Loading

jordanlewis commented Jul 13, 2021

rytaft commented Jul 13, 2021 •

edited

Loading

RaduBerinde commented Jul 13, 2021

kevin-v-ngo commented Jul 19, 2021

console: want better observability for queries executed with multiple plans #67519

console: want better observability for queries executed with multiple plans #67519

Comments

andreimatei commented Jul 13, 2021 • edited by exalate-issue-sync bot Loading

jordanlewis commented Jul 13, 2021

rytaft commented Jul 13, 2021 • edited Loading

RaduBerinde commented Jul 13, 2021

kevin-v-ngo commented Jul 19, 2021

andreimatei commented Jul 13, 2021 •

edited by exalate-issue-sync bot

Loading

rytaft commented Jul 13, 2021 •

edited

Loading