-
Notifications
You must be signed in to change notification settings - Fork 302
Description
Describe the bug
I’m not exactly sure what the issue is, but there is a problem where the query results come out strangely when using a CTE without materialized hint.
For example, when explain
a query like the following: (The query is quite long, and you don’t need to look at it. Ultimately, what I want to show is the difference in explain
results depending on whether the materialized
hint is used or not.)
explain with parsed_data as (
select
d.namespace,
d.name,
d.replicas,
pdb.min_available,
pdb.max_unavailable,
case
when pdb.min_available is not null then
regexp_replace(pdb.min_available::text, '.*IntVal:([0-9]*).*', '\1')::int
else null
end as min_available_int_val,
case
when pdb.min_available is not null and
regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
round(regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
else null
end as min_available_str_val,
case
when pdb.max_unavailable is not null then
regexp_replace(pdb.max_unavailable::text, '.*IntVal:([0-9]*).*', '\1')::int
else null
end as max_unavailable_int_val,
case
when pdb.max_unavailable is not null and
regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
round(regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
else null
end as max_unavailable_str_val
from
k8s_main.kubernetes_pod_disruption_budget pdb
inner join k8s_main.kubernetes_deployment d on d.selector = pdb.selector
and d.namespace = pdb.namespace
)
select
namespace,
name,
min_available,
max_unavailable,
replicas,
min_available_int_val
from
parsed_data
where
min_available_int_val >= replicas
order by
namespace,
name;
explain
results are as follows:
+-----------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+-----------------------------------------------------------------------------------------------------------------------+
|Sort (cost=40040003898208.71..40040003919042.05 rows=8333333 width=141) |
| Sort Key: d.namespace, d.name |
| -> Nested Loop (cost=0.00..40040001145833.33 rows=8333333 width=141) |
| -> Foreign Scan on kubernetes_pod_disruption_budget pdb (cost=0.00..40000000000000.00 rows=1000000 width=400)|
| -> Foreign Scan on kubernetes_deployment d (cost=0.00..40000.00 rows=100 width=400) |
| Filter: ((selector = pdb.selector) AND (namespace = pdb.namespace)) |
+-----------------------------------------------------------------------------------------------------------------------+
As you can see, the explain
plan does not include any condition related to min_available_int_val >= replicas
. So, when executing the query, this where
clause is not applied at all, resulting in all rows being returned.
If I add the materialized
hint, everything works correctly
explain with parsed_data as materialized (
select
d.namespace,
d.name,
d.replicas,
pdb.min_available,
pdb.max_unavailable,
case
when pdb.min_available is not null then
regexp_replace(pdb.min_available::text, '.*IntVal:([0-9]*).*', '\1')::int
else null
end as min_available_int_val,
case
when pdb.min_available is not null and
regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
round(regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
else null
end as min_available_str_val,
case
when pdb.max_unavailable is not null then
regexp_replace(pdb.max_unavailable::text, '.*IntVal:([0-9]*).*', '\1')::int
else null
end as max_unavailable_int_val,
case
when pdb.max_unavailable is not null and
regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
round(regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
else null
end as max_unavailable_str_val
from
k8s_main.kubernetes_pod_disruption_budget pdb
inner join k8s_main.kubernetes_deployment d on d.selector = pdb.selector
and d.namespace = pdb.namespace
)
select
namespace,
name,
min_available,
max_unavailable,
replicas,
min_available_int_val
from
parsed_data
where
min_available_int_val >= replicas
order by
namespace,
name;
->
+-------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+-------------------------------------------------------------------------------------------------------------------------+
|Sort (cost=40040005585708.72..40040005606542.05 rows=8333333 width=141) |
| Sort Key: parsed_data.namespace, parsed_data.name |
| CTE parsed_data |
| -> Nested Loop (cost=0.00..40040002250000.00 rows=25000000 width=208) |
| -> Foreign Scan on kubernetes_pod_disruption_budget pdb (cost=0.00..40000000000000.00 rows=1000000 width=400)|
| -> Foreign Scan on kubernetes_deployment d (cost=0.00..40000.00 rows=100 width=400) |
| Filter: ((selector = pdb.selector) AND (namespace = pdb.namespace)) |
| -> CTE Scan on parsed_data (cost=0.00..583333.33 rows=8333333 width=141) |
| Filter: (min_available_int_val >= replicas) |
+-------------------------------------------------------------------------------------------------------------------------+
The only difference is the materialized
hint at the top, but when checking the plan, we can see that Filter: (min_available_int_val >= replicas)
is properly included, and the query executes as intended.
In fact, besides this example, I’ve encountered similar cases a few times before. Each time, I just added the materialized
hint to resolve the issue.
I looked it up and found a few similar cases:
- Failing to apply where condition when joining
aws_ecr_image
steampipe-postgres-fdw#435 - Queries fail when joining tables on data from JSONB columns steampipe-postgres-fdw#352
- seemingly unnecessary ORDER BY is required in various situations #968
If this is a known issue and adding the materialized
hint is the appropriate workaround, it would be helpful to explicitly mention it in the documentation—so others don’t have to struggle with the same issue like I did.
On the other hand, if this behavior is unintended and a clear bug, then it should be investigated and fixed at the root cause.
Steampipe version (steampipe -v
)
v1.0.3
To reproduce
I haven’t been able to find a minimal, reproducible example. If you have the k8s plugin, please refer to [Describe the bug].
Expected behavior
"Ensuring that the where
clause is not omitted even without the materialized
hint” or “Documenting this issue with a workaround as guidance.”
Additional context
N/A