Skip to content

Weird query behavior when using CTE without materialized hint #4499

@dongho-jung

Description

@dongho-jung

Describe the bug
I’m not exactly sure what the issue is, but there is a problem where the query results come out strangely when using a CTE without materialized hint.

For example, when explain a query like the following: (The query is quite long, and you don’t need to look at it. Ultimately, what I want to show is the difference in explain results depending on whether the materialized hint is used or not.)

explain with parsed_data as (
    select
        d.namespace,
        d.name,
        d.replicas,
        pdb.min_available,
        pdb.max_unavailable,
        case
            when pdb.min_available is not null then
                regexp_replace(pdb.min_available::text, '.*IntVal:([0-9]*).*', '\1')::int
            else null
        end as min_available_int_val,
        case
            when pdb.min_available is not null and
                 regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
                round(regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
            else null
        end as min_available_str_val,
        case
            when pdb.max_unavailable is not null then
                regexp_replace(pdb.max_unavailable::text, '.*IntVal:([0-9]*).*', '\1')::int
            else null
        end as max_unavailable_int_val,
        case
            when pdb.max_unavailable is not null and
                 regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
                round(regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
            else null
        end as max_unavailable_str_val
    from
        k8s_main.kubernetes_pod_disruption_budget pdb
        inner join k8s_main.kubernetes_deployment d on d.selector = pdb.selector
        and d.namespace = pdb.namespace
)
select
    namespace,
    name,
    min_available,
    max_unavailable,
    replicas,
    min_available_int_val
from
    parsed_data
where
    min_available_int_val >= replicas
order by
    namespace,
    name;

explain results are as follows:

+-----------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------+
|Sort  (cost=40040003898208.71..40040003919042.05 rows=8333333 width=141)                                               |
|  Sort Key: d.namespace, d.name                                                                                        |
|  ->  Nested Loop  (cost=0.00..40040001145833.33 rows=8333333 width=141)                                               |
|        ->  Foreign Scan on kubernetes_pod_disruption_budget pdb  (cost=0.00..40000000000000.00 rows=1000000 width=400)|
|        ->  Foreign Scan on kubernetes_deployment d  (cost=0.00..40000.00 rows=100 width=400)                          |
|              Filter: ((selector = pdb.selector) AND (namespace = pdb.namespace))                                      |
+-----------------------------------------------------------------------------------------------------------------------+

As you can see, the explain plan does not include any condition related to min_available_int_val >= replicas. So, when executing the query, this where clause is not applied at all, resulting in all rows being returned.

If I add the materialized hint, everything works correctly

explain with parsed_data as materialized (
    select
        d.namespace,
        d.name,
        d.replicas,
        pdb.min_available,
        pdb.max_unavailable,
        case
            when pdb.min_available is not null then
                regexp_replace(pdb.min_available::text, '.*IntVal:([0-9]*).*', '\1')::int
            else null
        end as min_available_int_val,
        case
            when pdb.min_available is not null and
                 regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
                round(regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
            else null
        end as min_available_str_val,
        case
            when pdb.max_unavailable is not null then
                regexp_replace(pdb.max_unavailable::text, '.*IntVal:([0-9]*).*', '\1')::int
            else null
        end as max_unavailable_int_val,
        case
            when pdb.max_unavailable is not null and
                 regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
                round(regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
            else null
        end as max_unavailable_str_val
    from
        k8s_main.kubernetes_pod_disruption_budget pdb
        inner join k8s_main.kubernetes_deployment d on d.selector = pdb.selector
        and d.namespace = pdb.namespace
)
select
    namespace,
    name,
    min_available,
    max_unavailable,
    replicas,
    min_available_int_val
from
    parsed_data
where
    min_available_int_val >= replicas
order by
    namespace,
    name;

->

+-------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------+
|Sort  (cost=40040005585708.72..40040005606542.05 rows=8333333 width=141)                                                 |
|  Sort Key: parsed_data.namespace, parsed_data.name                                                                      |
|  CTE parsed_data                                                                                                        |
|    ->  Nested Loop  (cost=0.00..40040002250000.00 rows=25000000 width=208)                                              |
|          ->  Foreign Scan on kubernetes_pod_disruption_budget pdb  (cost=0.00..40000000000000.00 rows=1000000 width=400)|
|          ->  Foreign Scan on kubernetes_deployment d  (cost=0.00..40000.00 rows=100 width=400)                          |
|                Filter: ((selector = pdb.selector) AND (namespace = pdb.namespace))                                      |
|  ->  CTE Scan on parsed_data  (cost=0.00..583333.33 rows=8333333 width=141)                                             |
|        Filter: (min_available_int_val >= replicas)                                                                      |
+-------------------------------------------------------------------------------------------------------------------------+

The only difference is the materialized hint at the top, but when checking the plan, we can see that Filter: (min_available_int_val >= replicas) is properly included, and the query executes as intended.

In fact, besides this example, I’ve encountered similar cases a few times before. Each time, I just added the materialized hint to resolve the issue.

I looked it up and found a few similar cases:

If this is a known issue and adding the materialized hint is the appropriate workaround, it would be helpful to explicitly mention it in the documentation—so others don’t have to struggle with the same issue like I did.

On the other hand, if this behavior is unintended and a clear bug, then it should be investigated and fixed at the root cause.

Steampipe version (steampipe -v)
v1.0.3

To reproduce
I haven’t been able to find a minimal, reproducible example. If you have the k8s plugin, please refer to [Describe the bug].

Expected behavior
"Ensuring that the where clause is not omitted even without the materialized hint” or “Documenting this issue with a workaround as guidance.”

Additional context
N/A

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingext:needs-triageExternal issues that have been accepted and now need initial review/assessment/fix

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions