Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join on multi valued dimension fails when using CTE #16896

Open
pjain1 opened this issue Aug 14, 2024 · 0 comments
Open

Join on multi valued dimension fails when using CTE #16896

pjain1 opened this issue Aug 14, 2024 · 0 comments

Comments

@pjain1
Copy link
Member

pjain1 commented Aug 14, 2024

Consider a datasource having a multi valued dimension. If there is a full join query like below it works

SELECT
  (COALESCE(base."channel_mvd", comparison."channel_mvd")) AS "channel",
  (ANY_VALUE(base."added" - comparison."added")) AS "added_delta"
FROM (
  SELECT
    "channel_mvd",
    (SUM(added)) AS "added"
  FROM "wikipedia" CROSS JOIN UNNEST(MV_TO_ARRAY("channel")) as u("channel_mvd")
  WHERE ("__time" >= '2016-06-27T03:00:00.000Z' AND "__time" < '2016-06-27T06:00:00.000Z') 
  GROUP BY 1
) AS base
FULL JOIN (
  SELECT
    "channel_mvd",
    (SUM(added)) AS "added"
  FROM "wikipedia" CROSS JOIN UNNEST(MV_TO_ARRAY("channel")) as u("channel_mvd")
  WHERE ("__time" >= '2016-06-27T00:00:00.000Z' AND "__time" < '2016-06-27T03:00:00.000Z') 
  GROUP BY 1
) AS comparison ON (base."channel_mvd" IS NOT DISTINCT FROM comparison."channel_mvd")
GROUP BY 1
ORDER BY "added_delta" DESC
LIMIT 250

added CROSS JOIN UNNEST(MV_TO_ARRAY("channel")) as u("channel_mvd") just to simulate column being mvd as channel is not. Anyways if any mvd is used directly in the join above then also it works.

However, if we do the same thing with CTE then it fails (assume channel_mvd is mvd here) -

WITH base AS (
  SELECT
    "channel_mvd",
    (SUM(added)) AS "added"
  FROM "wikipedia"
  WHERE ("__time" >= '2016-06-27T03:00:00.000Z' AND "__time" < '2016-06-27T06:00:00.000Z')
  GROUP BY 1
  ORDER BY "channel_mvd" DESC
  LIMIT 250
)
SELECT
  (COALESCE(base."channel_mvd", comparison."channel_mvd")) AS "channel",
  (ANY_VALUE(base."added" - comparison."added")) AS "added_delta"
FROM base
LEFT JOIN (
  SELECT
    "channel_mvd",
    (SUM(added)) AS "added"
  FROM "wikipedia"
  WHERE ("__time" >= '2016-06-27T00:00:00.000Z' AND "__time" < '2016-06-27T03:00:00.000Z') AND (
    ("channel_mvd") IN (SELECT "base"."channel_mvd" FROM "base")
  )
  GROUP BY 1
) AS comparison ON (base."channel_mvd" IS NOT DISTINCT FROM comparison."channel_mvd")
GROUP BY 1
ORDER BY "added_delta" DESC
LIMIT 250

It fails with QueryUnsupportedException{msg=Joining against a multi-value dimension is not supported. and if I add CROSS JOIN UNNEST(MV_TO_ARRAY("channel")) AS u ("channel_mvd") then it fails with Unhandled Query Planning Failure, see broker logs for details

@pjain1 pjain1 changed the title Joins with CTE on mvd columns fails Joins on mvd columns fails when using CTE Aug 14, 2024
@pjain1 pjain1 changed the title Joins on mvd columns fails when using CTE Join on mvd columns fails when using CTE Aug 14, 2024
@pjain1 pjain1 changed the title Join on mvd columns fails when using CTE Join on multi valued dimension fails when using CTE Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant