Using dplyr::dense_rank without attaching dplyr causes the function replacement to fail #1231

Closed
multimeric opened this issue Mar 28, 2023 · 0 comments · Fixed by #1254
@multimeric
Here is a strange edge case. When all of the following conditions hold, dplyr::dense_rank is not translated into the SQL DENSE_RANK() window function:

  • dplyr::dense_rank is used inside dplyr::across()
  • dplyr is not attached with library(dplyr)
  • The function is passed with its namespace prefix, i.e. as dplyr::dense_rank

Here are some examples. First, the failure case. Note that vec_rank appears in the generated SQL. It is an R function (from vctrs, which dplyr's dense_rank calls internally), not an SQL function:

> dbplyr::lazy_frame(a=5:1, b=1:5) |> dplyr::mutate(dplyr::across(dplyr::everything(), dplyr::dense_rank)) |> dplyr::show_query()
<SQL>
SELECT
  vec_rank(`a`, 'dense' AS `ties`, 'na' AS `incomplete`) AS `a`,
  vec_rank(`b`, 'dense' AS `ties`, 'na' AS `incomplete`) AS `b`
FROM `df`
Warning messages:
1: Named arguments ignored for SQL vec_rank
2: Named arguments ignored for SQL vec_rank

However, if we simply remove the dplyr:: prefix from dense_rank, it works fine:

> dbplyr::lazy_frame(a=5:1, b=1:5) |> dplyr::mutate(dplyr::across(dplyr::everything(), dense_rank)) |> dplyr::show_query()
<SQL>
SELECT
  CASE
WHEN (NOT((`a` IS NULL))) THEN DENSE_RANK() OVER (PARTITION BY (CASE WHEN ((`a` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `a`)
END AS `a`,
  CASE
WHEN (NOT((`b` IS NULL))) THEN DENSE_RANK() OVER (PARTITION BY (CASE WHEN ((`b` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `b`)
END AS `b`
FROM `df`

It also works if we call dplyr::dense_rank directly instead of through across():

> dbplyr::lazy_frame(a=5:1, b=1:5) |> dplyr::mutate(a = dplyr::dense_rank(a)) |> dplyr::show_query()
<SQL>
SELECT
  CASE
WHEN (NOT((`a` IS NULL))) THEN DENSE_RANK() OVER (PARTITION BY (CASE WHEN ((`a` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `a`)
END AS `a`,
  `b`
FROM `df`