Context:
I am using dbplyr::window_order and dbplyr::window_frame to create windowed sums of a variable. Some rows do not have enough data points for a full window, e.g., rows at the end of my data that would need values for the next few days. For those rows the windowed sum is either NA or a partial sum, depending on the row (see the example in the reprex). To remove the rows without a full window, I then use dplyr::filter.
Problem:
Even though the windowed sum is calculated before the call to dplyr::filter, the values come out as if dplyr::filter had been applied before the window calculation, which yields wrong sums. See the reprex below for a small example, along with the expected and the actual generated SQL query.
Example:
# Create the following table
data <- tibble::tibble(
  identification = c(1, 1, 1),
  date = c(1, 2, 3),
  value = c(1, 1, 1)
)
# Write the data to a remote server before continuing
The problem happens here:
# Remove the rows without a full window
dplyr::filter(sums, date <= 1)
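The step that builds `sums` is not shown in the reprex above. The following is a minimal sketch of what the full pipeline might look like, reconstructed from the generated SQL (`PARTITION BY "identification"`, `ORDER BY "identification", "date"`, `ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING`). The in-memory SQLite connection, the table name `window_demo`, and the exact `group_by`/`window_order`/`window_frame` calls are assumptions, not the original code.

```r
library(dplyr)
library(dbplyr)

# Same three-row table as in the example above.
data <- tibble::tibble(
  identification = c(1, 1, 1),
  date = c(1, 2, 3),
  value = c(1, 1, 1)
)

# Assumption: an in-memory SQLite database stands in for the remote server,
# and "window_demo" is a hypothetical table name.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
remote <- dplyr::copy_to(con, data, name = "window_demo")

# Assumed reconstruction of the windowed sum: for each row, sum `value`
# over the next two rows of the same `identification`, ordered by date.
sums <- remote %>%
  dplyr::group_by(identification) %>%
  dbplyr::window_order(identification, date) %>%
  dbplyr::window_frame(1, 2) %>%
  dplyr::mutate(summed = sum(value, na.rm = TRUE)) %>%
  dplyr::ungroup()

# Remove the rows without a full window, then inspect the generated SQL.
filtered <- dplyr::filter(sums, date <= 1)
dplyr::show_query(filtered)

result <- dplyr::collect(filtered)
print(result)

DBI::dbDisconnect(con)
```

If the filter is applied after the window calculation, the remaining row (date 1) should have a sum of 2, because the rows for dates 2 and 3 each contribute 1. Whether `show_query()` prints a flat or a nested query depends on the installed dbplyr version.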
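Output:
![Screenshot 2023-02-07 at 4 00 41 PM](https://user-images.githubusercontent.com/68299186/217364623-1e9d4603-6f04-429d-9c90-ded12b93eada.png)
![Screenshot 2023-02-07 at 4 00 41 PM copy](https://user-images.githubusercontent.com/68299186/217364846-291478fe-91bb-463c-bd42-8a1b55c11d5e.png)
Expected: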
This is the query generated:
SELECT *,
  SUM("value") OVER (PARTITION BY "identification" ORDER BY "identification", "date" ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS "summed"
FROM "vrvkvwtftvhrekibsploqvthpfjmzt"
WHERE ("date" <= 1.0)
Notice that the filter is added directly to the select statement, instead of being added to a second select statement after the windowed sum is calculated. This is the query I expected (since filtering comes after the creation of the windowed sums):
SELECT *
FROM (
  SELECT *,
    SUM("value") OVER (PARTITION BY "identification" ORDER BY "identification", "date" ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS "summed"
  FROM "vrvkvwtftvhrekibsploqvthpfjmzt"
)
WHERE ("date" <= 1.0);
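The placement of the filter matters because SQL evaluates `WHERE` before window functions, so in a flat query the window only sees rows that already passed the filter. The sketch below runs both query shapes directly through DBI to show the difference. It assumes an in-memory SQLite database in place of the remote server and a hypothetical table name `window_demo`.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the remote database, and
# "window_demo" is a hypothetical table name.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "window_demo", data.frame(
  identification = c(1, 1, 1),
  date = c(1, 2, 3),
  value = c(1, 1, 1)
))

# The window expression shared by both queries.
window_sql <- paste(
  'SUM("value") OVER (PARTITION BY "identification"',
  'ORDER BY "identification", "date"',
  'ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS "summed"'
)

# Flat query: WHERE runs first, so the window sees only the date = 1 row
# and its frame (the next two rows) is empty.
flat <- DBI::dbGetQuery(con, paste0(
  'SELECT *, ', window_sql,
  ' FROM "window_demo" WHERE ("date" <= 1.0)'
))

# Nested query: the window is computed over all three rows, then filtered.
nested <- DBI::dbGetQuery(con, paste0(
  'SELECT * FROM (SELECT *, ', window_sql,
  ' FROM "window_demo") WHERE ("date" <= 1.0)'
))

print(flat$summed)    # NA: the frame is empty after filtering
print(nested$summed)  # 2: the rows for dates 2 and 3 are summed

DBI::dbDisconnect(con)
```

Only the nested form matches the order of operations in the dplyr pipeline, where the filter comes after the windowed sum.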