Question about using normalize = TRUE in calculate_area_intersection_weights #102
I'm rereading my documentation and realizing it could be much clearer. It's been a while since I looked at this, and even then it was super confusing... so bear with me. From the examples:
In your example, the weights for your source polygons sum to 1 per target polygon. To apply the weights, you need the area of each of your source polygons so you can take a weighted sum per target polygon. So in this case, area is the area of your source polygons, and you would do this grouped by target polygon id.
The fact that there is no data over a portion of the target polygon is essentially ignored here, and our area-weighted mean only takes into account the source polygons that actually intersect the target. Is this helping? I think it's that partial overlap that is confusing. I'm not going to claim to have great ways to explain this! I'll leave this issue open to clean up the documentation. A PR would be MUCH appreciated!
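In code, applying those weights might look something like this (a sketch only; `weights` with columns `source_id`, `target_id`, `w`, and `source_data` with `source_id`, `val`, `source_area` are illustrative names, not objects from the package):
library(dplyr)
# area-weighted mean per target polygon using non-normalized weights:
# multiply each weight by its source polygon area, then group by target id
left_join(weights, source_data, by = "source_id") |>
  group_by(target_id) |>
  summarize(val = sum(val * w * source_area, na.rm = TRUE) /
                  sum(w * source_area, na.rm = TRUE))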
Thanks, Dave! I definitely agree that the partial overlap was the point of confusion for me. I asked about the intended behavior because I saw some references to the target polygon area (like here and here) rather than the intersected area. I can think about ways this may have been clearer to me as a user. On one hand, it makes sense to me that the summed weights would return 1 given the name of the argument (i.e., normalize).
Yeah... My sense of it is that I've never settled on (and consistently used) a good set of words for the arguments of the functions involved in these workflows. "Source data polygons" and "target polygons" should probably be used throughout? The normalize nuance is really just whether the weights have been normalized so you don't need to know the area of each source data polygon when calculating your area weights. The weights are a cached intermediate value, and it's really just convention whether to save them in normalized form or not. gdptools does it with normalize = TRUE and, historically, I had been doing it with normalize = FALSE. My next step with this work is to rewrite geoknife's internals to either call the gdptools-based web service or use ncdfgeom to iterate over a dataset with weights as we build here -- in that work, I'll be sure to clear all this up. With geoknife, it will be important that the cached weights are coherent and that we know what is what when using the cached values. I may just kill off the normalize = FALSE mode, to be honest; I'm not sure it's really adding any value at this point.
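To sketch what that convention difference means at application time (illustrative names again; `joined` is a table of source values already joined to the weights, with dplyr loaded):
# normalize = FALSE: w is a fraction of each source polygon, so the source
# polygon area is still needed when the weights are applied
summarize(group_by(joined, target_id),
          val = sum(val * w * source_area) / sum(w * source_area))
# normalize = TRUE: the source area is already folded into w, so only w is
# needed (the denominator is 1 wherever the weights already sum to 1)
summarize(group_by(joined, target_id),
          val = sum(val * w) / sum(w))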
Thanks for providing such extensive documentation, including examples 🙌 ! My confusion around the normalize = TRUE argument is that I can't imagine a case where the summed weights would not equal 1, given that the weight is computed using the summed intersected area. For instance, in the first example from the docs, the output doesn't appear to me to follow the expectation described in the quote above. In other words, the weights sum to 1 even though the target (b) does not completely cover the source (a). Is this expected?
a_b <- calculate_area_intersection_weights(a, b, normalize = TRUE)
a_b
#> # A tibble: 4 × 3
#> ida idb w
#> <dbl> <dbl> <dbl>
#> 1 1 1 1
#> 2 2 2 1
#> 3 2 3 0.375
#> 4 3 3 0.625
dplyr::summarize(dplyr::group_by(a_b, idb), w = sum(w))
#> # A tibble: 3 × 2
#> idb w
#> <dbl> <dbl>
#> 1 1 1
#> 2 2 1
#> 3 3 1
If so, would the following edits to the documentation help make this behavior clear?
I haven't looked under the hood at gdptools, but based on a quick comparison, I'd think a small edit would be needed to match the normalize = TRUE behavior for cases where there is partial overlap:
# for normalized, we return the percent of each target covered by each source
y <- mutate(y, target_area = as.numeric(sf::st_area(geometry)))
int <- areal::aw_intersect(y,
source = x,
areaVar = "area_intersection")
# for normalized, we sum the intersection area by the total target area
int <- ungroup(mutate(group_by(int, vary), totalArea_y = unique(target_area)))
I hope I'm not missing something obvious. I agree with you that when using these intermediate weights to compute an area-weighted mean, for example, this behavior shouldn't really be consequential, since w is in the numerator and the denominator. I mention it here because Ellie mentioned she and Mike are publishing the intermediate weights for some polygon-polygon intersections and are sometimes seeing different values from R and Python (I figured this behavior I previously noticed might be related to that).
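For concreteness, a self-contained sketch of that suggested normalization, using the same aw_intersect call as above (here `x` is the source and `y` the target; the object and column names are just for illustration, this is not the package's actual internals):
library(dplyr)
library(sf)
# compute the full area of each target polygon, intersect, then divide each
# intersection area by the target area so partial coverage shows up as w < 1
y <- mutate(y, target_area = as.numeric(st_area(y)))
int <- areal::aw_intersect(y, source = x, areaVar = "area_intersection")
int <- st_drop_geometry(int)
int$w <- int$area_intersection / int$target_area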
Putting some time on this one. I found a typo above where I mixed up source and target. My brain is not good at these details. How is this looking? I'm going to keep going, but this is a good start?
library(ncdfgeom)
g <- list(rbind(c(-1,-1), c(1,-1), c(1,1), c(-1,1), c(-1,-1)))
a1 = sf::st_polygon(g) * 0.8
a2 = a1 + c(1, 2)
a3 = a1 + c(-1, 2)
b1 = sf::st_polygon(g)
b2 = b1 + 2
b3 = b1 + c(-0.2, 2)
b4 = b1 + c(2.2, 0)
a = sf::st_sfc(a1,a2,a3)
b = sf::st_sfc(b1, b2, b3, b4)
plot(c(a,b), border = NA)
plot(a, border = 'darkgreen', add = TRUE)
plot(b, border = 'red', add = TRUE)
a <- sf::st_sf(a, data.frame(ida = c(1, 2, 3)))
b <- sf::st_sf(b, data.frame(idb = c(7, 8, 9, 10)))
text(sapply(sf::st_geometry(a), \(x) mean(x[[1]][,1]) + 0.4),
sapply(sf::st_geometry(a), \(x) mean(x[[1]][,2]) + 0.3),
a$ida, col = "darkgreen")
text(sapply(sf::st_geometry(b), \(x) mean(x[[1]][,1]) + 0.4),
sapply(sf::st_geometry(b), \(x) mean(x[[1]][,2])),
b$idb, col = "red")
sf::st_agr(a) <- sf::st_agr(b) <- "constant"
sf::st_crs(b) <- sf::st_crs(a) <- sf::st_crs(5070)
calculate_area_intersection_weights(a, b, normalize = FALSE)
#> Loading required namespace: areal
#> # A tibble: 4 × 3
#> ida idb w
#> <dbl> <dbl> <dbl>
#> 1 1 7 1
#> 2 2 8 0.5
#> 3 2 9 0.375
#> 4 3 9 0.625
# NOTE: normalize = FALSE so weights sum to 1 per source polygon
# when source is fully within target.
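# (sketch) if the result above were captured, e.g.
#   a_b_false <- calculate_area_intersection_weights(a, b, normalize = FALSE)
# then summing per source id shows which sources are only partially covered:
#   dplyr::summarize(dplyr::group_by(a_b_false, ida), w = sum(w))
# ida 1 sums to 1 (fully within a target); ida 2 sums to 0.875 and ida 3 to 0.625.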
calculate_area_intersection_weights(a, b, normalize = TRUE)
#> # A tibble: 4 × 3
#> ida idb w
#> <dbl> <dbl> <dbl>
#> 1 1 7 1
#> 2 2 8 1
#> 3 2 9 0.375
#> 4 3 9 0.625
# NOTE: normalize = TRUE so weights sum to 1 per target polygon. Non-overlap
# is ignored as if it does not exist.
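# (sketch) similarly, capturing the result above as
#   a_b_true <- calculate_area_intersection_weights(a, b, normalize = TRUE)
# and summing per target id gives 1 for every target that has any overlap:
#   dplyr::summarize(dplyr::group_by(a_b_true, idb), w = sum(w))
# idb 7, 8, and 9 all sum to 1; idb 10 has no overlap and does not appear.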
calculate_area_intersection_weights(b, a, normalize = FALSE)
#> # A tibble: 5 × 3
#> idb ida w
#> <dbl> <dbl> <dbl>
#> 1 7 1 0.64
#> 2 8 2 0.32
#> 3 9 2 0.24
#> 4 9 3 0.4
#> 5 10 NA NA
# NOTE: normalize = FALSE so weights never sum to 1 since no source is fully
# within target.
calculate_area_intersection_weights(b, a, normalize = TRUE)
#> # A tibble: 5 × 3
#> idb ida w
#> <dbl> <dbl> <dbl>
#> 1 7 1 1
#> 2 8 2 0.571
#> 3 9 2 0.429
#> 4 9 3 1
#> 5 10 NA NA
# NOTE: normalize = TRUE so weights sum to 1 per target polygon. Non-overlap
# is ignored as if it does not exist.
Created on 2024-11-15 with reprex v2.1.1
The second example: this is now also rendered in #103. @lkoenig-usgs, @elliewhite-usgs, @rmcd-mscb, your review and thoughts on how to help clarify this and make it easier to understand would be greatly appreciated!
# a more typical arrangement of polygons
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(ncdfgeom)
g <- list(rbind(c(-1,-1), c(1,-1), c(1,1),
c(-1,1), c(-1,-1)))
a1 = st_polygon(g) * 0.75 + c(-.25, -.25)
a2 = a1 + 1.5
a3 = a1 + c(0, 1.5)
a4 = a1 + c(1.5, 0)
b1 = st_polygon(g)
b2 = b1 + 2
b3 = b1 + c(0, 2)
b4 = b1 + c(2, 0)
a = st_sfc(a1,a2, a3, a4)
b = st_sfc(b1, b2, b3, b4)
b <- st_sf(b, data.frame(idb = c(1, 2, 3, 4)))
a <- st_sf(a, data.frame(ida = c(6, 7, 8, 9)))
plot(st_geometry(b), border = 'red', lwd = 3)
plot(st_geometry(a), border = 'darkgreen', lwd = 3, add = TRUE)
text(sapply(st_geometry(a), \(x) mean(x[[1]][,1]) + 0.4),
sapply(st_geometry(a), \(x) mean(x[[1]][,2]) + 0.3),
a$ida, col = "darkgreen")
text(sapply(st_geometry(b), \(x) mean(x[[1]][,1]) - 0.4),
sapply(st_geometry(b), \(x) mean(x[[1]][,2]) - 0.5),
b$idb, col = "red")
st_agr(a) <- st_agr(b) <- "constant"
st_crs(b) <- st_crs(a) <- st_crs(5070)
a$val <- c(1, 2, 3, 4)
a$a_areasqkm <- 1.5 ^ 2
plot(a["val"], reset = FALSE)
plot(st_geometry(b), border = 'red', lwd = 3, add = TRUE, reset = FALSE)
plot(st_geometry(a), border = 'darkgreen', lwd = 3, add = TRUE)
text(sapply(st_geometry(a), \(x) mean(x[[1]][,1]) + 0.4),
sapply(st_geometry(a), \(x) mean(x[[1]][,2]) + 0.3),
a$ida, col = "darkgreen")
text(sapply(st_geometry(b), \(x) mean(x[[1]][,1]) - 0.4),
sapply(st_geometry(b), \(x) mean(x[[1]][,2]) - 0.5),
b$idb, col = "red")
# say we have data from `a` that we want sampled to `b`.
# this gives the percent of each `a` that intersects each `b`
(a_b <- calculate_area_intersection_weights(
select(a, ida), select(b, idb), normalize = FALSE))
#> Loading required namespace: areal
#> # A tibble: 9 × 3
#> ida idb w
#> <dbl> <dbl> <dbl>
#> 1 6 1 1
#> 2 7 1 0.111
#> 3 8 1 0.333
#> 4 9 1 0.333
#> 5 7 2 0.444
#> 6 7 3 0.222
#> 7 8 3 0.667
#> 8 7 4 0.222
#> 9 9 4 0.667
# NOTE: `w` sums to 1 per `a` in all cases
summarize(group_by(a_b, ida), w = sum(w))
#> # A tibble: 4 × 2
#> ida w
#> <dbl> <dbl>
#> 1 6 1
#> 2 7 1
#> 3 8 1
#> 4 9 1
# Since normalize is false, we apply weights like:
st_drop_geometry(a) |>
left_join(a_b, by = "ida") |>
mutate(a_areasqkm = 1.5 ^ 2) |> # add area of each polygon in `a`
group_by(idb) |> # group so we get one row per `b`
# now we calculate the value for each b with fraction of the area of each
# polygon in `a` per polygon in `b` with an equation like this:
summarize(
new_val = sum( (val * w * a_areasqkm), na.rm = TRUE ) / sum(w * a_areasqkm))
#> # A tibble: 4 × 2
#> idb new_val
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 2
#> 3 3 2.75
#> 4 4 3.5
# NOTE: `w` is the fraction of each polygon in `a`. We need to multiply `w` by the
# area of the `a` polygon it is associated with to get the weights for the area-weighted mean.
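# for example, for idb 3 (the areas of the `a` polygons are all 2.25 here, so
# they cancel): (2 * 0.222 * 2.25 + 3 * 0.667 * 2.25) / ((0.222 + 0.667) * 2.25)
# is about 2.75, matching new_val above.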
# we can go in reverse if we had data from b that we want sampled to a
(b_a <- calculate_area_intersection_weights(
select(b, idb), select(a, ida), normalize = FALSE))
#> # A tibble: 9 × 3
#> idb ida w
#> <dbl> <dbl> <dbl>
#> 1 1 6 0.562
#> 2 1 7 0.0625
#> 3 2 7 0.25
#> 4 3 7 0.125
#> 5 4 7 0.125
#> 6 1 8 0.188
#> 7 3 8 0.375
#> 8 1 9 0.188
#> 9 4 9 0.375
# NOTE: `w` sums to 1 per `b` (source) only where `b` is fully covered by `a` (target).
summarize(group_by(b_a, idb), w = sum(w))
#> # A tibble: 4 × 2
#> idb w
#> <dbl> <dbl>
#> 1 1 1
#> 2 2 0.25
#> 3 3 0.5
#> 4 4 0.5
# Now let's look at what happens if we set normalize = TRUE. Here we
# get `a` as source and `b` as target but normalize the weights so
# the area of a is built into `w`.
(a_b <- calculate_area_intersection_weights(
select(a, ida), select(b, idb), normalize = TRUE))
#> # A tibble: 9 × 3
#> ida idb w
#> <dbl> <dbl> <dbl>
#> 1 6 1 0.562
#> 2 7 1 0.0625
#> 3 8 1 0.188
#> 4 9 1 0.188
#> 5 7 2 1
#> 6 7 3 0.25
#> 7 8 3 0.75
#> 8 7 4 0.25
#> 9 9 4 0.75
# NOTE: if we summarize by `b` (target), `w` sums to 1, whereas above, with
# normalize = FALSE, `w` summed to 1 per `a` (source).
summarize(group_by(a_b, idb), w = sum(w))
#> # A tibble: 4 × 2
#> idb w
#> <dbl> <dbl>
#> 1 1 1
#> 2 2 1
#> 3 3 1
#> 4 4 1
# Since normalize is true, we apply weights like:
st_drop_geometry(a) |>
left_join(a_b, by = "ida") |>
group_by(idb) |> # group so we get one row per `b`
# now we weight by the percent of each polygon in `b` per polygon in `a`
summarize(new_val = sum( (val * w), na.rm = TRUE ))
#> # A tibble: 4 × 2
#> idb new_val
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 2
#> 3 3 2.75
#> 4 4 3.5
# NOTE: `w` is the fraction of the polygon from `a` overlapping the polygon from `b`.
# The area of `a` is built into the weight so we just sum the weight times value per polygon.
Created on 2024-11-15 with reprex v2.1.1
Addressing your comment @lkoenig-usgs -- I think you are correct: with normalize = TRUE, the weights will always sum to 1 per target polygon by design. I think I fixed that typo in what I have in the comments above. This will be the new example documentation once we've had a chance to review.
Fun -- a geometry brain teaser! @dblodgett-usgs and @lkoenig-usgs, here are some comments starting from the top. In the a_b, normalize = FALSE case: the initial output of a_b makes sense, but when you summarize the group_by, I would think it should be done differently. As a general comment on the documentation, I would call a green and b red, and a_b green_red, etc. For b_a, again I think you are summarizing the weights incorrectly: you should be grouping by ida. You want all the weights of b that intersect a, so you group_by ida, and in this case, since b completely covers a, your weights should sum to 1 in all cases. Again, this helps the user understand that b completely covers a. The last one, a_b with normalize = TRUE, is correct. Summarizing: I think it's all correct except for how you are summarizing the weights, and that in all cases you should summarize the weights using the same pattern, grouping by the same id. I think I'm correct here, and maybe it's just semantics, but I think looking at the weights consistently will help the user. If I'm off base, I apologize -- this is a bit of a brain teaser :)
This detail of whether to summarize by the ids of the source data polygons or the ids of the target data polygons is important. Teasing out what it means that we are summarizing to one or the other is the key to understanding what's going on. (I'm not going to claim to fully understand!) I'll stare at this for a bit and see if I can come up with a clear explanation and update the PR. Trying some plain language: ultimately, we want values on the basis of the target polygons, so grouping by their ids is what we need to do to get the answer. The weights are what allow us to add up the relative contributions of the value from each source-data polygon per target polygon. This distinction is independent of the normalize argument. I like your suggestion to name the example inputs by the color used to visualize them. I'll pick colors less prone to color deficiencies though. Here's an updated WIP. I'll keep going. Honestly, I'm thinking I'll just drop the normalize = TRUE/FALSE detail here. It's just confusing things and not adding anything since the answers are the same.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(ncdfgeom)
g <- list(rbind(c(-1,-1), c(1,-1), c(1,1), c(-1,1), c(-1,-1)))
blue1 = sf::st_polygon(g) * 0.8
blue2 = blue1 + c(1, 2)
blue3 = blue1 * 1.2 + c(-1, 2)
pink1 = sf::st_polygon(g)
pink2 = pink1 + 2
pink3 = pink1 + c(-0.2, 2)
pink4 = pink1 + c(2.2, 0)
blue = sf::st_sfc(blue1,blue2,blue3)
pink = sf::st_sfc(pink1, pink2, pink3, pink4)
plot(c(blue,pink), border = NA)
plot(blue, border = "#648fff", add = TRUE)
plot(pink, border = "#dc267f", add = TRUE)
blue <- sf::st_sf(blue, data.frame(idblue = c(1, 2, 3)))
pink <- sf::st_sf(pink, data.frame(idpink = c(7, 8, 9, 10)))
text(sapply(sf::st_geometry(blue), \(x) mean(x[[1]][,1]) + 0.4),
sapply(sf::st_geometry(blue), \(x) mean(x[[1]][,2]) + 0.3),
blue$idblue, col = "#648fff")
text(sapply(sf::st_geometry(pink), \(x) mean(x[[1]][,1]) + 0.4),
sapply(sf::st_geometry(pink), \(x) mean(x[[1]][,2])),
pink$idpink, col = "#dc267f")
sf::st_agr(blue) <- sf::st_agr(pink) <- "constant"
sf::st_crs(pink) <- sf::st_crs(blue) <- sf::st_crs(5070)
(blue_pink_norm_false <-
calculate_area_intersection_weights(blue, pink, normalize = FALSE))
#> Loading required namespace: areal
#> # A tibble: 4 × 3
#> idblue idpink w
#> <dbl> <dbl> <dbl>
#> 1 1 7 1
#> 2 2 8 0.5
#> 3 2 9 0.375
#> 4 3 9 0.604
# NOTE: normalize = FALSE so weights sum to 1 per source polygon
# only when a source polygon is fully covered by the target. The
# non-intersecting portion is not included.
# the following breaks down how to use these weights for one target polygon.
blue$val = c(30, 10, 20)
blue$area <- as.numeric(sf::st_area(blue))
(result <- st_drop_geometry(blue) |>
left_join(blue_pink_norm_false, by = "idblue"))
#> idblue val area idpink w
#> 1 1 30 2.5600 7 1.0000000
#> 2 2 10 2.5600 8 0.5000000
#> 3 2 10 2.5600 9 0.3750000
#> 4 3 20 3.6864 9 0.6041667
# To calculate the value for idpink 9, we would do:
((10 * 0.375 * 2.56) + (20 * 0.604167 * 3.6864)) / ((0.375 * 2.56) + (0.604167 * 3.6864))
#> [1] 16.98795
# or in a simple form for a whole table:
(result <- result |>
group_by(idpink) |> # group so we get one row per target
# now we calculate the value for each `pink` with fraction of the area of each
# polygon in `blue` per polygon in `pink` with an equation like this:
summarize(
new_val = sum( (val * w * area), na.rm = TRUE ) / sum(w * area)))
#> # A tibble: 3 × 2
#> idpink new_val
#> <dbl> <dbl>
#> 1 7 30
#> 2 8 10
#> 3 9 17.0
blue <- select(blue, idblue)
(blue_pink_norm_true <-
calculate_area_intersection_weights(blue, pink, normalize = TRUE))
#> # A tibble: 4 × 3
#> idblue idpink w
#> <dbl> <dbl> <dbl>
#> 1 1 7 1
#> 2 2 8 1
#> 3 2 9 0.301
#> 4 3 9 0.699
# NOTE: normalize = TRUE so weights sum to 1 per target polygon.
# Non-overlap is ignored as if it does not exist.
# the following breaks down how to use these weights for one target polygon.
blue$val = c(30, 10, 20)
blue$area <- as.numeric(sf::st_area(blue))
(result <- st_drop_geometry(blue) |>
left_join(blue_pink_norm_true, by = "idblue"))
#> idblue val area idpink w
#> 1 1 30 2.5600 7 1.0000000
#> 2 2 10 2.5600 8 1.0000000
#> 3 2 10 2.5600 9 0.3012048
#> 4 3 20 3.6864 9 0.6987952
((10 * 0.3012) + (20 * 0.6988)) / ((0.3012) + (0.6988))
#> [1] 16.988
(result <- result |>
group_by(idpink) |> # group so we get one row per target
# now we calculate the value for each `pink` with fraction of the area of each
# polygon in `blue` per polygon in `pink` with an equation like this:
summarize(
new_val = sum( (val * w), na.rm = TRUE )))
#> # A tibble: 3 × 2
#> idpink new_val
#> <dbl> <dbl>
#> 1 7 30
#> 2 8 10
#> 3 9 17.0
Created on 2024-11-18 with reprex v2.1.1
I think @lkoenig-usgs has it cracked: #102 (comment). Here are the equations.
Plain areal weights interpolation:
V_T = sum_i(V_i * A_iT) / A_T
where V_T is the interpolated value for the target polygon, V_i is the value of source polygon i, A_iT is the area of the intersection of source polygon i with the target, and A_T is the total area of the target polygon.
Normalized areal weights:
w_iT = A_iT / A_T, so V_T = sum_i(V_i * w_iT)
where V_T is again the interpolated value for the target polygon. In either case we are normalizing by the target area. If we can get this change made to ncdfgeom, we can recompare, where the weights are either A_iT in the normalize = FALSE case or A_iT / A_T in the normalize = TRUE case.
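To illustrate that last point, the two weight forms differ only by the constant A_T, so applying them with the sum(value * w) / sum(w) pattern used elsewhere in this thread gives the same result (a sketch with toy numbers):
v    <- c(1, 2)        # source values intersecting the target
a_it <- c(0.25, 0.50)  # intersection areas with the target
a_t  <- 1              # full target area (a quarter of it has no coverage)
sum(v * a_it) / sum(a_it)                # weights = A_iT
#> [1] 1.666667
sum(v * a_it / a_t) / sum(a_it / a_t)    # weights = A_iT / A_T
#> [1] 1.666667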
I would like to add at least one use case where weights normalized to the target polygon area might not be ideal: when there is a large size disparity between source and target polygons. I'm mainly working off NHD Catchments and have predominantly been weighting NHD variables to target polygons of various sizes, using both areal and length-based (flowline length) weighting. Yes, HUC12s, but especially large regions like Level 3 Ecoregions or the Van Metre hydrologic regions used by the National IWAAs project. In these cases, NHD Catchments are a tiny fraction of the area (or total length) contained in the larger polygon. Target-based weighting needs to retain a ton of precision to accurately represent these values, especially when rolling up to new interpolated target values. In contrast, source-based weighting is well represented by more basic decimals (usually 1; mean 0.993). Using target-based weighting, most COMID weights have a first significant digit around 10^-6 or 10^-7, with subsequent digits responsible for accuracy. Maybe we don't have an issue with crosswalks necessitating that high degree of precision in values, nor are we worried about file sizes, but I do worry about the potential that precision might be lost or long weight values truncated, and that could have a very large effect on a final rescaling product.
Interesting nuance, @mjcashman -- precision of the cached values should be float64, IMHO. @rmcd-mscb -- let's work that into our arrangement to ensure that we capture the case where you have thousands of source polygons per target.
I just screen shared with @rmcd-mscb for a bit 🤗 and I think we have this figured out. Handling target polygons with partial source polygon coverage is tricky for a few reasons. In the current ncdfgeom implementation, the normalized weights are divided by the total intersected area, so they sum to 1 per target polygon even with partial coverage. In the current gdptools implementation, the intersection areas are divided by the total target polygon area, so the weights sum to less than 1 where coverage is partial. This distinction comes out in the wash, because when you apply the weights with data you do:
sum(val * w) / sum(w)
When the weights don't sum to 1, you divide by that less-than-one sum, which inflates the contributions of the source polygons that do intersect. The method in ncdfgeom just bakes that renormalization into the cached weights up front. We get the same answer in both cases, but the weights look different! oyoyoy. I have an updated vignette and documentation in the works. Rich will pick up what I'm doing and add some specifics to gdptools to try and avoid future confusion.
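A tiny sketch of that equivalence with the two caching conventions (made-up numbers: a target of area 4 with two intersecting sources of value 10 and 20 over areas 1 and 2, and one unit of area uncovered):
vals <- c(10, 20)
# weights divided by the summed intersected area: they sum to 1
w_intersected <- c(1, 2) / 3
# weights divided by the full target area: they sum to 0.75 under partial coverage
w_target <- c(1, 2) / 4
sum(vals * w_intersected) / sum(w_intersected)
#> [1] 16.66667
sum(vals * w_target) / sum(w_target)
#> [1] 16.66667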
@dblodgett-usgs, I agree that it's not likely to be consequential for the end result. I'm curious what you think, then, is the biggest advantage of returning 1 in these cases? I'm not sure if my experience is representative, but intuitively I was expecting the behavior in gdptools, such that I can infer cases where I am using partial coverage in the interpolation.
I am going to switch the behavior here to match gdptools. Given that it doesn't matter to the result and there is additional information in the fact that the weights don't sum to 1, I think it makes sense to go with the way it was done in gdptools. Stay tuned for some way cleaner docs! :)
update docs for area intersection weights for #102
On face value, I would think
(I changed away from the a/b variable names and will talk about this as source / target here.) This function is always returning weights to translate from source to target.
OK, working through the example helped. TY! I shrunk down source 3 and got essentially source 2's value for target 9; since it's the only box overlapping that target, that makes sense. Cool. Now, if we used gdptools' way of calculating the weight by dividing by the target geometry's area, the value for target 9 would be much smaller; it would essentially be treating the non-overlapping areas as zero-value areas. I don't think that's the type of behavior we want, unless there's a different way of calculating the values in gdptools that I am not aware of...
It doesn't treat that area as 0, because we normalize by the sum of the weights (which is less than 1) when calculating the weighted statistic. E.g., the complete statistic is like:
(1 * .25 + 2 * .5 + NA * .25) / (.25 + .5 + .25)
The NA in there for the non-overlap is undefined, so we drop that part:
(1 * .25 + 2 * .5) / (.25 + .5)
Since we aren't dividing by 1, we are dividing by .75, this ignores the NA and still gives you the average of the values that do intersect.
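The same thing in a couple of lines of R (toy numbers from the statistic above):
vals <- c(1, 2, NA)          # NA is the non-overlapping part of the target
w    <- c(0.25, 0.5, 0.25)
keep <- !is.na(vals)
sum(vals[keep] * w[keep]) / sum(w[keep])
#> [1] 1.666667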
I'm a little unsure about the usage of normalize = TRUE in the calculate_area_intersection_weights function and hoping you can help clarify. My use case is calculating the intersection weights between NHDPlusv2 catchments and NHGFv1.1 HRUs (working with Ellie W on this). Specifically, I'm wondering about using normalize = TRUE to return the fractional area of a target polygon (e.g., NHGF) that is covered by the source (e.g., NHDv2).
I assumed that for the target polygon (blue), the weights would only sum to 1 if it was fully covered by the source polygons (red), so I was surprised that the summed weights equal one in this case:
The crux of my question is: if normalize = TRUE, is the intended behavior that the intersected areas are divided by the total intersecting area (as seems to be the case here), or by the target area? I'd expect those two to be the same if the target areas were completely overlapping with the source areas, but not in cases like the example here. If the latter, maybe the target area could be added as a field to y, and then totalArea_y = unique(target_area)?
Thanks!