---
jupyter: python3
---
# Geometry operations {#sec-geometric-operations}
## Prerequisites {.unnumbered}
```{python}
#| echo: false
import book_options
```
::: {.content-visible when-format="pdf"}
```{python}
#| echo: false
import book_options_pdf
```
:::
This chapter requires importing the following packages:
```{python}
import sys
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import shapely
import geopandas as gpd
import topojson as tp
import rasterio
import rasterio.plot
import rasterio.warp
import rasterio.mask
```
It also relies on the following data files:
```{python}
seine = gpd.read_file('data/seine.gpkg')
us_states = gpd.read_file('data/us_states.gpkg')
nz = gpd.read_file('data/nz.gpkg')
src = rasterio.open('data/dem.tif')
src_elev = rasterio.open('output/elev.tif')
```
## Introduction
So far the book has explained the structure of geographic datasets (@sec-spatial-class), and how to manipulate them based on their non-geographic attributes (@sec-attr) and spatial relations (@sec-spatial-operations).
This chapter focuses on manipulating the geographic elements of geographic objects, for example by simplifying and converting vector geometries, and by cropping raster datasets.
After reading it you should understand and have control over the geometry column in vector layers and the extent and geographic location of pixels represented in rasters in relation to other geographic objects.
@sec-geo-vec covers transforming vector geometries with 'unary' and 'binary' operations.
Unary operations work on a single geometry in isolation, including simplification (of lines and polygons), the creation of buffers and centroids, and shifting/scaling/rotating single geometries using 'affine transformations' (@sec-simplification to @sec-affine-transformations).
Binary transformations modify one geometry based on the shape of another, including clipping and geometry unions, covered in @sec-clipping and @sec-geometry-unions, respectively.
Type transformations (from a polygon to a line, for example) are demonstrated in @sec-type-transformations.
@sec-geo-ras covers geometric transformations on raster objects.
This involves changing the size and number of the underlying pixels, and assigning them new values.
It teaches how to change the extent and the origin of a raster manually (@sec-extent-and-origin), how to change the resolution in fixed steps through aggregation and disaggregation (@sec-raster-agg-disagg), and finally how to resample a raster into any existing template, which is the most general and often most practical approach (@sec-raster-resampling).
These operations are especially useful if one would like to align raster datasets from diverse sources.
Aligned raster objects share a one-to-one correspondence between pixels, allowing them to be processed using map algebra operations (@sec-raster-local-operations).
In the next chapter (@sec-raster-vector), we deal with the special case of geometry operations that involve both a raster and a vector layer together.
It shows how raster values can be 'masked' and 'extracted' by vector geometries.
Importantly it shows how to 'polygonize' rasters and 'rasterize' vector datasets, making the two data models more interchangeable.
## Geometric operations on vector data {#sec-geo-vec}
This section is about operations that in some way change the geometry of vector layers.
It is more advanced than the spatial data operations presented in the previous chapter (in @sec-spatial-vec), because here we drill down into the geometry: the functions discussed in this section work on the geometric part (the geometry column, which is a `GeoSeries` object), either as a standalone object or as part of a `GeoDataFrame`.
### Simplification {#sec-simplification}
Simplification is a process of generalizing vector objects (lines and polygons), usually for use in smaller-scale maps.
Another reason for simplifying objects is to reduce the amount of memory, disk space, and network bandwidth they consume: it may be wise to simplify complex geometries before publishing them as interactive maps.
The **geopandas** package provides the `.simplify` method, which uses the GEOS implementation of the Douglas-Peucker algorithm to reduce the vertex count.
`.simplify` uses `tolerance` to control the level of generalization in map units [@douglas_algorithms_1973].
For example, a simplified version of the `'LineString'` layer representing the river Seine and its tributaries, using a tolerance of `2000` meters, can be created with the `seine.simplify(2000)` command (@fig-simplify-lines).
```{python}
#| label: fig-simplify-lines
#| fig-cap: Simplification of the `seine` line layer
#| layout-ncol: 2
#| fig-subcap:
#| - Original
#| - Simplified (tolerance = 2000 $m$)
seine_simp = seine.simplify(2000)
seine.plot();
seine_simp.plot();
```
The resulting `seine_simp` object is a copy of the original `seine` but with fewer vertices.
This is apparent, with the result being visually simpler (@fig-simplify-lines, right) and consuming about half the memory of the original object, as shown in the comparison below.
```{python}
print(f'Original: {sys.getsizeof(seine)} bytes')
print(f'Simplified: {sys.getsizeof(seine_simp)} bytes')
```
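To see what the tolerance does to the vertex count, the following minimal sketch simplifies a small synthetic zigzag line with plain **shapely**. The coordinates are hypothetical (not the `seine` layer), and the sketch assumes that `.simplify` on a `GeoSeries` applies the same per-geometry operation.

```python
import shapely

# Synthetic zigzag line: y alternates between 0 and 0.1 (hypothetical data)
coords = [(i, 0.1 if i % 2 else 0.0) for i in range(11)]
line = shapely.LineString(coords)

# A tolerance (0.5) larger than the zigzag amplitude (0.1) removes the wiggles
line_simp = line.simplify(0.5)

# Compare the number of vertices before and after simplification
n_before = len(line.coords)
n_after = len(line_simp.coords)
print(f'Vertices before: {n_before}, after: {n_after}')
```

Counting vertices is a more direct measure of generalization than memory size, which also includes the overhead of the containing object.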
Simplification is also applicable for polygons.
This is illustrated using `us_states`, representing the contiguous United States.
As we show in @sec-reproj-geo-data, for many calculations **geopandas** (through **shapely**, and, ultimately, GEOS) assumes that the data is in a projected CRS and this could lead to unexpected results when applying distance-related operators.
Therefore, the first step is to project the data into some adequate projected CRS, such as US National Atlas Equal Area (EPSG:`9311`) (on the left in @fig-simplify-polygons), using `.to_crs` (@sec-reprojecting-vector-geometries).
```{python}
us_states9311 = us_states.to_crs(9311)
```
The `.simplify` method from **geopandas** works the same way with a `'Polygon'`/`'MultiPolygon'` layer such as `us_states9311`:
```{python}
us_states_simp1 = us_states9311.simplify(100000)
```
A limitation with `.simplify`, however, is that it simplifies objects on a per-geometry basis.
This means the topology is lost, resulting in overlapping and 'holey' areal units as illustrated in @fig-simplify-polygons (b).
The `.toposimplify` method from package **topojson** provides an alternative that overcomes this issue.
The main advantage of `.toposimplify` is that it is topologically 'aware': it simplifies the combined borders of the polygons (rather than each polygon on its own), thus ensuring that shared borders stay aligned and no gaps or overlaps are introduced.
The following code chunk uses `.toposimplify` to simplify `us_states9311`.
Note that, when using the **topojson** package, we first need to calculate a topology object, using the `tp.Topology` function, and then apply a simplification method, such as `.toposimplify`, to obtain a simplified layer.
We are also using the `.to_gdf` method to return a `GeoDataFrame`.
```{python}
#| warning: false
topo = tp.Topology(us_states9311, prequantize=False)
us_states_simp2 = topo.toposimplify(100000).to_gdf()
```
@fig-simplify-polygons compares the original input polygons and two simplification methods applied to `us_states9311`.
```{python}
#| label: fig-simplify-polygons
#| fig-cap: Polygon simplification in action, comparing the original geometry of the contiguous United States with simplified versions, generated with functions from the **geopandas** (middle), and **topojson** (right), packages.
#| layout-ncol: 3
#| fig-subcap:
#| - Original
#| - Simplified using **geopandas**
#| - Simplified using **topojson**
us_states9311.plot(color='lightgrey', edgecolor='black');
us_states_simp1.plot(color='lightgrey', edgecolor='black');
us_states_simp2.plot(color='lightgrey', edgecolor='black');
```
### Centroids {#sec-centroids}
Centroid operations identify the center of geographic objects.
Like statistical measures of central tendency (including mean and median definitions of 'average'), there are many ways to define the geographic center of an object.
All of them create single-point representations of more complex vector objects.
The most commonly used centroid operation is the geographic centroid.
This type of centroid operation (often referred to as 'the centroid') represents the center of mass in a spatial object (think of balancing a plate on your finger).
Geographic centroids have many uses, for example to create a simple point representation of complex geometries, to estimate distances between polygons, or to specify the location where polygon text labels are placed.
Centroids of the geometries in a `GeoSeries` or a `GeoDataFrame` are accessible through the `.centroid` property, as demonstrated in the code below, which generates the geographic centroids of regions in New Zealand and tributaries to the River Seine (black points in @fig-centroid-pnt-on-surface).
```{python}
nz_centroid = nz.centroid
seine_centroid = seine.centroid
```
Sometimes the geographic centroid falls outside the boundaries of its parent object (think of vector data in the shape of a doughnut).
In such cases 'point on surface' operations, created with the `.representative_point` method, can be used to guarantee the point will be in the parent object (e.g., for labeling irregular multipolygon objects such as island states), as illustrated by the red points in @fig-centroid-pnt-on-surface.
Notice that these red points always lie on their parent objects.
```{python}
nz_pos = nz.representative_point()
seine_pos = seine.representative_point()
```
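The doughnut case can be reproduced with a single synthetic polygon. The sketch below uses plain **shapely** and a hypothetical square with a square hole, and checks where the centroid and the 'point on surface' fall.

```python
import shapely

# A doughnut-like polygon: outer square with a square hole (hypothetical data)
outer = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (3, 1), (3, 3), (1, 3)]
ring = shapely.Polygon(outer, holes=[hole])

# The centroid is the center of mass, which here falls inside the hole
centroid = ring.centroid
centroid_inside = ring.contains(centroid)

# The representative point is chosen to lie on the polygon itself
pos = ring.representative_point()
pos_inside = ring.intersects(pos)

print(f'Centroid on polygon: {centroid_inside}')
print(f'Representative point on polygon: {pos_inside}')
```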
The centroids and points on surface are illustrated in @fig-centroid-pnt-on-surface.
```{python}
#| label: fig-centroid-pnt-on-surface
#| fig-cap: Centroids (black) and points on surface (red) of New Zealand and Seine datasets.
#| layout-ncol: 2
#| fig-subcap:
#| - New Zealand
#| - Seine
# New Zealand
base = nz.plot(color='white', edgecolor='lightgrey')
nz_centroid.plot(ax=base, color='None', edgecolor='black')
nz_pos.plot(ax=base, color='None', edgecolor='red');
# Seine
base = seine.plot(color='grey')
seine_pos.plot(ax=base, color='None', edgecolor='red')
seine_centroid.plot(ax=base, color='None', edgecolor='black');
```
### Buffers {#sec-buffers}
Buffers are polygons representing the area within a given distance of a geometric feature: regardless of whether the input is a point, line or polygon, the output is a polygon (when using positive buffer distance).
Unlike simplification, which is often used for visualization and reducing file size, buffering tends to be used for geographic data analysis.
How many points are within a given distance of this line?
Which demographic groups are within travel distance of this new shop?
These kinds of questions can be answered and visualized by creating buffers around the geographic entities of interest.
@fig-buffers illustrates buffers of two different sizes (5 and 50 $km$) surrounding the river Seine and tributaries.
These buffers were created with the commands below, using the `.buffer` method, applied to a `GeoSeries` or `GeoDataFrame`.
The `.buffer` method requires one important argument: the buffer distance, provided in the units of the CRS, in this case, meters.
```{python}
seine_buff_5km = seine.buffer(5000)
seine_buff_50km = seine.buffer(50000)
```
The results are shown in @fig-buffers.
```{python}
#| label: fig-buffers
#| fig-cap: Buffers around the Seine dataset of 5 $km$ and 50 $km$. Note the colors, which reflect the fact that one buffer is created per geometry feature.
#| layout-ncol: 2
#| fig-subcap:
#| - 5 $km$ buffer
#| - 50 $km$ buffer
seine_buff_5km.plot(color='none', edgecolor=['c', 'm', 'y']);
seine_buff_50km.plot(color='none', edgecolor=['c', 'm', 'y']);
```
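As a minimal sketch of the first question posed above (how many points are within a given distance of a line?), the following example buffers a synthetic line and counts the points that fall within the buffer. The line, the points, and the distance of `2` units are hypothetical values chosen for illustration.

```python
import geopandas as gpd
import shapely

# Hypothetical line and points, in arbitrary planar units
line = shapely.LineString([(0, 0), (10, 0)])
pnts = gpd.GeoSeries([
    shapely.Point(2, 1),     # 1 unit from the line
    shapely.Point(5, 3),     # 3 units from the line
    shapely.Point(11, 0.5),  # close to the line's end
    shapely.Point(20, 0)     # far beyond the line's end
])

# Buffer the line, then test which points fall within the buffer polygon
buf = line.buffer(2)
within = pnts.within(buf)
n_within = int(within.sum())
print(f'Points within 2 units of the line: {n_within}')
```

Note that a buffer is a polygon approximation of the true distance zone, so points lying almost exactly at the buffer distance may be classified differently than with a direct `.distance` comparison.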
Note that both `.centroid` and `.buffer` return a `GeoSeries` object, even when the input is a `GeoDataFrame`.
```{python}
seine_buff_5km
```
In the common scenario when the original attributes of the input features need to be retained, you can replace the existing geometry with the new `GeoSeries` by creating a copy of the original `GeoDataFrame` and assigning the new buffer `GeoSeries` to the `geometry` column.
```{python}
seine_buff_5km = seine.copy()
seine_buff_5km.geometry = seine.buffer(5000)
seine_buff_5km
```
An alternative option is to add a secondary geometry column directly to the original `GeoDataFrame`.
```{python}
seine['geometry_5km'] = seine.buffer(5000)
seine
```
You can then switch to either geometry column (i.e., make it 'active') using `.set_geometry`, as in:
```{python}
seine = seine.set_geometry('geometry_5km')
```
Let's revert to the original state of `seine` before moving on to the next section.
```{python}
seine = seine.set_geometry('geometry')
seine = seine.drop('geometry_5km', axis=1)
```
### Affine transformations {#sec-affine-transformations}
Affine transformations include, among others, shifting (translation), scaling and rotation, or any combination of these.
They preserve lines and parallelism, but angles and lengths are not necessarily preserved.
These transformations are an essential part of geocomputation.
For example, shifting is needed for label placement, scaling is used in non-contiguous area cartograms, and many affine transformations are applied when reprojecting or improving the geometry that was created based on a distorted or wrongly projected map.
The **geopandas** package implements affine transformation, for objects of classes `GeoSeries` and `GeoDataFrame`.
In both cases, the method is applied on the `GeoSeries` part, returning just the `GeoSeries` of transformed geometries.
Affine transformations of `GeoSeries` can be done using the `.affine_transform` method, which is a wrapper around the `shapely.affinity.affine_transform` function.
A two-dimensional affine transformation requires a six-parameter list `[a,b,d,e,xoff,yoff]` which represents @eq-affine1 and @eq-affine2 for transforming the coordinates.
$$
x' = a x + b y + x_\mathrm{off}
$$ {#eq-affine1}
$$
y' = d x + e y + y_\mathrm{off}
$$ {#eq-affine2}
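To make the two equations concrete, the sketch below applies them by hand to a single point and compares the result with `shapely.affinity.affine_transform`. The coefficient values are hypothetical, chosen to double the x-coordinate and shift the point by `(10, 20)`.

```python
import shapely
import shapely.affinity

# Hypothetical coefficients: scale x by 2, keep y, then shift by (10, 20)
a, b, d, e, xoff, yoff = 2, 0, 0, 1, 10, 20
pnt = shapely.Point(3, 4)

# Apply the two equations manually
x_new = a * pnt.x + b * pnt.y + xoff
y_new = d * pnt.x + e * pnt.y + yoff

# Apply the same transformation using the six-parameter list
pnt_t = shapely.affinity.affine_transform(pnt, [a, b, d, e, xoff, yoff])

print((x_new, y_new))
print((pnt_t.x, pnt_t.y))
```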
There are also simplified `GeoSeries` methods for specific scenarios, such as:
- `.translate(xoff=0.0, yoff=0.0)`
- `.scale(xfact=1.0, yfact=1.0, origin='center')`
- `.rotate(angle, origin='center', use_radians=False)`
For example, *shifting* only requires the $x_\mathrm{off}$ and $y_\mathrm{off}$ offsets, using `.translate`.
The code below shifts the y-coordinates of `nz` by 100 $km$ to the north, but leaves the x-coordinates untouched.
```{python}
nz_shift = nz.translate(0, 100000)
nz_shift
```
::: callout-note
**shapely**, and consequently **geopandas**, operations typically ignore the z-dimension (if there is one) of geometries. For example, `shapely.LineString([(0,0,0),(0,0,1)]).length` returns `0` (and not `1`), since `.length` ignores the z-dimension. This is not an issue in this book (and in most real-world spatial analysis applications), since we are dealing only with two-dimensional geometries.
:::
Scaling enlarges or shrinks objects by a factor, and can be applied either globally or locally.
Global scaling increases or decreases all coordinate values in relation to the origin coordinates, while keeping all geometries' topological relations intact.
**geopandas** implements scaling using the `.scale` method.
Local scaling treats geometries independently and requires points around which geometries are going to be scaled, e.g., centroids.
In the example below, each geometry is shrunk by a factor of two around the centroids (@fig-affine-transformations (b)).
To achieve that, we pass the `0.5` and `0.5` scaling factors (for x and y, respectively), and the `'centroid'` option for the point of origin.
```{python}
nz_scale = nz.scale(0.5, 0.5, origin='centroid')
nz_scale
```
When setting the `origin` in `.scale`, other than `'centroid'` it is possible to use `'center'`, for the bounding box center, or specific point coordinates, such as `(0,0)`.
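The effect of the `origin` setting can be illustrated on a single geometry. The sketch below uses the `shapely.affinity.scale` function, which accepts the same `origin` options, on a hypothetical triangle whose centroid and bounding box center differ.

```python
import shapely
import shapely.affinity

# Hypothetical triangle: centroid at (4/3, 4/3), bounding box center at (2, 2)
tri = shapely.Polygon([(0, 0), (4, 0), (0, 4)])

# Shrink by a factor of two around three different origins
scaled_centroid = shapely.affinity.scale(tri, 0.5, 0.5, origin='centroid')
scaled_center = shapely.affinity.scale(tri, 0.5, 0.5, origin='center')
scaled_point = shapely.affinity.scale(tri, 0.5, 0.5, origin=(0, 0))

# The shape is the same in all cases, only its position differs
for name, g in [('centroid', scaled_centroid),
                ('center', scaled_center),
                ('(0, 0)', scaled_point)]:
    print(name, g.bounds)
```

The three results have identical size and shape, since the origin only determines which point stays fixed during scaling.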
Rotating the geometries can be done using the `.rotate` method.
When rotating, we need to specify the rotation angle (positive values imply counter-clockwise rotation, negative values imply clockwise rotation) and the `origin` points (using the same options as in `.scale`).
For example, the following expression rotates `nz` by $30^\circ$ clockwise, around the geometry centroids.
```{python}
nz_rotate = nz.rotate(-30, origin='centroid')
nz_rotate
```
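The direction implied by the sign of the angle can be checked on a single point. The sketch below uses the `shapely.affinity.rotate` function with a hypothetical point on the positive x-axis, rotated by $\pm 90$ degrees around `(0,0)`.

```python
import shapely
import shapely.affinity

# Hypothetical point on the positive x-axis
pnt = shapely.Point(1, 0)

# Rotate by +90 and -90 degrees around the coordinate origin
pnt_pos = shapely.affinity.rotate(pnt, 90, origin=(0, 0))
pnt_neg = shapely.affinity.rotate(pnt, -90, origin=(0, 0))

print(f'+90 degrees: ({pnt_pos.x:.1f}, {pnt_pos.y:.1f})')
print(f'-90 degrees: ({pnt_neg.x:.1f}, {pnt_neg.y:.1f})')
```

A positive angle moves the point from the positive x-axis towards the positive y-axis, that is, counter-clockwise in the usual orientation where x increases to the right and y increases upwards.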
@fig-affine-transformations shows the original layer `nz`, and the shifting, scaling, and rotation results.
```{python}
#| label: fig-affine-transformations
#| fig-cap: 'Affine transformations of the `nz` layer: shift, scale, and rotate'
#| layout-ncol: 3
#| fig-subcap:
#| - Shift
#| - Scale
#| - Rotate
# Shift
base = nz.plot(color='lightgrey', edgecolor='darkgrey')
nz_shift.plot(ax=base, color='red', edgecolor='darkgrey');
# Scale
base = nz.plot(color='lightgrey', edgecolor='darkgrey')
nz_scale.plot(ax=base, color='red', edgecolor='darkgrey');
# Rotate
base = nz.plot(color='lightgrey', edgecolor='darkgrey')
nz_rotate.plot(ax=base, color='red', edgecolor='darkgrey');
```
### Pairwise geometry-generating operations {#sec-clipping}
Spatial clipping is a form of spatial subsetting that involves changes to the geometry columns of at least some of the affected features.
Clipping can only apply to features more complex than points: lines, polygons, and their 'multi' equivalents.
To illustrate the concept we will start with a simple example: two overlapping circles with a center point one unit away from each other and a radius of one (@fig-overlapping-circles).
```{python}
#| label: fig-overlapping-circles
#| fig-cap: Overlapping polygon (circle) geometries `x` and `y`
x = shapely.Point((0, 0)).buffer(1)
y = shapely.Point((1, 0)).buffer(1)
shapely.GeometryCollection([x, y])
```
Imagine you want to select not one circle or the other, but the space covered by both `x` and `y`.
This can be done using the `.intersection` method from **shapely**, illustrated using objects named `x` and `y` which represent the left- and right-hand circles (@fig-intersection).
```{python}
#| label: fig-intersection
#| fig-cap: Intersection between `x` and `y`
x.intersection(y)
```
More generally, clipping is an example of a 'pairwise geometry-generating operation', where new geometries are generated from two inputs.
Other than `.intersection` (@fig-intersection), there are three other standard pairwise operators: `.difference` (@fig-difference), `.union` (@fig-union), and `.symmetric_difference` (@fig-symmetric-difference).
```{python}
#| label: fig-difference
#| fig-cap: Difference between `x` and `y` (namely, `x` 'minus' `y`)
x.difference(y)
```
```{python}
#| label: fig-union
#| fig-cap: Union of `x` and `y`
x.union(y)
```
```{python}
#| label: fig-symmetric-difference
#| fig-cap: Symmetric difference between `x` and `y`
x.symmetric_difference(y)
```
Keep in mind that `x` and `y` are interchangeable in all of these operations except for `.difference`, where `x.difference(y)` means `x` minus `y`, whereas `y.difference(x)` means `y` minus `x`.
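The relations between the four operations can be verified numerically through their areas. The following sketch recreates the two circles and checks, up to floating-point tolerance, that the union area equals the sum of both areas minus the intersection, and that the symmetric difference equals the union minus the intersection.

```python
import shapely

# The two overlapping unit circles
x = shapely.Point((0, 0)).buffer(1)
y = shapely.Point((1, 0)).buffer(1)

inter = x.intersection(y)
union = x.union(y)
sym = x.symmetric_difference(y)

# Area identities between the pairwise operations
union_check = abs(union.area - (x.area + y.area - inter.area))
sym_check = abs(sym.area - (union.area - inter.area))
print(f'Union identity error: {union_check:.2e}')
print(f'Symmetric difference identity error: {sym_check:.2e}')

# The order of inputs matters only for .difference
same_difference = x.difference(y).equals(y.difference(x))
print(f'x minus y equals y minus x: {same_difference}')
```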
The examples above demonstrate pairwise operations between individual `shapely` geometries.
The **geopandas** package, as is often the case, contains wrappers of these **shapely** functions to be applied to multiple, or pairwise, use cases.
For example, applying any of the pairwise methods on a `GeoSeries` or `GeoDataFrame`, combined with a `shapely` geometry, returns the pairwise (many-to-one) results (which is analogous to other operators, like `.intersects` or `.distance`, see @sec-spatial-subsetting-vector and @sec-distance-relations, respectively).
Let's demonstrate the 'many-to-one' scenario by calculating the intersection between each geometry in a `GeoSeries` and a fixed `shapely` geometry.
To create the former, let's take `x` and combine it with itself translated (@sec-affine-transformations) by `1` and `2` units 'upwards' on the y-axis.
```{python}
geom1 = gpd.GeoSeries(x)
geom2 = geom1.translate(0, 1)
geom3 = geom1.translate(0, 2)
geom = pd.concat([geom1, geom2, geom3])
geom
```
@fig-geom-intersection shows the `GeoSeries` `geom` with the `shapely` geometry (in red) that we will intersect with it.
```{python}
#| label: fig-geom-intersection
#| fig-cap: A `GeoSeries` with three circles (in grey), and a `shapely` geometry that we will intersect with it (in red)
fig, ax = plt.subplots()
geom.plot(color='#00000030', edgecolor='black', ax=ax)
gpd.GeoSeries(y).plot(color='#FF000040', edgecolor='black', ax=ax);
```
Now, using `.intersection` automatically applies the **shapely** method of the same name on each geometry in `geom`, returning a new `GeoSeries`, which we name `geom_inter_y`, with the pairwise intersections.
Note the empty third geometry (can you explain the meaning of this result?).
```{python}
geom_inter_y = geom.intersection(y)
geom_inter_y
```
@fig-geom-intersection2 is a plot of the result `geom_inter_y`.
```{python}
#| label: fig-geom-intersection2
#| fig-cap: The output `GeoSeries`, after intersecting with a `shapely` geometry using `.intersection`
geom_inter_y.plot(color='#00000030', edgecolor='black');
```
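One way to reason about the empty third geometry is to compare the distances between circle centers with the sum of the radii. The self-contained sketch below recreates the three circles and the circle `y`, and reports both the center distances and which intersections are empty; two unit circles can only overlap when their centers are less than `2` units apart.

```python
import geopandas as gpd
import pandas as pd
import shapely

# Recreate the three circles and the fixed circle `y`
x = shapely.Point((0, 0)).buffer(1)
y = shapely.Point((1, 0)).buffer(1)
geom1 = gpd.GeoSeries(x)
geom = pd.concat([geom1, geom1.translate(0, 1), geom1.translate(0, 2)])

# Distances between each circle center and the center of `y`
centers = [(0, 0), (0, 1), (0, 2)]
dists = [shapely.Point(c).distance(shapely.Point(1, 0)) for c in centers]

# Many-to-one intersections, and a flag for the empty ones
inter = geom.intersection(y)
empty = inter.is_empty.tolist()

for d, e in zip(dists, empty):
    print(f'Center distance: {d:.2f}, empty intersection: {e}')
```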
The `.overlay` method (see @sec-joining-incongruent-layers) further extends this technique, making it possible to apply 'many-to-many' pairwise geometry generations between all pairs of two `GeoDataFrame`s.
The output is a new `GeoDataFrame` with the pairwise output geometries, plus the attributes of the two input features that produced each of them.
Also see the *Set operations with overlay*[^set_ops_w_overlay] article in the **geopandas** documentation for examples of `.overlay`.
[^set_ops_w_overlay]: [https://geopandas.org/en/stable/docs/user_guide/set_operations.html](https://geopandas.org/en/stable/docs/user_guide/set_operations.html)
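As a minimal sketch of the 'many-to-many' case, the following example intersects two small synthetic layers using `.overlay`. The layers, the column names `name1` and `name2`, and the coordinates are hypothetical and chosen so that the result is easy to verify by hand.

```python
import geopandas as gpd
import shapely

# Layer 1: two adjacent squares (hypothetical data)
layer1 = gpd.GeoDataFrame(
    {'name1': ['A', 'B']},
    geometry=[shapely.box(0, 0, 2, 2), shapely.box(2, 0, 4, 2)]
)
# Layer 2: one square overlapping both squares of layer 1
layer2 = gpd.GeoDataFrame(
    {'name2': ['P']},
    geometry=[shapely.box(1, 1, 3, 3)]
)

# Pairwise intersections between all overlapping pairs of features
res = layer1.overlay(layer2, how='intersection')
print(res)
print(res.area)
```

Each row in the result carries the attributes of both contributing features, here `name1` and `name2`, along with the intersection geometry.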
### Subsetting vs. clipping {#sec-subsetting-vs-clipping}
In the last two chapters we have introduced two types of spatial operators: boolean, such as `.intersects` (@sec-spatial-subsetting-vector), and geometry-generating, such as `.intersection` (@sec-clipping).
Here, we illustrate the difference between them.
We do this using the specific scenario of subsetting points by polygons, where (unlike in other cases) both methods can be used for the same purpose and give the same result.
To illustrate the point, we will subset points that cover the bounding box of the circles `x` and `y` from @fig-overlapping-circles.
Some points will be inside just one circle, some will be inside both, and some will be inside neither.
The following code sections generate the sample data for this section, a simple random distribution of points within the extent of circles `x` and `y`, resulting in output illustrated in @fig-random-points.
We create the sample points in three steps.
First, we figure out the bounds where random points are to be generated.
```{python}
bounds = x.union(y).bounds
bounds
```
Second, we use `np.random.uniform` to calculate `n` random x- and y-coordinates within the given bounds.
```{python}
np.random.seed(1)
n = 10
coords_x = np.random.uniform(bounds[0], bounds[2], n)
coords_y = np.random.uniform(bounds[1], bounds[3], n)
coords = list(zip(coords_x, coords_y))
coords
```
Third, we transform the list of coordinates into a `list` of `shapely` points, and then to a `GeoSeries`.
```{python}
pnt = [shapely.Point(i) for i in coords]
pnt = gpd.GeoSeries(pnt)
```
The result `pnt`, with `x` and `y` circles in the background, is shown in @fig-random-points.
```{python}
#| label: fig-random-points
#| fig-cap: Randomly distributed points within the bounding box enclosing circles `x` and `y`
base = pnt.plot(color='none', edgecolor='black')
gpd.GeoSeries(x).plot(ax=base, color='none', edgecolor='darkgrey');
gpd.GeoSeries(y).plot(ax=base, color='none', edgecolor='darkgrey');
```
Now, we can get back to our question: how to subset the points to only return the points that intersect with both `x` and `y`?
The code chunks below demonstrate two ways to achieve the same result.
In the first approach, we can calculate a boolean `Series`, evaluating whether each point of `pnt` intersects with the intersection of `x` and `y` (see @sec-spatial-subsetting-vector), and then use it to subset `pnt` to get the result `pnt1`.
```{python}
sel = pnt.intersects(x.intersection(y))
pnt1 = pnt[sel]
pnt1
```
In the second approach, we calculate the intersection of each point in `pnt` with the subsetting/clipping object, namely the intersection of `x` and `y`.
Since the second argument is an individual `shapely` geometry (`x.intersection(y)`), we get 'pairwise' intersections of each `pnt` with it (see @sec-clipping):
```{python}
pnt2 = pnt.intersection(x.intersection(y))
pnt2
```
The subset `pnt2` is shown in @fig-intersection-points.
```{python}
#| label: fig-intersection-points
#| fig-cap: Randomly distributed points within the bounding box enclosing circles `x` and `y`. The points that intersect with both objects `x` and `y` are highlighted.
base = pnt.plot(color='none', edgecolor='black')
gpd.GeoSeries(x).plot(ax=base, color='none', edgecolor='darkgrey');
gpd.GeoSeries(y).plot(ax=base, color='none', edgecolor='darkgrey');
pnt2.plot(ax=base, color='red');
```
The only difference between the two approaches is that `.intersection` returns all intersections, even if they are empty.
When these are filtered out, `pnt2` becomes identical to `pnt1`:
```{python}
pnt2 = pnt2[~pnt2.is_empty]
pnt2
```
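The equivalence of the two approaches can also be checked on a small deterministic example. The sketch below uses four hypothetical points (instead of random ones), placed inside both circles, inside only one of them, or outside both, and compares the subsetting and clipping results.

```python
import geopandas as gpd
import shapely

# The two circles and their intersection
x = shapely.Point((0, 0)).buffer(1)
y = shapely.Point((1, 0)).buffer(1)
both = x.intersection(y)

# Hypothetical points: in both circles, only in x, only in y, in neither
pnt = gpd.GeoSeries([
    shapely.Point(0.5, 0),
    shapely.Point(-0.5, 0),
    shapely.Point(1.5, 0),
    shapely.Point(0.5, 2)
])

# Approach 1: subsetting with a boolean Series
pnt1 = pnt[pnt.intersects(both)]

# Approach 2: clipping, followed by removal of empty geometries
pnt2 = pnt.intersection(both)
pnt2 = pnt2[~pnt2.is_empty]

# Both approaches keep the same feature, with the same geometry
same = pnt1.geom_equals(pnt2).all()
print(pnt1.index.tolist(), pnt2.index.tolist(), same)
```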
The example above is rather contrived and provided for educational rather than applied purposes.
However, we encourage readers to reproduce the results to deepen their understanding of handling geographic vector objects in Python.
### Geometry unions {#sec-geometry-unions}
Spatial aggregation can silently dissolve the geometries of touching polygons in the same group, as we saw in @sec-vector-attribute-aggregation.
This is demonstrated in the code chunk below, in which 49 `us_states` are aggregated into 4 regions using the `.dissolve` method.
```{python}
regions = us_states[['REGION', 'geometry', 'total_pop_15']] \
.dissolve(by='REGION', aggfunc='sum').reset_index()
regions
```
@fig-dissolve compares the original `us_states` layer with the aggregated `regions` layer.
```{python}
#| label: fig-dissolve
#| fig-cap: 'Spatial aggregation on contiguous polygons, illustrated by aggregating the population of 49 US states into 4 regions, with population represented by color. Note the operation automatically dissolves boundaries between states.'
#| layout-ncol: 2
#| fig-subcap:
#| - 49 States
#| - 4 Regions
# States
fig, ax = plt.subplots(figsize=(9, 2.5))
us_states.plot(ax=ax, edgecolor='black', column='total_pop_15', legend=True);
# Regions
fig, ax = plt.subplots(figsize=(9, 2.5))
regions.plot(ax=ax, edgecolor='black', column='total_pop_15', legend=True);
```
What is happening with the geometries here?
Behind the scenes, `.dissolve` combines the geometries and dissolves the boundaries between them using the `.union_all` method per group.
This is demonstrated in the code chunk below which creates a united western US using the standalone `.union_all` operation.
Note that the result is a `shapely` geometry, as the individual attributes are 'lost' as part of dissolving (@fig-dissolve2).
```{python}
#| label: fig-dissolve2
#| fig-cap: Western US
us_west = us_states[us_states['REGION'] == 'West']
us_west_union = us_west.geometry.union_all()
us_west_union
```
To dissolve two (or more) groups of a `GeoDataFrame` into one geometry, we can either (a) use a combined condition or (b) concatenate the two separate subsets and then dissolve using `.union_all`.
```{python}
# Approach 1
sel = (us_states['REGION'] == 'West') | (us_states['NAME'] == 'Texas')
texas_union = us_states[sel]
texas_union = texas_union.geometry.union_all()
# Approach 2
us_west = us_states[us_states['REGION'] == 'West']
texas = us_states[us_states['NAME'] == 'Texas']
texas_union = pd.concat([us_west, texas]).union_all()
```
The result is identical in both cases, shown in @fig-dissolve3.
```{python}
#| label: fig-dissolve3
#| fig-cap: Western US and Texas
texas_union
```
### Type transformations {#sec-type-transformations}
Transformation of geometries, from one type to another, also known as 'geometry casting', is often required to facilitate spatial analysis.
Either the **geopandas** or the **shapely** packages can be used for geometry casting, depending on the type of transformation, and the way that the input is organized (whether as an individual geometry, or as a vector layer).
Therefore, the exact expression(s) depend on the specific transformation we are interested in.
In general, you need to figure out the required input of the respective constructor function according to the 'destination' geometry (e.g., `shapely.LineString`, etc.), then reshape the input of the source geometry into the right form to be passed to that function.
Or, when available, you can use a wrapper from **geopandas**.
In this section, we demonstrate several common scenarios.
We start with transformations of individual geometries from one type to another, using **shapely** methods:
* `'MultiPoint'` to `'LineString'` (@fig-type-transform-linestring)
* `'MultiPoint'` to `'Polygon'` (@fig-type-transform-polygon)
* `'LineString'` to `'MultiPoint'` (@fig-type-transform-multipoint2)
* `'Polygon'` to `'MultiPoint'` (@fig-type-transform-polygon2)
* `'Polygon'`s to `'MultiPolygon'` (@fig-type-transform-multipolygon)
* `'MultiPolygon'`s to `'Polygon'`s (@fig-type-transform-multipolygon1, @fig-type-transform-multipolygon2)
Then, we move on and demonstrate casting workflows on `GeoDataFrame`s, where we have further considerations, such as keeping track of geometry attributes, and the possibility of dissolving, rather than just combining, geometries. As we will see, these are done either by manually applying **shapely** methods on all geometries in the given layer, or using **geopandas** wrapper methods which do it automatically:
* `'MultiLineString'` to `'LineString'`s (using `.explode`) (@fig-multilinestring-to-linestring)
* `'LineString'` to `'MultiPoint'`s (using `.apply`) (@fig-linestring-to-multipoint)
* `'LineString'`s to `'MultiLineString'` (using `.dissolve`)
* `'Polygon'`s to `'MultiPolygon'` (using `.dissolve` or `.agg`) (@fig-combine-geoms)
* `'Polygon'` to `'(Multi)LineString'` (using `.boundary` or `.exterior`) (demonstrated in a subsequent chapter, see @sec-rasterizing-lines-and-polygons)
Let's start with the simple individual-geometry casting examples, to illustrate how geometry casting works on **shapely** geometry objects.
First, let's create a `'MultiPoint'` (@fig-type-transform-multipoint).
```{python}
#| label: fig-type-transform-multipoint
#| fig-cap: A `'MultiPoint'` geometry used to demonstrate **shapely** type transformations
multipoint = shapely.MultiPoint([(1,1), (3,3), (5,1)])
multipoint
```
A `'LineString'` can be created using `shapely.LineString` from a `list` of points.
Thus, a `'MultiPoint'` can be converted to a `'LineString'` by passing the points into a `list`, then passing them to `shapely.LineString` (@fig-type-transform-linestring).
The `.geoms` property, mentioned in @sec-geometries, gives access to the individual parts that comprise a multi-part geometry, as an iterable object similar to a `list`; it is one of the ways **shapely** provides to access the internal parts of a geometry.
```{python}
#| label: fig-type-transform-linestring
#| fig-cap: A `'LineString'` created from the `'MultiPoint'` in @fig-type-transform-multipoint
linestring = shapely.LineString(multipoint.geoms)
linestring
```
Similarly, a `'Polygon'` can be created using the function `shapely.Polygon`, which accepts a sequence of point coordinates.
In principle, the last coordinate must be equal to the first, in order to form a closed shape.
However, `shapely.Polygon` is able to complete the last coordinate automatically, and therefore we can pass all of the coordinates of the `'MultiPoint'` directly to `shapely.Polygon` (@fig-type-transform-polygon).
```{python}
#| label: fig-type-transform-polygon
#| fig-cap: A `'Polygon'` created from the `'MultiPoint'` in @fig-type-transform-multipoint
polygon = shapely.Polygon(multipoint.geoms)
polygon
```
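The automatic closing of the ring can be verified by inspecting the exterior coordinates of the resulting polygon. In the following minimal sketch, the names `coords`, `triangle`, and `ring` are introduced just for illustration, and the coordinates are passed as plain tuples.

```python
import shapely

# Three coordinates; the first one is not repeated at the end
coords = [(1, 1), (3, 3), (5, 1)]
triangle = shapely.Polygon(coords)

# Coordinates of the exterior ring of the resulting polygon
ring = list(triangle.exterior.coords)
print(ring)
print(ring[0] == ring[-1])
```

The exterior ring contains four coordinates, the last of which is a copy of the first.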
The source `'MultiPoint'` geometry, and the derived `'LineString'` and `'Polygon'` geometries are shown in @fig-casting1.
Note that we convert the `shapely` geometries to `GeoSeries` to be able to use the **geopandas** `.plot` method.
```{python}
#| label: fig-casting1
#| fig-cap: Examples of `'LineString'` and `'Polygon'` cast from a `'MultiPoint'` geometry
#| layout-ncol: 3
#| fig-subcap:
#| - "`'MultiPoint'`"
#| - "`'LineString'`"
#| - "`'Polygon'`"
gpd.GeoSeries(multipoint).plot();
gpd.GeoSeries(linestring).plot();
gpd.GeoSeries(polygon).plot();
```
Conversion from `'MultiPoint'` to `'LineString'`, shown above (@fig-type-transform-linestring), is a common operation that creates a line object from ordered point observations, such as GPS measurements or geotagged media.
This allows spatial operations, such as calculating the length of the path traveled.
Conversion from `'MultiPoint'` or `'LineString'` to `'Polygon'` (@fig-type-transform-polygon) is often used to calculate an area, for example from the set of GPS measurements taken around a lake or from the corners of a building lot.
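As a minimal sketch of these two use cases, the following code derives a path length and an enclosed area from the same three ordered points. The `track` geometry is a hypothetical set of observations in arbitrary planar units (not real GPS data), and the names `path` and `lot` are made up for illustration.

```python
import math
import shapely

# Hypothetical ordered observations, in arbitrary planar units
track = shapely.MultiPoint([(1, 1), (3, 3), (5, 1)])
xy = [(pt.x, pt.y) for pt in track.geoms]

# Path traveled through the points, in the given order
path = shapely.LineString(xy)

# Area enclosed by the points
lot = shapely.Polygon(xy)

print(path.length)
print(lot.area)
```

Note that both results are in the units of the coordinates, so meaningful lengths and areas require a projected CRS.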
Our `'LineString'` geometry can be converted back to a `'MultiPoint'` geometry by passing its coordinates directly to `shapely.MultiPoint` (@fig-type-transform-multipoint2).
```{python}
#| label: fig-type-transform-multipoint2
#| fig-cap: A `'MultiPoint'` created from the `'LineString'` in @fig-type-transform-linestring
shapely.MultiPoint(linestring.coords)
```
The (exterior) coordinates of a `'Polygon'` can be passed to `shapely.MultiPoint`, to go back to a `'MultiPoint'` geometry, as well (@fig-type-transform-polygon2).
```{python}
#| label: fig-type-transform-polygon2
#| fig-cap: A `'MultiPoint'` created from the `'Polygon'` in @fig-type-transform-polygon
shapely.MultiPoint(polygon.exterior.coords)
```
Using these methods, we can transform between `'MultiPoint'`, `'LineString'`, and `'Polygon'` geometries, assuming there is a sufficient number of points (at least two for a line, and at least three for a polygon).
When dealing with multi-part geometries using **shapely**, we can:
- Access single-part geometries (e.g., each `'Polygon'` in a `'MultiPolygon'` geometry) using `.geoms[i]`, where `i` is the index of the geometry
- Combine single-part geometries into a multi-part geometry, by passing a `list` of the single-part geometries to the constructor function
For example, here is how we combine two `'Polygon'` geometries into a `'MultiPolygon'` (while also using a **shapely** affine function `shapely.affinity.translate`, which is underlying the **geopandas** `.translate` method used earlier, see @sec-affine-transformations) (@fig-type-transform-multipolygon):
```{python}
#| label: fig-type-transform-multipolygon
#| fig-cap: A `'MultiPolygon'` created from the `'Polygon'` in @fig-type-transform-polygon and another polygon
multipolygon = shapely.MultiPolygon([
polygon,
shapely.affinity.translate(polygon.centroid.buffer(1.5), 3, 2)
])
multipolygon
```
Given `multipolygon`, here is how we can get back the `'Polygon'` part 1 (@fig-type-transform-multipolygon1):
```{python}
#| label: fig-type-transform-multipolygon1
#| fig-cap: The 1^st^ part extracted from the `'MultiPolygon'` in @fig-type-transform-multipolygon
multipolygon.geoms[0]
```
and part 2 (@fig-type-transform-multipolygon2):
```{python}
#| label: fig-type-transform-multipolygon2
#| fig-cap: The 2^nd^ part extracted from the `'MultiPolygon'` in @fig-type-transform-multipolygon
multipolygon.geoms[1]
```
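Instead of extracting the parts one by one, we can also iterate over `.geoms`. The following sketch uses two hypothetical squares (not the `multipolygon` object created above) to count the parts, calculate the area of each part, and then rebuild the multi-part geometry from the `list` of its parts.

```python
import shapely

# Two hypothetical, non-overlapping squares
square = shapely.box(0, 0, 1, 1)
far_square = shapely.box(3, 3, 5, 5)
mp = shapely.MultiPolygon([square, far_square])

# Number of parts, and the area of each part
n_parts = len(mp.geoms)
part_areas = [part.area for part in mp.geoms]
print(n_parts, part_areas)

# Single-part geometries combined back into a multi-part geometry
mp_again = shapely.MultiPolygon(list(mp.geoms))
print(mp_again.equals(mp))
```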
However, dealing with multi-part geometries can be easier with **geopandas**. Thanks to the fact that geometries in a `GeoDataFrame` are associated with attributes, we can keep track of the origin of each geometry: duplicating the attributes when going from multi-part to single-part (using `.explode`, see below), or 'collapsing' the attributes through aggregation when going from single-part to multi-part (using `.dissolve`, see @sec-geometry-unions).
Let's demonstrate going from multi-part to single-part (@fig-multilinestring-to-linestring) and then back to multi-part (@sec-geometry-unions), using a small line layer.
As input, we will create a `'MultiLineString'` geometry composed of three lines (@fig-type-transform-multilinestring3).
```{python}
#| label: fig-type-transform-multilinestring3
#| fig-cap: A `'MultiLineString'` geometry composed of three lines
l1 = shapely.LineString([(1, 5), (4, 3)])
l2 = shapely.LineString([(4, 4), (4, 1)])
l3 = shapely.LineString([(2, 2), (4, 2)])
ml = shapely.MultiLineString([l1, l2, l3])
ml
```
Let's place it into a `GeoSeries`.
```{python}
geom = gpd.GeoSeries(ml)
geom
```
Then, put it into a `GeoDataFrame` with an attribute called `'id'`:
```{python}
dat = gpd.GeoDataFrame(geometry=geom, data=pd.DataFrame({'id': [1]}))
dat
```
You can imagine it as a road or river network.
The above layer `dat` has only one row that defines all the lines.
This restricts the number of operations that can be done. For example, it prevents adding names to each line segment or calculating lengths of single lines.
Using **shapely** methods with which we are already familiar (see above), the individual single-part geometries (i.e., the 'parts') can be accessed through the `.geoms` property.
```{python}
list(ml.geoms)
```
However, specifically for the 'multi-part to single part' type transformation scenario, there is also a method called `.explode`, which can convert an entire multi-part `GeoDataFrame` to a single-part one.
The advantage is that the original attributes (such as `id`) are retained, so that we can keep track of the original multi-part geometry properties that each part came from.
The `index_parts=True` argument also lets us keep track of the original multi-part geometry indices, and part indices, named `level_0` and `level_1`, respectively.
```{python}
#| warning: false
dat1 = dat.explode(index_parts=True).reset_index()
dat1
```
For example, here we see that all `'LineString'` geometries came from the same multi-part geometry (`level_0`=`0`), which had three parts (`level_1`=`0`,`1`,`2`).
@fig-multilinestring-to-linestring demonstrates the effect of `.explode` in converting a layer with multi-part geometries into a layer with single-part geometries.
```{python}
#| label: fig-multilinestring-to-linestring
#| fig-cap: Transformation of a `'MultiLineString'` layer with one feature, into a `'LineString'` layer with three features, using `.explode`
#| layout-ncol: 2
#| fig-subcap:
#| - "`'MultiLineString'` layer"
#| - "`'LineString'` layer, after applying `.explode`"
dat.plot(column='id', linewidth=7);
dat1.plot(column='level_1', linewidth=7);
```
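Once the layer is composed of single-part geometries, per-line calculations become possible. As a minimal sketch, the following code rebuilds a layer with the same three lines, explodes it, and stores the length of each line in a new column. The names `layer`, `parts`, and `part_length` are made up for illustration.

```python
import geopandas as gpd
import shapely

# A one-row layer with a 'MultiLineString' composed of three lines
ml = shapely.MultiLineString([
    [(1, 5), (4, 3)],
    [(4, 4), (4, 1)],
    [(2, 2), (4, 2)],
])
layer = gpd.GeoDataFrame({'id': [1]}, geometry=[ml])

# Multi-part to single-part: one row per line
parts = layer.explode(index_parts=True).reset_index()

# Length of each individual line (hypothetical column name)
parts['part_length'] = parts.geometry.length
print(parts[['level_0', 'level_1', 'id', 'part_length']])
```

The `id` attribute is duplicated across the three rows, while each row gets its own length.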
As a side-note, let's demonstrate how the above **shapely** casting methods can be translated to **geopandas**.
Suppose that we want to transform `dat1`, which is a layer of type `'LineString'` with three features, to a layer of type `'MultiPoint'` (also with three features).
Recall that for a single geometry, we use the expression `shapely.MultiPoint(x.coords)`, where `x` is a `'LineString'` (@fig-type-transform-multipoint2).
When dealing with a `GeoDataFrame`, we wrap the conversion into `.apply`, to apply it to all geometries:
```{python}
dat2 = dat1.copy()
dat2.geometry = dat2.geometry.apply(lambda x: shapely.MultiPoint(x.coords))
dat2
```
The result is illustrated in @fig-linestring-to-multipoint.
```{python}
#| label: fig-linestring-to-multipoint
#| fig-cap: Transformation of a `'LineString'` layer with three features, into a `'MultiPoint'` layer (also with three features), using `.apply` and **shapely** methods
#| layout-ncol: 2
#| fig-subcap:
#| - "`'LineString'` layer"
#| - "`'MultiPoint'` layer"
dat1.plot(column='level_1', linewidth=7);
dat2.plot(column='level_1', markersize=50);
```
The opposite transformation, i.e., 'single-part to multi-part', is achieved using the `.dissolve` method (which we are already familiar with, see @sec-geometry-unions).
For example, here is how we can get from the `'LineString'` layer with three features back to the `'MultiLineString'` layer with one feature (since, in this case, there is just one group):
```{python}
dat1.dissolve(by='id').reset_index()
```
The next code chunk is another example, dissolving the 16 polygons in `nz` into two geometries of the north and south parts (i.e., the two `'Island'` groups).
```{python}
nz_dis1 = nz[['Island', 'Population', 'geometry']] \
.dissolve(by='Island', aggfunc='sum') \
.reset_index()
nz_dis1
```
Note that `.dissolve` not only combines single-part into multi-part geometries, but also dissolves any internal borders.
So, in fact, the resulting geometries may be single-part (in cases where all parts touch each other, unlike in `nz`).
If, for some reason, we want to combine geometries into multi-part *without* dissolving, we can fall back to the **pandas** `.agg` method (custom table aggregation), supplemented with a **shapely** function specifying how exactly we want to transform each group of geometries into a new single geometry.
In the following example, for instance, we collect all `'Polygon'` and `'MultiPolygon'` parts of `nz` into a single `'MultiPolygon'` geometry with many separate parts (i.e., without dissolving), per group.
```{python}
#| warning: false
nz_dis2 = nz \
.groupby('Island') \
.agg({
'Population': 'sum',
'geometry': lambda x: shapely.MultiPolygon(x.explode().to_list())
}) \
.reset_index()
nz_dis2 = gpd.GeoDataFrame(nz_dis2).set_geometry('geometry').set_crs(nz.crs)
nz_dis2
```
The difference between the last two results `nz_dis1` and `nz_dis2` (with and without dissolving, respectively) is not evident in the printout: in both cases we got a layer with two features of type `'MultiPolygon'`.
However, in the first case internal borders were dissolved, while in the second case they were not.
This is illustrated in @fig-combine-geoms:
```{python}
#| label: fig-combine-geoms
#| fig-cap: Combining New Zealand geometries into one, for each island, with and without dissolving
#| layout-ncol: 2
#| fig-subcap:
#| - Dissolving (using the **geopandas** `.dissolve` method)
#| - Combining into multi-part without dissolving (using `.agg` and a custom **shapely**-based function)
nz_dis1.plot(color='lightgrey', edgecolor='black');
nz_dis2.plot(color='lightgrey', edgecolor='black');
```
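The same distinction can be reproduced on individual **shapely** geometries, which may help to see what exactly differs. In this minimal sketch, two hypothetical adjacent squares (`a` and `b`, made up for illustration) are combined once with `shapely.union_all`, which dissolves the shared border, and once with the `shapely.MultiPolygon` constructor, which just collects the parts.

```python
import shapely

# Two hypothetical squares sharing an edge
a = shapely.box(0, 0, 1, 1)
b = shapely.box(1, 0, 2, 1)

# Dissolving: the shared border is removed
dissolved = shapely.union_all([a, b])

# Combining without dissolving: the parts are kept as is
collected = shapely.MultiPolygon([a, b])

print(dissolved.geom_type, shapely.get_num_geometries(dissolved))
print(collected.geom_type, shapely.get_num_geometries(collected))
```

Both results cover the same area, but the first one is a single `'Polygon'`, while the second one is a `'MultiPolygon'` with two parts.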
It is also worthwhile to note the `.boundary` and `.exterior` properties of `GeoSeries`, which are used to cast polygons to lines, with or without interior rings, respectively (see @sec-rasterizing-lines-and-polygons).
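As a brief sketch of the difference between the two properties, consider a hypothetical square polygon with a square hole (the `donut` layer below is made up for illustration). The `.boundary` property returns both the exterior and the interior ring, whereas `.exterior` returns the exterior ring only.

```python
import geopandas as gpd
import shapely

# Hypothetical polygon: a 4x4 square with a 1x1 hole
shell = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]
donut = gpd.GeoSeries([shapely.Polygon(shell, holes=[hole])])

# Exterior and interior rings
boundary = donut.boundary

# Exterior ring only
exterior = donut.exterior

print(boundary.iloc[0].geom_type, boundary.length.iloc[0])
print(exterior.iloc[0].geom_type, exterior.length.iloc[0])
```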
## Geometric operations on raster data {#sec-geo-ras}
Geometric raster operations include the shift, flipping, mirroring, scaling, rotation, or warping of images.
These operations are necessary for a variety of applications including georeferencing, used to allow images to be overlaid on an accurate map with a known CRS [@liu_essential_2009].
A variety of georeferencing techniques exist, including:
* Georectification based on known ground control points
* Orthorectification, which also accounts for local topography
* Image registration, used to combine images of the same thing but shot from different sensors, by aligning one image with another (in terms of coordinate system and resolution)
Python is rather unsuitable for the first two techniques, since they often require manual intervention, which is why they are usually done with the help of dedicated GIS software.
On the other hand, aligning several images is possible in Python, and this section shows, among other things, how to do so.
This often includes changing the extent, the resolution, and the origin of an image.
A matching projection is of course also required but is already covered in @sec-reprojecting-raster-geometries.
In any case, there are other reasons to perform a geometric operation on a single raster image.
For instance, a common reason for aggregating a raster is to decrease run-time or save disk space.
Of course, this approach is only recommended if the task at hand allows a coarser resolution of raster data.
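To get a feeling for the reduction in size, here is a minimal **numpy**-only sketch which aggregates an array by a factor of 2, taking the mean of each $2 \times 2$ block. The array `r` is an assumed $6 \times 6$ example with the values 1 to 36, not read from a file, and the names `factor` and `r_agg` are made up. Note that this sketch operates on the array only; a real raster would also need its transformation matrix updated.

```python
import numpy as np

# Assumed 6x6 example array with the values 1..36
r = np.arange(1, 37).reshape(6, 6)

# Aggregation factor (must divide both dimensions evenly)
factor = 2
nrows, ncols = r.shape

# Split into factor-by-factor blocks, then average each block
r_agg = r.reshape(nrows // factor, factor, ncols // factor, factor).mean(axis=(1, 3))

print(r_agg)
print(r.size, r_agg.size)
```

The aggregated array has a quarter of the values of the original one.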
<!-- ### Geometric intersections {#sec-raster-geometric-intersections} -->
<!-- jn: Michael, what is the difference between this section and the section about cropping and masking in the next chapter? -->
<!-- In geocompr, the difference is clean -- the first section is about cliping a raster with another raster; the second section is about cropping a raster with a vector. -->
<!-- Here, the difference is not clear to me. -->
<!-- If there is no difference, maybe we should either rewrite this section or remove it. -->
<!-- md: that's correct, the only difference is that here the cropping geometry comes from another raster, however in 'rasterio' there is no distinct way to crop using a raster, so indeed most of the workflow is the same. I agree this section mostly repeats the same material, it's fine with me to remove it -->
<!-- In @sec-spatial-subsetting-raster we have shown how to extract values from a raster overlaid by coordinates or by a matching boolean mask. -->
<!-- A different case is when the area of interest is defined by any general (possibly non-matching) raster B, to retrieve a spatial output of a (smaller) subset of raster A we can: -->
<!-- - Extract the bounding box polygon of B (hereby, `clip`) -->
<!-- - Mask and crop A (hereby, `elev.tif`) using B (@sec-raster-cropping) -->
<!-- For example, suppose that we want to get a subset of the `elev.tif` raster using another, smaller, raster. -->
<!-- To demonstrate this, let's create (see @sec-raster-from-scratch) that smaller raster, hereby named `clip`. -->
<!-- First, we need to create a $3 \times 3$ array of raster values. -->
<!-- clip = np.array([1] * 9).reshape(3, 3) -->
<!-- clip -->
<!-- Then, we define the transformation matrix, in such a way that `clip` intersects with `elev.tif` (@fig-raster-intersection). -->
<!-- new_transform = rasterio.transform.from_origin(west=0.9, north=0.45, xsize=0.3, ysize=0.3) -->
<!-- new_transform -->
<!-- Now, for subsetting, we will derive a `shapely` geometry representing the `clip` raster extent, using [`rasterio.transform.array_bounds`](https://rasterio.readthedocs.io/en/latest/api/rasterio.transform.html#rasterio.transform.array_bounds). -->
<!-- bbox = rasterio.transform.array_bounds(clip.shape[1], clip.shape[0], new_transform) -->
<!-- bbox -->
<!-- The four numeric values can be transformed into a rectangular `shapely` geometry using `shapely.box` (@fig-raster-clip-bbox). -->
<!-- #| label: fig-raster-clip-bbox -->
<!-- #| fig-cap: '`shapely` geometry derived from a clipping raster bounding box coordinates, a preliminary step for geometric intersection between two rasters' -->
<!-- bbox = shapely.box(*bbox) -->
<!-- bbox -->
<!-- @fig-raster-intersection shows the alignment of `bbox` and `elev.tif`. -->
<!-- #| label: fig-raster-intersection -->
<!-- #| fig-cap: The `elev.tif` raster, and the extent of another (smaller) raster `clip` which we use to subset it -->
<!-- fig, ax = plt.subplots() -->
<!-- rasterio.plot.show(src_elev, ax=ax) -->
<!-- gpd.GeoSeries([bbox]).plot(color='none', ax=ax); -->
<!-- From here on, subsetting can be done using masking and cropping, just like with any vector layer other than `bbox`, regardless whether it is rectangular or not. -->
<!-- We elaborate on masking and cropping in @sec-raster-cropping (check that section for details about `rasterio.mask.mask`), but, for completeness, here is the code for the last step of masking and cropping: -->
<!-- out_image, out_transform = rasterio.mask.mask(src_elev, [bbox], crop=True, all_touched=True, nodata=0) -->
<!-- The resulting subset array `out_image` contains all pixels intersecting with `clip` *pixels* (not necessarily with the centroids!). -->
<!-- However, due to the `all_touched=True` argument, those pixels which intersect with `clip`, but their centroid does not, retain their original values (e.g., `17`, `23`) rather than turned into "No Data" (e.g., `0`). -->
<!-- out_image -->
<!-- Therefore, in our case, subset `out_image` dimensions are $2 \times 2$ (@fig-raster-intersection2; also see @fig-raster-intersection). -->
<!-- #| label: fig-raster-intersection2 -->
<!-- #| fig-cap: The resulting subset of the `elev.tif` raster -->
<!-- fig, ax = plt.subplots() -->
<!-- rasterio.plot.show(out_image, transform=out_transform, ax=ax) -->
<!-- gpd.GeoSeries([bbox]).plot(color='none', ax=ax); -->
### Extent and origin {#sec-extent-and-origin}
When merging or performing map algebra on rasters, their resolution, projection, origin, and/or extent have to match.
Otherwise, how should we add the values of one raster with a resolution of `0.2` decimal degrees to a second raster with a resolution of `1` decimal degree?
The same problem arises when we would like to merge satellite imagery from different sensors with different projections and resolutions.
We can deal with such mismatches by aligning the rasters.
Typically, raster alignment is done through resampling---that way, it is guaranteed that the rasters match exactly (@sec-raster-resampling).
However, sometimes it can be useful to modify raster placement and extent manually, by adding or removing rows and columns, or by modifying the origin, that is, slightly shifting the raster.
Sometimes, there are reasons other than alignment with a second raster for manually modifying raster extent and placement.
For example, it may be useful to add extra rows and columns to a raster prior to focal operations, so that it is easier to operate on the edges.
Let's demonstrate the first operation, raster padding.
First, we will read the array with the `elev.tif` values:
```{python}
r = src_elev.read(1)
r
```
To pad an `ndarray`, we can use the `np.pad` function.
The function accepts an array, and a tuple of the form `((rows_top,rows_bottom),(columns_left, columns_right))`.
Also, we can specify the value that's being used for padding with `constant_values` (e.g., `18`).
For example, here we pad `r` with one extra row and two extra columns, on both sides, resulting in the array `r_pad`:
```{python}
rows = 1
cols = 2
r_pad = np.pad(r, ((rows,rows),(cols,cols)), constant_values=18)
r_pad
```
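The four padding widths do not have to be symmetric. As a minimal sketch, the following code pads an assumed $6 \times 6$ array (created here with `np.arange`, rather than read from `elev.tif`) with one row on top, no rows at the bottom, two columns on the left, and three columns on the right. The name `r_pad_asym` is made up for illustration.

```python
import numpy as np

# Assumed 6x6 example array with the values 1..36
r = np.arange(1, 37).reshape(6, 6)

# ((rows_top, rows_bottom), (columns_left, columns_right))
r_pad_asym = np.pad(r, ((1, 0), (2, 3)), constant_values=18)

print(r_pad_asym.shape)
print(r_pad_asym)
```

The original values are now located in rows 1 to 6 and columns 2 to 7 (counting from 0) of the padded array.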
However, for `r_pad` to be used in any spatial operation, we also have to update its transformation matrix.
Whenever we add extra columns on the left, or extra rows on top, the raster *origin* changes.
To reflect this fact, we have to take the 'original' origin and shift it by the required multiple of pixel widths or heights (i.e., raster resolution steps).
The transformation matrix of a raster is accessible from the raster file metadata (@sec-raster-from-scratch) or, as a shortcut, through the `.transform` property of the raster file connection.
For example, the next code chunk shows the transformation matrix of `elev.tif`.
```{python}
src_elev.transform
```
From the transformation matrix, we are able to extract the origin.
```{python}
xmin, ymax = src_elev.transform[2], src_elev.transform[5]
xmin, ymax
```
We can also get the resolution of the data, which is the distance between two adjacent pixels.
```{python}
dx, dy = src_elev.transform[0], src_elev.transform[4]
dx, dy
```
These two pieces of information are enough to calculate the new origin (`xmin_new,ymax_new`) of the padded raster.
```{python}
xmin_new = xmin - dx * cols
ymax_new = ymax - dy * rows
xmin_new, ymax_new
```
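A quick way to check the calculation is to compare the extents before and after padding: the padded raster should extend beyond the original one by exactly `cols` pixel widths on the left and right, and `rows` pixel heights on the top and bottom. The following sketch does this with plain arithmetic. The origin, resolution, and dimensions are assumed example values (a $6 \times 6$ raster with 0.5-unit pixels), hard-coded here rather than taken from a raster file.

```python
# Assumed example values; in practice, these come from the transformation
# matrix and the array shape
xmin, ymax = -1.5, 1.5   # origin (top-left corner)
dx, dy = 0.5, -0.5       # resolution (dy is negative)
nrows, ncols = 6, 6      # original dimensions
rows, cols = 1, 2        # padding, on each side

# New origin and dimensions of the padded raster
xmin_new = xmin - dx * cols
ymax_new = ymax - dy * rows
nrows_new = nrows + 2 * rows
ncols_new = ncols + 2 * cols

# Extents (xmin, ymin, xmax, ymax), before and after padding
extent = (xmin, ymax + dy * nrows, xmin + dx * ncols, ymax)
extent_new = (xmin_new, ymax_new + dy * nrows_new, xmin_new + dx * ncols_new, ymax_new)

print(extent)
print(extent_new)
```

Under these assumptions, the extent grows by one unit (two pixels) in each horizontal direction, and by half a unit (one pixel) in each vertical direction.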
Using the updated origin, we can update the transformation matrix (@sec-raster-from-scratch).
Keep in mind that the meaning of the last two arguments is `xsize`, `ysize`, so we need to pass the absolute value of `dy` (since it is negative).