Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added assign_centroids for impact objects #602

Merged
merged 36 commits into from
Mar 23, 2023
Merged
Show file tree
Hide file tree
Changes from 26 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
8457f63
added assign_centroids for impact objects
timschmi95 Dec 9, 2022
8d264ff
use df instead of gdf
timschmi95 Dec 9, 2022
d538e09
avoid unnecessary gpd import
timschmi95 Dec 9, 2022
aed16e6
remove whitespace
timschmi95 Dec 12, 2022
e297873
few suggestions on docstring
aleeciu Dec 13, 2022
4b6e300
Merge branch 'develop' into feature/assign_centroids_impact
timschmi95 Dec 15, 2022
b0435de
added test for imp.assign_centroids
timschmi95 Dec 15, 2022
b1661f4
update docstrings
timschmi95 Jan 13, 2023
1d33588
Merge branch 'develop' into feature/assign_centroids_impact
timschmi95 Jan 16, 2023
b55bf46
remove whitespace
timschmi95 Jan 16, 2023
6e892e2
Merge branch 'develop' into feature/assign_centroids_impact
timschmi95 Feb 1, 2023
d785270
pass gdf instead of df
timschmi95 Feb 1, 2023
c998e90
move overwrite to exp.assign_centroids()
timschmi95 Feb 1, 2023
a63f78a
restructuring
timschmi95 Feb 8, 2023
5734eca
Merge branch 'develop' into feature/assign_centroids_impact
timschmi95 Feb 8, 2023
9a2bb0d
pass centroids only
timschmi95 Feb 8, 2023
94e8896
test equal_crs in exp.assign_centroids()
timschmi95 Feb 8, 2023
eb4fc93
access imp.coord_df correctly
timschmi95 Feb 8, 2023
c891778
whitespace cosmetics
timschmi95 Feb 20, 2023
59c9273
Merge branch 'develop' into feature/assign_centroids_impact
timschmi95 Feb 20, 2023
1168952
minor adjustements
timschmi95 Feb 23, 2023
726835d
minor change requests from Emanuel
timschmi95 Mar 6, 2023
5805644
added additional tests
timschmi95 Mar 7, 2023
77ee027
do not set additional imp attribute anymore
timschmi95 Mar 7, 2023
2417b61
use AssertRaises in Test
timschmi95 Mar 15, 2023
8fd3ffe
remove test_crs flag
timschmi95 Mar 15, 2023
3d7a5b7
Merge branch 'develop' into feature/assign_centroids_impact
emanuel-schmid Mar 16, 2023
8ce4147
white space cosmetics
emanuel-schmid Mar 16, 2023
820d7a0
white space cosmetics
emanuel-schmid Mar 16, 2023
6199a29
remove requirement for epsg:4326
timschmi95 Mar 20, 2023
6af7cc8
Changelog update and Docstring suggestions from Emanuel
timschmi95 Mar 22, 2023
3738d0b
rename assign_centroids_to_gdf -> match_centroids
emanuel-schmid Mar 22, 2023
6e00258
rename assign_grid_points -> match_grid_points, assign_coordinates ->…
emanuel-schmid Mar 22, 2023
7ffa825
test_impact: impact.assign_centroids has been renamed
emanuel-schmid Mar 23, 2023
bd2d61c
CHANGELOG, test_impact: adapt to new names
emanuel-schmid Mar 23, 2023
48eb26d
impact: remove unused import
emanuel-schmid Mar 23, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions climada/engine/impact.py
100755 → 100644
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
import pandas as pd
import xlsxwriter
from tqdm import tqdm
import geopandas as gpd
import h5py

from climada.entity import Exposures, Tag
Expand Down Expand Up @@ -1736,6 +1737,48 @@ def stack_attribute(attr_name: str) -> np.ndarray:
**kwargs,
)

def assign_centroids(self, hazard, distance='euclidean',
threshold=u_coord.NEAREST_NEIGHBOR_THRESHOLD):
"""
Assign for each impact coordinate closest hazard coordinate.
Adds a new attribute to the impact object, ...

Uses ``u_coord.assign_centroids_to_gdf()``. See there for details and parameters

Parameters
----------
hazard : Hazard
Hazard to match (with raster or vector centroids).
distance : str, optional
Distance to use in case of vector centroids.
Possible values are "euclidean", "haversine" and "approx".
Default: "euclidean"
threshold : float
If the distance (in km) to the nearest neighbor exceeds `threshold`,
the index `-1` is assigned.
Set `threshold` to 0, to disable nearest neighbor matching.
Default: 100 (km)
"""

haz_type = hazard.tag.haz_type
centr_haz = 'centr_' + haz_type

#Assert that the imp crs is epsg:4326, as it is required by the u_coord methods
if not u_coord.equal_crs(self.crs, 'EPSG:4326'):
raise ValueError('Set Impact to lat/lon crs (EPSG:4326)!')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question (sorry 😁) - which u_coord methods require epsg:4326?
And why can't they speak for themselves, i.e. raise an exception if the crs is wrong?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this in there as afterwards I label the coordinate column 'latitude' and 'longitude', but you are right: as with exposures the functionalities also work with other CRS, it's just the column labels 'latitude'/'longitude' that are not correct then, but this shouldn't be a problem. I think we can savely remove this part.


#create geodataframe from coordinates
coord_df = pd.DataFrame(self.coord_exp, columns=['latitude', 'longitude'])
geometry = gpd.points_from_xy(coord_df.longitude,coord_df.latitude,crs = self.crs)
coord_gdf = gpd.GeoDataFrame(coord_df,geometry = geometry)

#call the assign_centroids_to_gdf util function
coord_gdf[centr_haz] = u_coord.assign_centroids_to_gdf(coord_gdf, hazard.centroids,
distance=distance,
threshold=threshold)

#return array of matched impact coordinates with hazard centroids
return coord_df[centr_haz]

@dataclass
class ImpactFreqCurve():
Expand Down
15 changes: 15 additions & 0 deletions climada/engine/test/test_impact.py
Original file line number Diff line number Diff line change
Expand Up @@ -816,6 +816,20 @@ def test__exp_build_event(self):
self.assertEqual(exp.value_unit, imp.unit)
self.assertEqual(exp.ref_year, 0)

class TestAssignCentroids(unittest.TestCase):

def test_assign_centroids(self):
"Test that hazard centroids get assigned correctly"
exp = ENT.exposures
exp.assign_centroids(HAZ)
fake_eai_exp = np.arange(len(exp.gdf))
fake_at_event = np.arange(HAZ.size)
fake_aai_agg = np.sum(fake_eai_exp)
imp = Impact.from_eih(exp, ENT.impact_funcs, HAZ,
fake_at_event, fake_eai_exp, fake_aai_agg)
imp_centr = imp.assign_centroids(HAZ)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
imp_centr = imp.assign_centroids(HAZ)
imp_centr = imp.assigned_centroids(HAZ)

np.testing.assert_array_equal(imp_centr,exp.gdf.centr_TC)


class TestImpactH5IO(unittest.TestCase):
"""Tests for reading and writing Impact from/to H5 files"""
Expand Down Expand Up @@ -1033,4 +1047,5 @@ def write_tag(group, tag_kwds):
TESTS.addTests(unittest.TestLoader().loadTestsFromTestCase(TestImpact))
TESTS.addTests(unittest.TestLoader().loadTestsFromTestCase(TestImpactH5IO))
TESTS.addTests(unittest.TestLoader().loadTestsFromTestCase(TestImpactConcat))
TESTS.addTests(unittest.TestLoader().loadTestsFromTestCase(TestAssignCentroids))
unittest.TextTestRunner(verbosity=2).run(TESTS)
28 changes: 16 additions & 12 deletions climada/entity/exposures/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -369,6 +369,11 @@ def assign_centroids(self, hazard, distance='euclidean',
threshold=u_coord.NEAREST_NEIGHBOR_THRESHOLD,
overwrite=True):
"""Assign for each exposure coordinate closest hazard coordinate.
The Exposures ``gdf`` will be altered by this method. It will have an additional
(or modified) column named ``centr_[hazard.HAZ_TYPE]`` after the call.

Uses the utility function ``u_coord.assign_centroids_to_gdf``. See there for details
and parameters.

The value -1 is used for distances larger than ``threshold`` in point distances.
In case of raster hazards the value -1 is used for centroids outside of the raster.
Expand All @@ -392,9 +397,10 @@ def assign_centroids(self, hazard, distance='euclidean',

See Also
--------
climada.util.coordinates.assign_coordinates
climada.util.coordinates.assign_grid_points: method to associate centroids to
exposure points when centroids is a raster
climada.util.coordinates.assign_coordinates:
method to associate centroids to exposure points

Notes
-----
The default order of use is:
Expand Down Expand Up @@ -422,18 +428,16 @@ def assign_centroids(self, hazard, distance='euclidean',

LOGGER.info('Matching %s exposures with %s centroids.',
str(self.gdf.shape[0]), str(hazard.centroids.size))

if not u_coord.equal_crs(self.crs, hazard.centroids.crs):
raise ValueError('Set hazard and exposure to same CRS first!')
if hazard.centroids.meta:
assigned = u_coord.assign_grid_points(
self.gdf.longitude.values, self.gdf.latitude.values,
hazard.centroids.meta['width'], hazard.centroids.meta['height'],
hazard.centroids.meta['transform'])
else:
assigned = u_coord.assign_coordinates(
np.stack([self.gdf.latitude.values, self.gdf.longitude.values], axis=1),
hazard.centroids.coord, distance=distance, threshold=threshold)
self.gdf[centr_haz] = assigned
# Note: equal_crs is tested here, rather than within assign_centroids_to_gdf(),
# because exp.gdf.crs may not be defined, but exp.crs must be defined.

assigned_centr = u_coord.assign_centroids_to_gdf(self.gdf, hazard.centroids,
distance=distance, threshold=threshold)
self.gdf[centr_haz] = assigned_centr


def set_geometry_points(self, scheduler=None):
"""Set geometry attribute of GeoDataFrame with Points from latitude and
Expand Down
64 changes: 64 additions & 0 deletions climada/util/coordinates.py
Original file line number Diff line number Diff line change
Expand Up @@ -1057,6 +1057,70 @@ def assign_coordinates(coords, coords_to_assign, distance="euclidean",
coords_to_assign, coords[not_assigned_idx_mask], threshold, **kwargs)
return assigned_idx

def assign_centroids_to_gdf(coord_gdf, centroids, distance='euclidean',
threshold=NEAREST_NEIGHBOR_THRESHOLD):
"""Assign to each gdf coordinate point its closest centroids's coordinate.
If disatances > threshold in points' distances, -1 is returned.
If centroids are in a raster and coordinate point is outside of it ``-1`` is assigned

Parameters
----------
coord_gdf : gpd.GeoDataFrame
GeoDataframe with defined latitude/longitude column and crs
centroids : Centroids
(Hazard) centroids to match (as raster or vector centroids).
distance : str, optional
Distance to use in case of vector centroids.
Possible values are "euclidean", "haversine" and "approx".
Default: "euclidean"
threshold : float, optional
If the distance (in km) to the nearest neighbor exceeds `threshold`,
the index `-1` is assigned.
Set `threshold` to 0, to disable nearest neighbor matching.
Default: ``NEAREST_NEIGHBOR_THRESHOLD`` (100km)

See Also
--------
climada.util.coordinates.assign_grid_points: method to associate centroids to
coordinate points when centroids is a raster
climada.util.coordinates.assign_coordinates: method to associate centroids to
coordinate points
Notes
-----
The default order of use is:

1. if centroid raster is defined, assign the closest raster point
to each gdf coordinate point.
2. if no raster, assign centroids to the nearest neighbor using
euclidian metric

Both cases can introduce innacuracies for coordinates in lat/lon
coordinates as distances in degrees differ from distances in meters
on the Earth surface, in particular for higher latitude and distances
larger than 100km. If more accuracy is needed, please use 'haversine'
distance metric. This however is slower for (quasi-)gridded data,
and works only for non-gridded data.
"""

try:
if not equal_crs(coord_gdf.crs, centroids.crs):
raise ValueError('Set hazard and GeoDataFrame to same CRS first!')
except AttributeError:
# If the coord_gdf has no crs defined (or no valid geometry column),
# no error is raised and it is assumed that the user set the crs correctly
pass

if centroids.meta:
assigned = assign_grid_points(
coord_gdf.longitude.values, coord_gdf.latitude.values,
centroids.meta['width'], centroids.meta['height'],
centroids.meta['transform'])
else:
assigned = assign_coordinates(
np.stack([coord_gdf.latitude.values, coord_gdf.longitude.values], axis=1),
centroids.coord, distance=distance, threshold=threshold)
return assigned

@numba.njit
def _dist_sqr_approx(lats1, lons1, cos_lats1, lats2, lons2):
"""Compute squared equirectangular approximation distance. Values need
Expand Down
98 changes: 98 additions & 0 deletions climada/util/test/test_coordinates.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
from pathlib import Path

from cartopy.io import shapereader
import pandas as pd
import geopandas as gpd
import numpy as np
from pyproj.crs import CRS as PCRS
Expand All @@ -36,6 +37,7 @@

from climada import CONFIG
from climada.util.constants import HAZ_DEMO_FL, DEF_CRS, ONE_LAT_KM, DEMO_DIR
from climada.hazard.base import Centroids
import climada.util.coordinates as u_coord

DATA_DIR = CONFIG.util.test_data.dir()
Expand Down Expand Up @@ -515,6 +517,102 @@ def test_assign_grid_points(self):
out = u_coord.assign_grid_points(xy[:, 0], xy[:, 1], width, height, transform)
np.testing.assert_array_equal(out, [120, -1])

def test_assign_centroids_to_gdf(self):
"""Test assign_centroids_to_gdf function."""
#Test 1: Raster data
meta = {
'count': 1, 'crs': DEF_CRS,
'width': 20, 'height': 10,
'transform': rasterio.Affine(1.5, 0.0, -20, 0.0, -1.4, 8)
}
centroids=Centroids(meta=meta)
df = pd.DataFrame({
'longitude': np.array([
-20.1, -20.0, -19.8, -19.0, -18.6, -18.4,
-19.0, -19.0, -19.0, -19.0,
-20.1, 0.0, 10.1, 10.1, 10.1, 0.0, -20.2, -20.3,
-6.4, 9.8, 0.0,
]),
'latitude': np.array([
7.3, 7.3, 7.3, 7.3, 7.3, 7.3,
8.1, 7.9, 6.7, 6.5,
8.1, 8.2, 8.3, 0.0, -6.1, -6.2, -6.3, 0.0,
-1.9, -1.7, 0.0,
])
})
gdf = gpd.GeoDataFrame(df,geometry=gpd.points_from_xy(df.longitude, df.latitude),
crs = DEF_CRS)
assigned = u_coord.assign_centroids_to_gdf(gdf,centroids)

expected_result = [
# constant y-value, varying x-value
-1, 0, 0, 0, 0, 1,
# constant x-value, varying y-value
-1, 0, 0, 20,
# out of bounds: topleft, top, topright, right, bottomright, bottom, bottomleft, left
-1, -1, -1, -1, -1, -1, -1, -1,
# some explicit points within the raster
149, 139, 113,
]
np.testing.assert_array_equal(assigned,expected_result)

# Test 2: Vector data (copied from test_assign_coordinates)
# note that the coordinates are in lat/lon
gdf_coords = np.array([(0.2, 2), (0, 0), (0, 2), (2.1, 3), (1, 1), (-1, 1), (0, 179.9)])
df = pd.DataFrame({
'longitude': gdf_coords[:, 1],
'latitude': gdf_coords[:, 0]
})
gdf = gpd.GeoDataFrame(df,geometry=gpd.points_from_xy(df.longitude, df.latitude),
crs = DEF_CRS)

coords_to_assign = np.array([(2.1, 3), (0, 0), (0, 2), (0.9, 1.0), (0, -179.9)])
centroids = Centroids(
lat=coords_to_assign[:, 0],
lon=coords_to_assign[:, 1],
geometry = gpd.GeoSeries(crs=DEF_CRS)
)
centroids_empty = Centroids()

expected_results = [
# test with different thresholds (in km)
(100, [2, 1, 2, 0, 3, -1, 4]),
(20, [-1, 1, 2, 0, 3, -1, -1]),
(0, [-1, 1, 2, 0, -1, -1, -1]),
]

for distance in ["euclidean", "haversine", "approx"]:
for thresh, result in expected_results:
assigned = u_coord.assign_centroids_to_gdf(
gdf, centroids, distance=distance, threshold=thresh)
np.testing.assert_array_equal(assigned, result)

#test empty centroids
result = [-1, -1, -1, -1, -1, -1, -1]
assigned_idx = u_coord.assign_centroids_to_gdf(
gdf, centroids_empty, distance=distance, threshold=thresh)
np.testing.assert_array_equal(assigned_idx, result)

# Test 3: non matching crs
df = pd.DataFrame({
'longitude': [10, 20, 30],
'latitude': [50, 60, 70]
})
gdf = gpd.GeoDataFrame(df,geometry=gpd.points_from_xy(df.longitude, df.latitude),
crs = 'EPSG:4326')

coords_to_assign = np.array([(2.1, 3), (0, 0), (0, 2), (0.9, 1.0), (0, -179.9)])
centroids = Centroids(
lat=[1100000,1200000],
lon=[2500000,2600000],
geometry = gpd.GeoSeries(crs='EPSG:2056')
)

with self.assertRaises(ValueError) as cm:
u_coord.assign_centroids_to_gdf(gdf, centroids)
self.assertIn('Set hazard and GeoDataFrame to same CRS first!',
str(cm.exception))

def test_dist_sqr_approx_pass(self):
"""Test approximate distance helper function."""
lats1 = 45.5
Expand Down