corrections ch02
michaeldorman committed Oct 8, 2024
1 parent ea888f6 commit c3a9731
Showing 6 changed files with 92 additions and 91 deletions.
8 changes: 4 additions & 4 deletions 02-attribute-operations.qmd
@@ -220,11 +220,11 @@ world[world['continent'].isin(['North America', 'South America'])] \

Aggregation involves summarizing data based on one or more *grouping variables* (typically values in a column; geographic aggregation is covered in @sec-vector-spatial-aggregation).
A classic example of this attribute-based aggregation is calculating the number of people per continent based on country-level data (one row per country).
The `world` dataset contains the necessary ingredients: the columns `pop` and `continent`, the population and the grouping variable, respectively.
The `world` dataset contains the necessary ingredients: the columns `pop` and `continent`, the target variable and the grouping variable, respectively.
The aim is to find the `sum()` of country populations for each continent, resulting in a smaller table or vector layer (of continents).
Since aggregation is a form of data reduction, it can be a useful early step when working with large datasets.

Attribute-based aggregation can be achieved using a combination of `.groupby` and `.sum` (package **pandas**), where the former groups the data by the grouping variable(s) and the latter calculates the sum of the specified column(s). The `.reset_index` methods moves the grouping variable into an ordinary column, rather than an index (the default), which is something we typically want to do.
Attribute-based aggregation can be achieved using a combination of `.groupby` and `.sum` (package **pandas**), where the former groups the data by the grouping variable(s) and the latter calculates the sum of the specified column(s). The `.reset_index` method moves the grouping variable into an ordinary column, rather than an index (the default), which is something we typically want to do.
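
As a minimal sketch of what `.reset_index` changes, the following uses a small made-up table (the `toy` data frame and its values are assumptions for illustration, not the `world` dataset) and compares the result with and without that step.

```python
import pandas as pd

# Hypothetical stand-in for a country-level table (values are made up)
toy = pd.DataFrame({
    'continent': ['Africa', 'Africa', 'Asia', 'Asia', 'Europe'],
    'pop': [10.0, 20.0, 30.0, 40.0, 5.0],
})

# Without .reset_index, the grouping variable becomes the index
agg_indexed = toy.groupby('continent')[['pop']].sum()

# With .reset_index, 'continent' is an ordinary column again
agg_flat = agg_indexed.reset_index()

print(agg_indexed)
print(agg_flat)
```

Keeping the grouping variable as a column tends to make later steps, such as joins or plotting by column name, more straightforward.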

```{python}
world_agg1 = world.groupby('continent')[['pop']].sum().reset_index()
@@ -254,7 +254,7 @@ fig, ax = plt.subplots(figsize=(6, 3))
world_agg2.plot(column='pop', edgecolor='black', legend=True, ax=ax);
```

Other options for the `aggfunc` parameter in `.dissolve` [include](https://geopandas.org/en/stable/docs/user_guide/aggregation_with_dissolve.html)`'first'`, `'last'`, `'min'`, `'max'`, `'sum'`, `'mean'`, `'median'`.
Other options for the `aggfunc` parameter in `.dissolve` [include](https://geopandas.org/en/stable/docs/user_guide/aggregation_with_dissolve.html) `'first'`, `'last'`, `'min'`, `'max'`, `'sum'`, `'mean'`, `'median'`.
Additionally, we can pass custom functions here.
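
The sketch below illustrates both a named option and a custom function passed to `aggfunc`. It assumes a tiny made-up layer (`toy`, with box geometries) and a hypothetical helper `pop_range`; neither comes from the `world` dataset.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical layer: two adjacent squares in group 'X', one square in group 'Y'
toy = gpd.GeoDataFrame(
    {
        'continent': ['X', 'X', 'Y'],
        'pop': [1.0, 2.0, 5.0],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

# Named aggregation: maximum population per group, geometries are unioned
agg_max = toy.dissolve(by='continent', aggfunc='max')

# Custom aggregation (illustrative helper): spread of values within each group
def pop_range(values):
    return values.max() - values.min()

agg_range = toy.dissolve(by='continent', aggfunc=pop_range)

print(agg_max)
print(agg_range)
```

In both cases the grouping variable ends up in the index, and the geometries of each group are merged into one.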

As a more complex example, the following code shows how we can calculate the total population, area, and count of countries, per continent.
@@ -300,7 +300,7 @@ In the following code example, given the `world_agg3` continent summary (@fig-sp
- drop the geometry column,
- calculate population density of each continent,
- arrange continents by the number of countries each contains, and
- keep only the 3 most populous continents.
- keep only the 3 most country-rich continents.
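
The steps listed above can be sketched on a small made-up summary table. The `summary` data frame, its values, and the column names `area` and `n` (number of countries) are assumptions for illustration, with `None` standing in for real geometries.

```python
import pandas as pd

# Hypothetical continent summary (values are made up)
summary = pd.DataFrame({
    'continent': ['A', 'B', 'C', 'D'],
    'pop': [100.0, 400.0, 90.0, 10.0],
    'area': [50.0, 100.0, 30.0, 10.0],
    'n': [5, 2, 9, 7],
    'geometry': [None, None, None, None],
})

# Drop the geometry column
result = summary.drop(columns=['geometry'])

# Population density per continent
result['density'] = result['pop'] / result['area']

# Sort by number of countries (descending) and keep the top 3
result = result.sort_values(by='n', ascending=False).head(3)

print(result)
```

Note that continent `B` has the largest population in this toy table but is dropped, because the ranking is by number of countries rather than by population.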

```{python}
world_agg4 = world_agg3.drop(columns=['geometry'])