colmaps - Colombian maps or color maps (your choice) - is an R package with current boundary data for Colombia plus a minimal ggplot2::geom_map()
wrapper to make choropleth maps with selected default color scales.
You can install colmaps from GitHub with devtools:
# install.packages("devtools")
devtools::install_github("nebulae-co/colmaps")
There are two SpatialPolygonsDataFrames provided in the package, one for municipalities and one for departments: municipios and departamentos, respectively. You can see the details of each in the documentation: ?municipios and ?departamentos.
The data frame part looks like:
library("colmaps")
head(municipios@data)
#>      id id_depto            municipio        depto
#> 1 20001       20           Valledupar        Cesar
#> 2 68217       68             Coromoro    Santander
#> 3 68229       68               Curití    Santander
#> 4 68235       68 El Carmen De Chucurí    Santander
#> 5 68245       68         El Guacamayo    Santander
#> 6 25823       25              Topaipí Cundinamarca
The wrapper around ggplot2::geom_map() is colmap(). It takes a map and some data as arguments and plots them with ggplot2, using a selected color scale and a map-friendly theme taken from ggmap::theme_nothing(). Since it is built with ggplot2, it also returns a ggplot object, so you can further alter the plot in ggplot idiom. Note, however, that you need to explicitly load ggplot2 to add further layers and modifications.
By default the id of each area will be used as a variable:
library("ggplot2")
colmap(municipios) +
  ggtitle("Colombia - Fronteras Municipales")
But if you have data for each geographical unit you can pass it, specifying how to relate the map and the data frame. For example, from the homicidios package we have historical population, homicides and homicide rates data for municipalities:
# devtools::install_github("nebulae-co/homicidios")
library("homicidios")
head(homicidios)
#>      id id_depto  municipio     depto  año poblacion homicidios     tasa
#> 1 05001       05   Medellín Antioquia 1990   1688394       3924 232.4102
#> 2 05002       05  Abejorral Antioquia 1990     24789         33 133.1236
#> 3 05004       05   Abriaquí Antioquia 1990      3201          5 156.2012
#> 4 05021       05 Alejandría Antioquia 1990      4851          7 144.3001
#> 5 05030       05      Amagá Antioquia 1990     23606         31 131.3225
#> 6 05031       05     Amalfi Antioquia 1990     18535         25 134.8800
And we can easily plot a thematic map of the data:
colmap(municipios, subset(homicidios, año == 2000), var = "tasa")
Tragically (and also ironically?!), in the year 2000 the municipality of Matanza, Santander had a homicide rate of over 1200 murders per 100,000 inhabitants, which distorts the default color scale dramatically. We can manually re-scale the variable mapped to the color aesthetic, for example by plotting log(tasa + 1) instead of tasa, and we can also override the default color scale in ggplot idiom. We will use dplyr and the pipe operator (%>%) from here on to handle the data with ease:
library("ggplot2")
library("dplyr")
homicidios %>%
  filter(año == 2000) %>%
  mutate(log_tasa = log1p(tasa)) %>%
  colmap(municipios, data = ., var = "log_tasa") +
  scale_fill_continuous(low = "#bcbddc", high = "#3f007d", na.value = "wheat")
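To see why the log transform helps, here is a small base R sketch on made-up rates (the numbers are illustrative, not taken from the homicidios data). It checks that log1p(tasa) is the same as log(tasa + 1) and compares how much of the value range the non-outlier rates occupy before and after the transform:

```r
# Toy homicide rates per 100,000 inhabitants (made-up values);
# the last one mimics an extreme outlier like the one described above.
tasa <- c(0, 12.5, 48.3, 95.1, 1200)

# log1p(x) computes log(1 + x), so a rate of zero maps to zero instead of -Inf.
log_tasa <- log1p(tasa)
all.equal(log_tasa, log(tasa + 1))

# Fraction of the full value range covered by the non-outlier rates,
# on the raw scale and on the log scale.
share_raw <- max(tasa[-5]) / max(tasa)
share_log <- max(log_tasa[-5]) / max(log_tasa)
round(c(raw = share_raw, log = share_log), 2)
```

On the raw scale almost every municipality would be squeezed into the palest end of the color range; on the log scale the same values spread over a much larger share of it.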
Let's make the same map but by department to see other arguments:
homicidios %>%
  filter(año == 2000) %>%
  group_by(id_depto) %>%
  summarise(tasa = 10^5 * sum(homicidios, na.rm = TRUE) /
                     sum(poblacion, na.rm = TRUE)) %>%
  colmap(departamentos, data = ., data_id = "id_depto")
Here we note two things. First, var is by default the first non-data_id column present in data, so we may omit the argument. Second, we use data_id to specify the key by which the map is to be joined.
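The default for var described above can be pictured with a few lines of base R. This is only a sketch of the stated behaviour: default_var() is a hypothetical helper, not a function exported by colmaps, the "id" default for data_id is an assumption, and the toy rates are made up.

```r
# Hypothetical helper, NOT part of colmaps: picks the first column of `data`
# that is not the join key, which is the default described for `var`.
default_var <- function(data, data_id = "id") {
  candidates <- setdiff(names(data), data_id)
  if (length(candidates) == 0) stop("data has no column besides the id")
  candidates[1]
}

# Toy department-level summary (made-up rates), keyed by id_depto.
by_depto <- data.frame(
  id_depto = c("05", "68"),
  tasa = c(150.2, 48.7),
  stringsAsFactors = FALSE
)

default_var(by_depto, data_id = "id_depto")
```

With a summary that has only the key and one value column, as in the department map, the value column is the only candidate, which is why var can be left out there.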
The color scales depend on the class of the variable the map is being colored by:
- For numeric data it is a continuous scale with the low limit set to "#fee8c8" and the high limit set to "#b30000":
  ggplot2::scale_fill_continuous(low = "#fee8c8", high = "#b30000")
  These were taken from the limits of the three-class sequential multi-hue red palette suggested at http://colorbrewer2.org/.
- For factors with fewer than 10 levels the color scale used is:
  ggplot2::scale_fill_brewer(type = "qual", palette = "Set1")
- For factors with n >= 10 levels the color scale used is:
  ggplot2::scale_fill_manual(values = rainbow(n = n, v = 0.3 * sin(seq_len(n)) + 0.7))
  This one was designed by Julian (in his own way).
- For ordered factors a manual scale that defines a discrete version of the numeric scale is used:
  ggplot2::scale_fill_manual(values = colorRampPalette(c("#fee8c8", "#b30000"))(n))
  where n is the number of levels of the factor.
- Finally, integer data uses the ordered factor scale when there are fewer than 20 distinct values and the numeric scale otherwise.
- In all cases the default color for missing (NA) data is "grey50", which is the default ggplot2 uses for scale_fill_continuous(). Note that it can be changed through the scale, as shown in the third map.
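The rules above can be read as a dispatch on the class of the variable. The following base R sketch mirrors them with a hypothetical describe_scale() helper that only returns a label; it is not colmaps' internal code, which may be organised differently. The order of the checks matters, because ordered factors are also factors and integers are also numeric.

```r
# Hypothetical helper mirroring the rules listed above; colmaps' internals may differ.
describe_scale <- function(x) {
  n_values <- length(unique(x[!is.na(x)]))
  if (is.ordered(x)) {
    # Check ordered factors before plain factors: is.factor() is TRUE for both.
    "ordered ramp"
  } else if (is.factor(x)) {
    if (nlevels(x) < 10) "brewer Set1" else "rainbow"
  } else if (is.integer(x)) {
    # Check integers before numerics: is.numeric() is TRUE for both.
    if (n_values < 20) "ordered ramp" else "continuous"
  } else if (is.numeric(x)) {
    "continuous"
  } else {
    stop("unsupported class: ", class(x)[1])
  }
}

describe_scale(c(1.5, 2.5))                  # numeric
describe_scale(factor(letters[1:3]))         # factor, fewer than 10 levels
describe_scale(factor(letters[1:12]))        # factor, 10 or more levels
describe_scale(factor(1:4, ordered = TRUE))  # ordered factor
describe_scale(1:5)                          # integer, fewer than 20 values
describe_scale(1:50)                         # integer, 20 or more values

# The palette for factors with many levels, built exactly as written in the list.
n <- 12
many_levels <- rainbow(n = n, v = 0.3 * sin(seq_len(n)) + 0.7)
length(many_levels)
```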
We have already seen numeric data and factors (with more than ten levels in the first plot). To illustrate ordered factors we will again make the homicides map by municipality, but first transform the homicide rate into an ordered factor that defines groups by deciles:
homicidios %>%
  filter(año == 2000) %>%
  mutate(tasa = round(tasa, 2),
         Tasa = cut(x = tasa,
                    breaks = c("0%" = -Inf,
                               quantile(tasa, seq(0.1, 1, 0.1), na.rm = TRUE)),
                    ordered_result = TRUE,
                    dig.lab = 6)) %>%
  colmap(municipios, data = ., var = "Tasa")
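The decile grouping can be tried without the map, using only base R. The sketch below uses a made-up, evenly spaced vector in place of the real rates, builds the same kind of ordered factor with cut(), and then derives the matching discrete palette with the colorRampPalette() call listed for ordered factors:

```r
# Made-up rates standing in for the real `tasa` column (200 evenly spaced values).
tasa <- seq(0.5, 100, by = 0.5)

# Decile breaks, with -Inf as the lowest break so the minimum falls in the first group.
breaks <- c("0%" = -Inf, quantile(tasa, seq(0.1, 1, 0.1), na.rm = TRUE))

grupo <- cut(x = tasa, breaks = breaks, ordered_result = TRUE, dig.lab = 6)

is.ordered(grupo)  # an ordered factor, so the ordered factor scale applies
nlevels(grupo)     # one level per decile
table(grupo)       # how many values fall in each decile group

# Discrete version of the numeric scale: one color per level, from low to high.
decile_colors <- colorRampPalette(c("#fee8c8", "#b30000"))(nlevels(grupo))
decile_colors
```

Because the breaks are quantiles, each group holds roughly the same number of municipalities, so a single extreme value no longer stretches the color scale.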
As shown above, with ggplot you can add components (e.g. ggtitle) to the plot or alter them (e.g. scale_fill_*). A nice example of this is faceting. Here we use the year (año) variable passed in data to make a grid of plots, trellis style: one per level of the variable, in this case one per year, from 2008 to 2013.
gg <- homicidios %>%
  filter(año >= 2008) %>%
  colmap(municipios, data = ., var = "tasa")
gg + facet_wrap(~ año, nrow = 2)
(It is encouraging to see how this particular map has been getting somewhat paler with time.)
This is the result of a sub-project we started with various goals:
- learn how to make R packages (in particular data packages) and improve our git skills.
- distribute some data we think should be easier to access in a simple way. Of course, this is mainly targeted at R users, including ourselves.
- enable us to do some map visualizations in a simpler way.
We used this data to prepare the vignette Un micro-mapa de Colombia, which shows an example of a linked micromap with Colombian data. That visualization was presented at a random academic event.