# Glossary
- **Algorithm**: a series of computer commands executed in a
specific order for a pre-defined purpose.
Algorithms process input data and produce outputs.
- **Constraints** are variables used to estimate the number (or weight)
of individuals in each zone. They are also referred to by the longer name
**constraint variables**. We tend to use the term **linking variable**
in this book because they *link* aggregate and individual level datasets.
- **Combinatorial optimisation** is an approach to spatial
microsimulation that generates spatial microdata by randomly
selecting individuals from a survey dataset and measuring the fit
between the simulated output and the constraint variables. If the
fit improves after any particular change, the change is kept.
Williamson (2007) provides a practical user manual. @Harland2013
provides a practical demonstration of the method implemented in
the Java-based Flexible Modelling Framework (FMF).
- **Data frame**: a type of object (formally referred to as a class)
in R. Data frames are rectangular tables composed of rows and columns of
information. As with many things in R, the best way to understand
data frames is to create them and experiment. For example,
`data.frame(name = c("A", "B"), height = c(1.6, 1.8))` creates
a data frame with two variables: name and height.
Note that each variable is entered using the command `c()`, which is
how R creates objects of the *vector* data class (a one-dimensional
sequence of values), and that text data must be entered in quote
marks.
- **Deterministic reweighting** is an approach to generating spatial
microdata that allocates fractional weights to individuals based on
how representative they are of the target area. It differs from
combinatorial optimisation approaches in that it requires no random
numbers. The most frequently used method of deterministic
reweighting is IPF.
- **For loops** are instructions that tell the computer to run a
certain set of commands repeatedly. `for(i in 1:9) print(i)`, for
example, will print the value of `i` nine times, once for each number
from 1 to 9. The best way to further
understand for loops is to try them out.
- **Iteration**: one instance of a process that is repeated many times
until a predefined end point, often within an *algorithm*.
- **Iterative proportional fitting** (IPF): an iterative process
implemented in mathematics and algorithms to find the maximum
likelihood estimates of cell values that are constrained by multiple sets of
marginal totals. To make this abstract definition even more
confusing, there are multiple terms which refer to the process,
including ‘biproportional fitting’ and ‘matrix raking’. In plain
English, IPF in the context of spatial microsimulation can be
defined as *a statistical technique for allocating weights to
individuals depending on how representative they are of different
zones*. IPF is a type of deterministic reweighting, meaning that
random numbers are not needed to generate the result and that the
output weights are real (not integer) numbers.
- A **linking variable** is a variable that is shared between individual and
aggregate level data. Common examples include age and sex (the linking variables
used in the SimpleWorld example): questions that are commonly asked in all
kinds of survey. Linking variables are also referred to as
**constraint variables** because they *constrain* the weights for individuals
in each zone.
- **Microdata** is the non-geographical individual level dataset from which
synthetic **spatial microdata** are usually derived. This sample of the
target population has also been labelled as the 'seed'
(e.g. Barthelemy and Toint, 2012) and simply the 'survey data' in the academic
literature. The term microdata is used in this book for its brevity and
semantic link to spatial microdata.
- The **population base** is roughly equivalent to the 'target population',
used by statisticians to describe the population about whom they wish to
draw conclusions based on a 'sample population'.
The sample population is the group of individuals for whom
we have individual level data.
In aggregate level data, the **population base** is the
complete set of individuals represented by the counts.
A common example is the variable "Hours worked":
only people aged 16 to 74 are generally thought of as working, so, if there is
no `NA` (no answer) category, the population base is not the same as the total
population of an area. A common problem faced by people using spatial microsimulation
methods is incompatibility between aggregate constraints that use different
population bases.
- **Population synthesis** is the process of converting input data (generally
non-geographical **microdata** and geographically aggregated
**constraint variables**) into **spatial microdata**.
- **Spatial microdata** is the name given to individual level data allocated
to mutually exclusive geographical zones (see Figure 5.1 above). Spatial
microdata is useful because it provides multi-level information about the
relationships between individuals and where they live. However, due to the
high costs of large surveys and restrictions on the release of geocoded
individual level data, spatial microdata is rarely available to researchers.
To overcome this issue, most spatial microsimulation research employs methods
of **population synthesis** to generate representative spatial microdata.
- **Spatial microsimulation** is the name given to an approach to modelling that
comprises a series of techniques that
generate, analyse and model individual level data allocated to small
administrative zones. Spatial microsimulation is an approach for
understanding processes that operate on individual and geographical levels.
- A **weight matrix** is a two-dimensional array that links non-spatial
*microdata* to geographical zones. Each row in the weight matrix represents
an individual and each column represents a zone. Thus, in R notation,
the weight matrix `w` has dimensions of `nrow(ind)` rows by `nrow(cons)` columns,
where `ind` and `cons` are the microdata and constraints respectively.
The value of `w[i,j]` represents the extent to which individual `i` is
representative of zone `j`. `sum(w)` is the total population of the study area.
The weight matrix is an efficient way of storing spatial microdata because
it does not require a new row for every additional individual in the study
area. For a weight matrix to be converted into spatial microdata, all the
values of the weights must be integers. The conversion of a non-integer weight
matrix into an integer weight matrix is known as *integerisation*.
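
Several of the terms defined above (data frame, for loop, iteration, linking variable, IPF and weight matrix) can be seen working together in a small example. The chunk below is a minimal sketch of IPF in base R, not the implementation used elsewhere in this book. The microdata `ind`, the constraints `cons`, their values, the helper objects `ind_cat` and `constraint_sets`, and the fixed number of 20 iterations are all hypothetical choices made for illustration.

```{r}
# Hypothetical microdata: five individuals with two linking variables
ind <- data.frame(
  id  = 1:5,
  age = c("young", "old", "old", "young", "old"),
  sex = c("m", "m", "f", "f", "f")
)

# Hypothetical constraints: one row per zone, one column per category.
# Both constraint variables share a population base (12 and 10 people).
cons <- data.frame(
  young = c(8, 2),
  old   = c(4, 8),
  m     = c(6, 3),
  f     = c(6, 7)
)

# Binary indicator matrix: does individual i belong to category k?
ind_cat <- cbind(
  young = ind$age == "young",
  old   = ind$age == "old",
  m     = ind$sex == "m",
  f     = ind$sex == "f"
) * 1

# Weight matrix: rows are individuals, columns are zones
w <- matrix(1, nrow = nrow(ind), ncol = nrow(cons))

# Categories grouped by the constraint variable they belong to
constraint_sets <- list(age = c("young", "old"), sex = c("m", "f"))

for (iter in 1:20) {                 # each pass is one iteration
  for (j in seq_len(nrow(cons))) {   # each zone
    for (vars in constraint_sets) {  # each constraint variable in turn
      for (k in vars) {              # each category of that variable
        members <- ind_cat[, k] == 1
        # Rescale weights so the category total matches the constraint
        w[members, j] <- w[members, j] * cons[j, k] / sum(w[members, j])
      }
    }
  }
}

dim(w)  # nrow(ind) rows by nrow(cons) columns
sum(w)  # total population of the study area

# Largest absolute difference between simulated and observed counts
fit_error <- max(abs(t(w) %*% ind_cat - as.matrix(cons)))
fit_error
```

No random numbers are used, so re-running the chunk gives identical weights, and the weights are fractional rather than integer. A fixed number of iterations keeps the sketch short; a stopping rule based on `fit_error` would be a reasonable alternative.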
```{r, echo=FALSE}
# Any words that are highlighted in the main text can go in here
```
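
The **weight matrix** entry notes that a weight matrix whose values are all integers can be converted into spatial microdata. The chunk below is a minimal sketch of that conversion in base R, assuming the weights are already integers. The objects `ind_demo`, `w_int` and `spatial_microdata`, and all of their values, are hypothetical.

```{r}
# Hypothetical microdata: three individuals
ind_demo <- data.frame(id = 1:3, age = c(25, 40, 67))

# Hypothetical integer weight matrix: 3 individuals (rows) by 2 zones (columns)
w_int <- matrix(c(2, 0,
                  1, 1,
                  0, 3), nrow = 3, byrow = TRUE)

# For each zone, repeat each individual's row as many times as its weight
spatial_microdata <- do.call(rbind, lapply(seq_len(ncol(w_int)), function(j) {
  rows <- rep(seq_len(nrow(ind_demo)), times = w_int[, j])
  cbind(ind_demo[rows, ], zone = j)
}))
rownames(spatial_microdata) <- NULL

# One row per simulated person, so the row count equals the sum of the weights
nrow(spatial_microdata) == sum(w_int)
spatial_microdata
```

The expanded table needs one row for every person in the study area, whereas the weight matrix needs only one row per individual in the microdata, which is why the weight matrix is the more compact form.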