Creates a new variable and assigns the output from an expression to it.
[bysort <varlist> (<varlist>):] generate <expression> [if <expression>]
The type of the variable is set to be what the expression evaluates to (and allows missing
). generate
operates row-by-row, so you can use _n
to denote the current row index and do not need to broadcast operators.
Recplaces the content of a variable, but does not change the type.
[bysort <varlist> (<varlist>):] replace <expression> [if <expression>]
replace
works like generate
in that it operates row-by-row.
- Creates a new variable, operates on vectors
[bysort <varlist> (<varlist>):] egenerate <var> = <expression> [if <expression>]
The key difference between generate
and egen
is that the latter operates on vectors, e.g. you can do by :group: egen :mymean = Statistics.mean(:var)
. That means that operations on scalars need to be broadcast (Julia's .
syntax), e.g. egen :mysum = :a .+ :b
.
egen
tries to determine the new variable's type by evaluating the expression. Types can also be set explicitly, and this is currently faster: egen :mysum::Float64 = mean(:var)
.
There is also an analogous version ereplace
(or erep
for short) that replaces values of existing variables
Drops the specified observations (if used in conjunction with if
) or variables (without if
)
drop <varlist>
[bysort <varlist> (<varlist>):] drop if <expression>
The form with if
operates row-by-row analogously to generate
, so use expessions that evaluate to a scalar Boolean variable.
There is also an analogous command, keep
, that drops all but the specified observations or variables.
Rename a variable
rename <var> <var>
The first symbol is the old variable name and needs to be an existing column in the active DataFrame, the second symbol is the new name.
Sort the rows of the active DataFrame
by the specified columns
sort <varlist>
Reshape the activate DataFrame
between wide and long format
reshape_wide <varlist> , i(<varlist>) j(<varlist>)
reshape_long <varlist> , i(<varlist>) j(<varlist>)
See the documentation in the Stata manual.
Merge the active DataFrame
with another one in the local scope
`merge_11 <varlist> using <DataFrame> , [options]`
`merge_m1 <varlist> using <DataFrame> , [options]`
`merge_1m <varlist> using <DataFrame> , [options]`
See the documentation in the Stata manual. The right-hand-side DataFrame
must be already in the local scope.
Delete duplicate rows, keeping only the first occurrence of unique rows. If <varlist>
is specified, the returned DataFrame contains complete rows, retaining in each case the first instance for which the specified set of variables is unique.
duplicates_drop [<varlist>]