If `my_address_file.csv` is a file in the current working directory with an address column named `address`, then the DeGAUSS command:
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal_expand:0.1.0 my_address_file.csv
will produce `my_address_file_postal_expand_0.1.0.csv` with added columns:
- `cleaned_address`: `address` with non-alphanumeric characters and excess whitespace removed (with `dht::clean_address()`)
- `expanded_addresses`: the expanded addresses for `cleaned_address`
Addresses are expanded into several possible normalized addresses using `libpostal_expand`. This can be useful for matching these addresses with other messy, real-world addresses.
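As a rough illustration of why expansions help with matching, the sketch below treats two addresses as a candidate match when they share at least one expanded form. The function name `addresses_match` and the expansion lists are hypothetical and written by hand for this example; they are not output from the container.

```python
def addresses_match(expansions_a, expansions_b):
    """Return True if the two lists of expanded addresses share any entry."""
    return not set(expansions_a).isdisjoint(expansions_b)

# Hypothetical expansions, typed by hand for illustration only.
a = ["10 north main street", "10 north main saint"]  # e.g. for "10 N Main St"
b = ["10 north main street"]                         # e.g. for "10 North Main Street"
c = ["10 south main street"]                         # a different address

print(addresses_match(a, b))
print(addresses_match(a, c))
```

Comparing sets of expansions rather than raw strings means that differences such as abbreviations do not need to be handled by the matching code itself.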
Because each `cleaned_address` will likely result in more than one expanded address, each input row is duplicated once per value of `expanded_addresses`. This means that when expanding addresses, the input CSV file is "expanded" too: the output can contain more rows than the input.
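The row duplication can be pictured with a minimal Python sketch. It is not the container's code: `expand_rows` and the `HYPOTHETICAL_EXPANSIONS` lookup are stand-ins invented for this example, and the `id` column is just an assumed extra input column.

```python
# Hypothetical expansions; the real values come from libpostal inside the container.
HYPOTHETICAL_EXPANSIONS = {
    "3333 Burnet Ave": ["3333 burnet avenue"],
    "10 N Main St": ["10 north main street", "10 north main saint"],
}

def expand_rows(rows, expansions):
    """Emit one copy of each input row per expanded address."""
    out = []
    for row in rows:
        # Assumption: an address missing from the lookup yields no rows here;
        # how the container treats such cases is not covered by this sketch.
        for expanded in expansions.get(row["cleaned_address"], []):
            out.append({**row, "expanded_addresses": expanded})
    return out

rows = [
    {"id": "1", "cleaned_address": "3333 Burnet Ave"},
    {"id": "2", "cleaned_address": "10 N Main St"},
]
expanded_rows = expand_rows(rows, HYPOTHETICAL_EXPANSIONS)
for r in expanded_rows:
    print(r["id"], r["expanded_addresses"])
```

Here two input rows become three output rows, because the second address has two expansions. Any other input columns are copied unchanged onto every duplicate.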
Input addresses are normalized using `libpostal_expand` by:
- removing non-alphanumeric characters (except `-`) and excess whitespace (with `dht::clean_address()`)
- expanding the cleaned address into several possible normalized addresses
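The first step above can be approximated in a few lines of Python. This is only a sketch of the described behavior, not the implementation of `dht::clean_address()`. It assumes that removed characters are deleted rather than replaced with a space, and that only ASCII letters and digits count as alphanumeric.

```python
import re

def clean_address(address: str) -> str:
    """Approximate the cleaning step described above (not the dht code)."""
    # Drop every character that is not a letter, digit, whitespace, or hyphen.
    kept = re.sub(r"[^A-Za-z0-9\s-]", "", address)
    # Collapse runs of whitespace into single spaces and trim both ends.
    return re.sub(r"\s+", " ", kept).strip()

print(clean_address("  3333  Burnet Ave., Apt #2-B "))
```

The hyphen is kept because it can be meaningful in unit or house numbers such as `2-B`.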
For detailed documentation on DeGAUSS, including general usage and installation, please see the DeGAUSS homepage.