postal_expand

Using

If my_address_file.csv is a file in the current working directory with an address column named address, then the DeGAUSS command:

docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal_expand:0.1.0 my_address_file.csv

will produce my_address_file_postal_expand_0.1.0.csv with added columns:

  • cleaned_address: address with non-alphanumeric characters (except -) and excess whitespace removed (with dht::clean_address())

  • expanded_addresses: the expanded addresses for cleaned_address

Addresses are expanded into several possible normalized addresses using libpostal_expand. This can be useful for matching these addresses with other messy, real-world addresses.

Because each cleaned_address will likely produce more than one expanded address, each input row is duplicated once per expansion to accommodate them. The output CSV file is therefore "expanded" too and will usually contain more rows than the input.
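
The row duplication described above can be illustrated with a minimal Python sketch. It is not the container's implementation (the container is written in R). The `fake_expand` function and its hard-coded expansions are hypothetical stand-ins for libpostal, and the sketch assumes every address yields at least one expansion; how the container treats an address with no expansions is not covered here.

```python
def expand_rows(rows, expand):
    """Duplicate each input row once per expanded address.

    `rows` is a list of dicts that each contain a "cleaned_address" key.
    `expand` is any function mapping a cleaned address to a list of strings.
    """
    out = []
    for row in rows:
        for expanded in expand(row["cleaned_address"]):
            # Copy all original columns and attach one expansion per output row.
            out.append({**row, "expanded_addresses": expanded})
    return out


def fake_expand(cleaned_address):
    # Hypothetical stand-in for libpostal: hard-coded expansions for illustration only.
    lookup = {
        "224 Woolper Ave": ["224 woolper avenue", "224 woolper ave"],
        "1 Main St": ["1 main street"],
    }
    return lookup[cleaned_address]


rows = [
    {"id": 1, "cleaned_address": "224 Woolper Ave"},
    {"id": 2, "cleaned_address": "1 Main St"},
]
expanded = expand_rows(rows, fake_expand)
```

Here two input rows become three output rows: the row with `id` 1 appears twice, once for each of its expansions, with all of its other columns repeated unchanged.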

Geomarker Methods

Input addresses are normalized using libpostal_expand by:

  1. removing non-alphanumeric characters (except -) and excess whitespace (with dht::clean_address())
  2. expanding the cleaned address into several possible normalized addresses
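
Step 1 can be approximated in Python as follows. This is a rough sketch based only on the description above, not a port of dht::clean_address(), and the function name `clean_address_sketch` is hypothetical. It substitutes a space for each disallowed character so that neighboring tokens do not run together; whether the real function deletes or substitutes is an assumption here.

```python
import re


def clean_address_sketch(address: str) -> str:
    # Replace every character that is not an ASCII letter, digit, hyphen,
    # or space with a space (assumption: substitute rather than delete).
    kept = re.sub(r"[^A-Za-z0-9\- ]", " ", address)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(kept.split())


cleaned = clean_address_sketch("3333 Burnet Ave., Apt. #2-B   Cincinnati, OH")
```

For this input, the periods, commas, and `#` are dropped, the hyphen in `2-B` is kept, and repeated spaces collapse to one.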

DeGAUSS Details

For detailed documentation on DeGAUSS, including general usage and installation, please see the DeGAUSS homepage.

About

address normalization and parsing with libpostal
