If my_address_file.csv is a file in the current working directory with an address column named address, then the DeGAUSS command:
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/geocoder:3.3.0 my_address_file.csv
will produce my_address_file_geocoder_3.3.0_score_threshold_0.5.csv with added columns:
- matched_street, matched_city, matched_state, matched_zip: matched address components (e.g., matched_street is the street the geocoder matched with the input address); can be used to investigate input address misspellings, typos, etc.
- precision: the method/precision of the geocode. The value will be one of:
  - range: interpolated based on address ranges from street segments
  - street: center of the matched street
  - intersection: intersection of two streets
  - zip: centroid of the matched zip code
  - city: centroid of the matched city
- score: the proportion of text match between the given address and the geocoded result, expressed as a number between 0 and 1. A higher score indicates a closer match. Note that each score is relative within a precision method (i.e., a score of 0.8 with a precision of range is not the same as a score of 0.8 with a precision of street).
- lat and lon: geocoded coordinates for the matched address
- geocode_result: a character string summarizing the geocoding result. The value will be one of:
  - geocoded: the address was geocoded with a precision of either range or street and a score of 0.5 or greater
  - imprecise_geocode: the address was geocoded, but results were suppressed because the precision was intersection, zip, or city and/or the score was less than 0.5
  - po_box: the address was not geocoded because it is a PO Box
  - cincy_inst_foster_addr: the address was not geocoded because it is a known institutional address, not a residential address
  - non_address_text: the address was not geocoded because it was blank or listed as "foreign", "verify", or "unknown"
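The rule separating geocoded from imprecise_geocode can be sketched as a small shell function. This is only an illustration of the rule as described here, not the container's actual code; the function name classify_geocode is hypothetical, and the po_box, cincy_inst_foster_addr, and non_address_text cases are left out because they are decided from the address text rather than from precision and score.

```sh
#!/bin/sh
# Hypothetical helper, not part of DeGAUSS: labels a geocode from its
# precision and score following the rule described in the column list.
classify_geocode() {
  precision=$1
  score=$2
  threshold=${3:-0.5}   # 0.5 is the documented default score threshold
  case "$precision" in
    range|street)
      # awk performs the floating-point comparison that plain sh cannot
      if awk -v s="$score" -v t="$threshold" 'BEGIN { exit !(s + 0 >= t + 0) }'; then
        echo geocoded
      else
        echo imprecise_geocode
      fi
      ;;
    intersection|zip|city)
      # these precisions are always treated as imprecise
      echo imprecise_geocode
      ;;
    *)
      echo "unknown precision: $precision" >&2
      return 1
      ;;
  esac
}

classify_geocode range 0.8    # prints: geocoded
classify_geocode street 0.42  # prints: imprecise_geocode
classify_geocode zip 0.9      # prints: imprecise_geocode
```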
- Geocodes with a resulting precision of intersection, zip, or city are returned with a missing lat and lon because they are likely too inaccurate and/or too imprecise to be used for further analysis.
- By default, lat and lon are also returned as missing if the score is less than 0.5 (regardless of the precision).
- This threshold can be changed by including an optional argument in the docker call (e.g., docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/geocoder:3.3.0 my_address_file.csv 0.6).
- Supplying the value all instead of a numeric score_threshold returns all geocodes regardless of score, precision, or the po_box, cincy_inst_foster_addr, and non_address_text filters.
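As a rough sketch of how the optional threshold argument might be scripted, the helpers below wrap the docker call and predict the output file name. Both function names are hypothetical. The name pattern is taken from the 0.5 example earlier in this document; that other thresholds (or all) follow the same pattern is an assumption, not something this document confirms.

```sh
#!/bin/sh
# Hypothetical helpers, not part of DeGAUSS.

# Build the output file name following the documented 0.5 example.
# Assumption: non-default thresholds use the same naming pattern.
expected_output_name() {
  input=$1
  version=$2
  threshold=${3:-0.5}
  base=$(basename "$input" .csv)
  echo "${base}_geocoder_${version}_score_threshold_${threshold}.csv"
}

# Run the container with an explicit threshold (a number or "all").
# Defined here but not called, so the sketch runs without Docker.
run_geocoder() {
  input=$1
  threshold=${2:-0.5}
  docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/geocoder:3.3.0 "$input" "$threshold"
}

expected_output_name my_address_file.csv 3.3.0
# prints: my_address_file_geocoder_3.3.0_score_threshold_0.5.csv
```

Calling run_geocoder my_address_file.csv 0.6 would then issue the same docker command shown in the list above.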
- Other columns may be present, but it is recommended to only include address and an optional identifier column (e.g., id). Fewer columns will increase geocoding speed.
- Address data must be in one column called address.
- Separate the different address components with a space.
- Do not include apartment numbers or "second address line" (but it's okay if you can't remove them).
- ZIP codes must be five digits (e.g., 32709) and not "plus four" (e.g., 32709-0000).
- Do not try to geocode addresses without a valid five-digit ZIP code; the geocoder uses the ZIP code to complete its initial searches, and without one it will likely return incorrect matches.
- Spelling should be as accurate as possible, but the program performs "fuzzy matching", so an exact match is not necessary.
- Capitalization does not affect results.
- Abbreviations may be used (e.g., St. instead of Street or OH instead of Ohio).
- Use Arabic numerals instead of written numbers (e.g., 13 instead of thirteen).
- Address strings with out-of-order items could return NA (e.g., 3333 Burnet Ave Cincinnati 45229 OH).
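Some of the formatting rules above can be applied before geocoding. The following is a minimal sketch with hypothetical function names: it collapses repeated whitespace, trims a trailing "plus four" suffix, and checks that the string ends in a five-digit ZIP code. It assumes the ZIP code is the last item in the address string and does not attempt to remove apartment numbers or fix spelling.

```sh
#!/bin/sh
# Hypothetical helpers, not part of DeGAUSS.

# Collapse runs of whitespace, trim the ends, and drop a trailing ZIP+4 suffix.
normalize_address() {
  printf '%s\n' "$1" |
    sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//; s/([0-9]{5})-[0-9]{4}$/\1/'
}

# Succeed only if the address ends with a five-digit token.
# Assumption: the ZIP code is the final component of the string.
ends_with_zip5() {
  printf '%s\n' "$1" | grep -Eq '(^| )[0-9]{5}$'
}

cleaned=$(normalize_address "3333  Burnet Ave Cincinnati OH 45229-0000")
echo "$cleaned"   # prints: 3333 Burnet Ave Cincinnati OH 45229

if ends_with_zip5 "$cleaned"; then
  echo "ok to geocode"
fi
```

The out-of-order example above (3333 Burnet Ave Cincinnati 45229 OH) fails the ends_with_zip5 check, which is one way to flag such rows for manual review before running the container.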
- geocoder.db is a SQL database prepared following the instructions here using 2021 TIGER/Line Street Range Address files from the Census.
- For this container, it is hosted at s3://geomarker/geocoder_2021.db
For detailed documentation on DeGAUSS, including general usage and installation, please see the DeGAUSS homepage.