Parsing Addresses with Python
Consider using usaddress to parse addresses with Python
usaddress is a Python library for parsing and labeling United States addresses. It splits an address string into its parts, such as the street number, street name, city, state, and ZIP code, and assigns a label to each part. It labels the parts as written; it does not validate the address or rewrite it into a standardized form.
To use usaddress, install the library and its dependencies with pip (pip install usaddress) or another package manager. Then import the usaddress module and call the parse() function on an address string.
Here is an example of how you can use usaddress to parse and label an address in Python:
import usaddress
# Parse and label an address
address = "123 Main St, Anytown, USA 12345"
parsed_address = usaddress.parse(address)
# Print the parsed address
print(parsed_address)
This code parses the address "123 Main St, Anytown, USA 12345" and prints the result. parse() returns a list of (token, label) tuples, one per whitespace-separated token, in the order the tokens appear in the string. Punctuation such as a trailing comma stays attached to its token. The labels come from a statistical model, so they can differ between library versions. The output will look something like this:
[('123', 'AddressNumber'), ('Main', 'StreetName'), ('St,', 'StreetNamePostType'),
('Anytown,', 'PlaceName'), ('USA', 'StateName'), ('12345', 'ZipCode')]
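When one part of an address spans several tokens, a list of (token, label) pairs is awkward to query. The following is a minimal sketch of one way to collapse such a list into a dictionary. The helper name group_by_label and the sample pairs are hypothetical: the sample is hand-written to mimic the shape of parse() output and is not actual library output.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Join the tokens that share a label, keeping first-appearance order."""
    grouped = defaultdict(list)
    for token, label in pairs:
        # Drop the trailing comma that stays attached to a token.
        grouped[label].append(token.rstrip(","))
    return {label: " ".join(tokens) for label, tokens in grouped.items()}


# Hand-written sample shaped like the (token, label) pairs described above.
sample_pairs = [
    ("123", "AddressNumber"),
    ("Main", "StreetName"),
    ("St,", "StreetNamePostType"),
    ("New", "PlaceName"),
    ("York,", "PlaceName"),
    ("NY", "StateName"),
    ("10001", "ZipCode"),
]

components = group_by_label(sample_pairs)
print(components["PlaceName"])  # New York
```

This simple grouping merges every token with the same label, even when the tokens are not adjacent, so it can hide a mislabeled address.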
Because parse() returns a list, you cannot look up a part by its label, as in parsed_address['AddressNumber']. For that, use the tag() function. It takes the address string itself (not the output of parse()), merges consecutive tokens that share a label, strips trailing punctuation, and returns the parts as a dictionary keyed by label, together with a guess at the address type.
Here is an example of how you can use the tag() function to get the address parts as a dictionary:
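import usaddress
# Tag the address; tag() takes the raw string, not the output of parse()
address = "123 Main St, Anytown, USA 12345"
tagged_address, address_type = usaddress.tag(address)
# Print the labeled parts and the detected address type
print(tagged_address)
print(address_type)
This code passes the address string to tag(), which returns a tuple of two items: a dictionary that maps each label to the matching part of the address, and a string naming the address type. Depending on the library version, the dictionary may print as an OrderedDict or as a plain dict. The output will look something like this:
OrderedDict([('AddressNumber', '123'), ('StreetName', 'Main'),
('StreetNamePostType', 'St'), ('PlaceName', 'Anytown'),
('StateName', 'USA'), ('ZipCode', '12345')])
Street Address
You can then read individual parts by label, such as tagged_address['AddressNumber'] or tagged_address['StateName'].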
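Once the parts are keyed by label, rebuilding a piece of the address is a matter of picking labels in order. The sketch below assumes tag() returns a (components, address_type) pair. The helper street_line and the sample result are hypothetical and hand-written for illustration, so the snippet runs without the library installed.

```python
from collections import OrderedDict

# Labels that make up the street line, in the order they should be printed.
STREET_LABELS = [
    "AddressNumber",
    "StreetNamePreDirectional",
    "StreetName",
    "StreetNamePostType",
    "OccupancyType",
    "OccupancyIdentifier",
]


def street_line(components):
    """Build a one-line street address from the labeled parts that are present."""
    return " ".join(components[label] for label in STREET_LABELS if label in components)


# Hand-written sample shaped like a (components, address_type) pair.
sample_result = (
    OrderedDict([
        ("AddressNumber", "123"),
        ("StreetName", "Main"),
        ("StreetNamePostType", "St"),
        ("OccupancyType", "Apt"),
        ("OccupancyIdentifier", "4B"),
        ("PlaceName", "Anytown"),
        ("ZipCode", "12345"),
    ]),
    "Street Address",
)

components, address_type = sample_result
if address_type == "Street Address":
    print(street_line(components))  # 123 Main St Apt 4B
```

Checking the address type first avoids building a street line for a PO box or an intersection, where these labels do not apply.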
You can use the usaddress library to parse and label a variety of address formats, including addresses with apartment or suite numbers, directional prefixes or suffixes, and street types. The tag() function also accepts an optional tag_mapping argument, a dictionary that renames or merges labels. When the same label is assigned to parts of the address that are not next to each other, tag() cannot build a clean dictionary and raises usaddress.RepeatedLabelError, which your code should be prepared to catch.
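As a rough sketch of how an application might wrap tag(), the snippet below passes a label mapping and falls back to None when the address cannot be tagged cleanly. The wrapper name tag_address and the SIMPLE_MAPPING dictionary are hypothetical. The snippet assumes tag() accepts a tag_mapping keyword and raises usaddress.RepeatedLabelError on repeated labels.

```python
# Hypothetical mapping: collapse fine-grained labels into coarser fields.
# Labels that are not listed keep their original names.
SIMPLE_MAPPING = {
    "AddressNumber": "street",
    "StreetName": "street",
    "StreetNamePostType": "street",
    "PlaceName": "city",
    "StateName": "state",
    "ZipCode": "zip",
}


def tag_address(address, tag_mapping=None):
    """Return (components, address_type), or None if the address cannot be tagged."""
    # Imported here so this module still loads where usaddress is not installed.
    import usaddress

    try:
        components, address_type = usaddress.tag(address, tag_mapping=tag_mapping)
    except usaddress.RepeatedLabelError:
        # The same label was assigned to non-adjacent parts of the address.
        return None
    return dict(components), address_type
```

A call such as tag_address("123 Main St, Anytown, NY 12345", SIMPLE_MAPPING) would then return the merged fields and the address type, or None for an address that needs manual review.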