Yancy Dennis

Summary

A developer has charged $1000 to create and provide Python code for scraping data on small business owners in Florida, including a training session for the client.

Abstract

The website content details a case where a Python developer was commissioned to scrape information on small business owners in Florida. The process involved extracting data from the Florida UCC Site and the Florida Business Site. The developer's service included writing the code, providing the code to the client, and agreeing to train the client on its use, which was factored into the $1000 fee. The multi-step process required downloading data files, extracting UCC Numbers, and then scraping detailed company information one UCC Number at a time. The client was particularly interested in businesses with multiple UCC filings in the past year. The developer also left the task of obtaining contact information for the owners to the client. The heart of the code used to search for business names and extract owner information is briefly showcased, with error handling for cases where businesses are not found.

Opinions

  • The developer believes that the service provided, including the code and training, justifies the $1000 charge.
  • The client's specific interest in businesses with multiple UCC filings indicates a targeted approach to their market.
  • The developer's decision to leave the acquisition of contact information to the client suggests a division of labor and expertise in the project.
  • The use of Python and web scraping techniques is presented as an effective method for data extraction in this context.
  • The developer's approach to error handling and automation reflects a practical and efficient coding style.

$1000 to Scrape Owners

An existing customer needed owner names for small businesses in Florida!

Again, Python was the tool of choice to scrape owners. Of course, you may wonder how I managed to charge $1000 for this. First, it was a multi-part process, and I agreed to give him the code. So, I will have to set up a time to train him, which should take about 45 minutes.

Steps:

  • Scrape the Florida UCC Site to find relevant businesses
  • Scrape the Florida Business Site to find owner names

By the way, I left it up to my customer to find the contact information for these owners.

The first step was actually a two-step process. Florida's UCC site provides the data for the last 30 days. The first part is to download the files for each day, then extract the UCC Numbers. The next part is to scrape the UCC site one UCC Number at a time to get the company name and debtor name, along with the date that the UCC was filed.

Also, my customer was particularly interested in companies that had filed multiple UCCs in the last year, since these make up the target market for his business.
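
The "multiple UCCs in the last year" filter can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code delivered to the customer: the record layout (company name plus filing date), the function name `repeat_filers`, and the sample rows are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def repeat_filers(filings, today, min_filings=2, window_days=365):
    """Return company names with at least `min_filings` filings in the window.

    `filings` is an iterable of (company_name, filing_date) pairs. This
    layout is an assumption, not the format of the Florida UCC files.
    """
    cutoff = today - timedelta(days=window_days)
    # Normalize names so "Acme" and "ACME" count as the same company
    counts = Counter(
        name.strip().upper()
        for name, filed_on in filings
        if cutoff <= filed_on <= today
    )
    return sorted(name for name, n in counts.items() if n >= min_filings)


# Made-up sample rows, for illustration only
sample = [
    ("Acme Roofing LLC", date(2023, 1, 10)),
    ("ACME ROOFING LLC", date(2023, 6, 2)),
    ("Bay Bakery Inc", date(2023, 3, 15)),
    ("Old Filer Corp", date(2021, 5, 1)),
    ("Old Filer Corp", date(2023, 4, 1)),
]
targets = repeat_filers(sample, today=date(2023, 7, 1))
print(targets)
```

Only the first company qualifies here: the second has a single filing, and the third has one of its two filings outside the one-year window.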

After I collected all that data and put it in a file, I searched this site to get the owner information:

url = 'https://search.sunbiz.org/Inquiry/CorporationSearch/ByName'

First, you would need to generate a list of business names, which I did using the dataframe that I built in step 1. Here is the heart of the code that I wrote to search the URL above:

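import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumption: any Selenium-supported browser would do

owners = []
for business in businesses:
    # Load the search form and submit the business name
    driver.get(url)
    entity = driver.find_element(By.XPATH, '//*[@id="SearchTerm"]')
    entity.clear()
    entity.send_keys(business)
    search = driver.find_element(By.XPATH, '//*[@id="search-input"]/form/div[2]/div[2]/input')
    search.click()

    # Open the first search result, if there is one
    try:
        selection = driver.find_element(By.XPATH, '//*[@id="search-results"]/table/tbody/tr[1]/td[1]/a')
        selection.click()
    except NoSuchElementException:
        print(f'{business} was not found')

    # Grab the detail section that holds the owner information
    try:
        owner = driver.find_element(By.XPATH, '//*[@id="maincontent"]/div[2]').text
        owners.append([business, owner])
    except NoSuchElementException:
        pass
    time.sleep(1)  # pause between searches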
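
Two small pieces sit on either side of a loop like this: preparing the list of names to search, and saving what comes back. The sketch below shows one way to do both with the standard library. The helper names `unique_business_names` and `write_owners`, the CSV column headers, and the sample values are assumptions for illustration, not part of the delivered code.

```python
import csv
import io


def unique_business_names(names):
    """Drop blanks and duplicates (ignoring case and extra spaces), keeping order."""
    seen = set()
    unique = []
    for raw in names:
        if not isinstance(raw, str):
            continue  # skips None and NaN values from missing cells
        cleaned = " ".join(raw.split())
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


def write_owners(owners, fileobj):
    """Write [business, owner_text] pairs as CSV rows to an open file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["business", "owner_details"])  # hypothetical headers
    writer.writerows(owners)


# Made-up sample values, for illustration only
businesses = unique_business_names(
    ["Acme Roofing LLC", "  acme  roofing llc ", "", "Bay Bakery Inc", None]
)
owners = [[businesses[0], "Title MGR\nDOE, JANE"]]

buffer = io.StringIO()
write_owners(owners, buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(businesses)
print(rows)
```

The scraped owner text usually spans several lines, and the `csv` module quotes such fields so each business still occupies one record. To write to disk instead of a buffer, open the file with `newline=""`, as the `csv` documentation recommends.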
