avatarYancy Dennis

Summary

A developer efficiently resolved a client's data merging issue using Python's FuzzyWuzzy and pandas libraries, creating a system that allows for easy updates and data searching, and charges $100 for a quick monthly process.

Abstract

The author of the article describes a solution they developed for a client who needed to merge and search through monthly database files with inconsistent headers. By leveraging Python's FuzzyWuzzy library, the developer created a master list of column names and used fuzzy logic to match and rename columns in each file. After merging the data with pandas.concat(), the developer exported the results to Excel, enabling the client to filter and search the data. The process, which takes about 10 minutes each month, has been automated to the extent that the developer can charge $100 for the service via Zelle. The article hints at a future post with more detailed code, should there be interest from readers.

Opinions

  • The author expresses disapproval of the data provider's practice of changing headers in the CSV files, calling it "shameful."
  • The author views the task as a "quick easy gig" for themselves, highlighting the efficiency of their solution.
  • There is an implication that the client is satisfied with the service, as they continue to send monthly files and are willing to pay for the updates.
  • The author is open to sharing more code and insights, indicating a willingness to contribute to the community and help others with similar problems.

How I Earned $250 for an Hour of Work Using Pandas and Fuzzy Logic

Photo by Arnold Francisca on Unsplash

An existing customer contacted me about merging some database files that he receives from the Dade County government in Florida. Interestingly, the people who provide the data keep changing the headers in the CSV files that it sends to my customers monthly. I think that they are messing with him — shameful — if the data is in a database the extract should be the same every month. Nonetheless, it was a quick easy gig for me.

My client wants me to merge all the data for the last twelve months and then add a new month of data each. Then, he wants to be able to search the data. First, I have to match up the columns. For this, I use Python’s FuzzyWuzzy.

Sample Code

Using fuzz from FuzzyWuzzy, I first created a master list of column names, then I process the columns in each file and find the closest match using fuzz. I then rename the column using Python’s rename() function. I repeat his for each file and then use pandas.concat() to join the data.

Then, I export this data to Excel. Within Excel, I invoke a filter function so that my client can search his data as needed.

Every month, my client sends me a new file. I run it through this process, add the new data to the existing master file, then bill him $100 via Zelle. This takes all of 10 minutes at most.

He has another job for me. Once I finish it, I will post the basics of building it for his customer.

If you would like to see more code, then respond. Thus, if the demand is there, I will gladly share more code.

Thanks for reading!

More content at plainenglish.io

Programming
Pandas
Python
Freelancing
Coding
Recommended from ReadMedium