How to drop columns from a Pandas DataFrame — with examples.
I use Pandas — and Python in general — for any type of scripting. Having grown to loathe redundant menial tasks, especially with CSV’s, I lean more and more on this powerful library. Since I manipulate and use them (CSV’s) daily at work, if I perform a routine 3 times, it finds its way into a Python script with pandas leading the charge. More often than not, when analyzing CSV data, they tend to be messy. Likely they have columns you don’t need or care about. Pandas DataFrames have a function that allows you to get rid of those columns and keep only the ones you need. As always, I learn best by example so keep reading and learn with me…

OS, Database, and software used:
- OpenSuse Leap 15.0
- Python 3.7.2
Self-Promotion:
If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!
Far from in shape, I still walk a lot. Typically I walk or hike at least an hour a day, 6 days a week. I track certain stats from my walks for a month in — you guessed it — a CSV file. The example CSV below contains some columns I want to keep and others I don’t. How can I get rid of them with pandas? Pretty simple.
To start, I’ll import pandas and load up the CSV in a DataFrame with the function:
>>> import pandas as pd
>>> stats = pd.read_csv(‘/home/joshua/Practice Data/Fitness_DB_Data/aug_stats.csv’, delimiter=’,’)
>>> stats.head()
day_walked cal_burned miles_walked duration mph additional_weight weight_amount trekking_poles shoe_id trail_id
0 2018–08–01 336.1 3.37 01:01:48 3.3 true 1.5 true 4 7
1 2018–08–02 355.3 3.70 01:15:14 3.0 false 0.0 false 4 4
2 2018–08–03 259.9 2.57 00:47:47 3.2 true 1.5 true 4 7
3 2018–08–05 341.2 3.37 01:02:44 3.2 true 1.5 true 4 7
4 2018–08–06 357.7 3.64 01:05:46 3.3 true 1.5 true 4 7(I’ve written many blog posts about pandas and CSV’s. I’ll have a list towards the end of the post so be sure and check those that interest you.)
Although the columns are clearly visible via the output from the head()function, pandas DataFrames do have a column attribute available you can access to see a list of them as well:
>>> stats.columns
Index([‘day_walked’, ‘cal_burned’, ‘miles_walked’, ‘duration’, ‘mph’,
‘additional_weight’, ‘weight_amount’, ‘trekking_poles’, ‘shoe_id’,
‘trail_id’],
dtype=’object’)Dropping a column is as simple as just specifying the column(s) to remove as a parameter to drop():
>>> stats.drop([‘additional_weight’], axis=1)
day_walked cal_burned miles_walked duration mph weight_amount trekking_poles shoe_id trail_id
0 2018–08–01 336.1 3.37 01:01:48 3.3 1.5 True 4 7
1 2018–08–02 355.3 3.70 01:15:14 3.0 0.0 False 4 4
2 2018–08–03 259.9 2.57 00:47:47 3.2 1.5 True 4 7
3 2018–08–05 341.2 3.37 01:02:44 3.2 1.5 True 4 7
4 2018–08–06 357.7 3.64 01:05:46 3.3 1.5 True 4 7
5 2018–08–17 184.2 1.89 00:39:00 2.9 0.0 False 4 2
6 2018–08–18 242.9 2.53 00:51:25 3.0 0.0 False 4 2
7 2018–08–30 204.4 1.95 00:37:35 3.1 0.0 False 4 5However, do not ignore the axis parameter. Check out what the DataFrame.drop() documentation says about it, along with acceptable values for it:
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).So the ‘additional_weight’ column is gone now…
>>> stats.head()
day_walked cal_burned miles_walked duration mph additional_weight weight_amount trekking_poles shoe_id trail_id
0 2018–08–01 336.1 3.37 01:01:48 3.3 True 1.5 True 4 7
1 2018–08–02 355.3 3.70 01:15:14 3.0 False 0.0 False 4 4
2 2018–08–03 259.9 2.57 00:47:47 3.2 True 1.5 True 4 7
3 2018–08–05 341.2 3.37 01:02:44 3.2 True 1.5 True 4 7
4 2018–08–06 357.7 3.64 01:05:46 3.3 True 1.5 True 4 7Huh? What gives?
Unless we reassign the current DataFrame object to another one, to see these changes, we must use the inplace=True parameter:
>>> stats.drop([‘weight_amount’], axis=1, inplace=True)
>>> stats.head()
day_walked cal_burned miles_walked duration mph weight_amount trekking_poles shoe_id trail_id
0 2018–08–01 336.1 3.37 01:01:48 3.3 1.5 True 4 7
1 2018–08–02 355.3 3.70 01:15:14 3.0 0.0 False 4 4
2 2018–08–03 259.9 2.57 00:47:47 3.2 1.5 True 4 7
3 2018–08–05 341.2 3.37 01:02:44 3.2 1.5 True 4 7
4 2018–08–06 357.7 3.64 01:05:46 3.3 1.5 True 4 7Need to drop multiple columns? No problem. Just supply them in a list and it’s a done deal:
>>> stats.drop([‘weight_amount’, ‘trekking_poles’], axis=1, inplace=True)
>>> stats.head()
day_walked cal_burned miles_walked duration mph shoe_id trail_id
0 2018–08–01 336.1 3.37 01:01:48 3.3 4 7
1 2018–08–02 355.3 3.70 01:15:14 3.0 4 4
2 2018–08–03 259.9 2.57 00:47:47 3.2 4 7
3 2018–08–05 341.2 3.37 01:02:44 3.2 4 7
4 2018–08–06 357.7 3.64 01:05:46 3.3 4 7Other posts you may be interested in: Bulk CSV Uploads with Pandas and PostgreSQL
- Pandas concat() then to_sql() — CSV upload to PostgreSQL
- Pandas to SQL — importing CSV data files into PostgreSQL
- Basic CSV file import and exploration with Pandas — first steps.
- Postgres, Python, and Psycopg2 — executemany() method CSV upload example.
- Python and psycopg2 for CSV bulk upload in PostgreSQL — with examples…
- COPY and CAST() — Bulk uploads in PostgreSQL
Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!
Originally published at https://joshuaotwell.com on September 25, 2019.
