How To Group By Columns With Missing Values in Pandas
Incorporating null values whilst grouping by in pandas columns

Performing aggregations with the use of group by clauses is probably something you come across in your day-to-day work. Pandas in Python is no exception to this since this is an operation you will definitely see in many different places of a repository utilising the library.
However, pandas’ default behaviour excludes empty/missing (aka null) values from the results. In today’s short tutorial we will be demonstrating this default behaviour as well as a way for incorporating missing values in the resulting aggregations.
First, let’s create an example
import numpy as np
import pandas as pd
df = pd.DataFrame(
[
(1, 'B', 121, 10.1, True),
(2, 'C', 145, 5.5, False),
(3, 'A', 345, 4.5, False),
(4, 'A', 112, np.nan, True),
(5, 'C', 105, 2.1, False),
(6, np.nan, 435, 7.8, True),
(7, np.nan, 521, np.nan, True),
(8, 'B', 322, 8.7, True),
(9, 'C', 213, 5.8, True),
(10, 'B', 718, 9.1, False),
],
columns=['colA', 'colB', 'colC', 'colD', 'colE']
)
print(df)
colA colB colC colD colE
0 1 B 121 10.1 True
1 2 C 145 5.5 False
2 3 A 345 4.5 False
3 4 A 112 NaN True
4 5 C 105 2.1 False
5 6 NaN 435 7.8 True
6 7 NaN 521 NaN True
7 8 B 322 8.7 True
8 9 C 213 5.8 True
9 10 B 718 9.1 FalseThe default behaviour
Now let’s suppose that we want to compute the sum per value in colB. That’s easy enough and can be done with the following expression
df.groupby('colB')['colD'].sum()that is going to return the following result:
>>> df.groupby('colB')['colD'].sum()
colB
A 4.5
B 27.9
C 13.4
Name: colD, dtype: float64But as we can notice, the missing values (None) are missing from the output.
Incorporating null values in aggregations
There’s a chance you may still want to incorporate missing values in your aggregations. In order to do so, all you need to do is explicitly specify dropna=False when calling the groupby function — this value defaults to True. Note that this is possible for pandas versions ≥ 1.1.
df.groupby('colB', dropna=False)['colD'].sum()And the resulting Series will also include the count for missing values:
>>> df.groupby('colB', dropna=False)['colD'].sum()
colB
A 4.5
B 27.9
C 13.4
NaN 7.8
Name: colD, dtype: float64Obviously, the same concept applies for other aggregation types, such as count. For example,
>>> df.groupby('colB').count()
colA colC colD colE
colB
A 2 2 1 2
B 3 3 3 3
C 3 3 3 3as opposed to
>>> df.groupby('colB', dropna=False).count()
colA colC colD colE
colB
A 2 2 1 2
B 3 3 3 3
C 3 3 3 3
NaN 2 2 1 2Final Thoughts
With recent pandas versions you can now incorporate missing values when performing aggregations over pandas DataFrames. Note though that if you are running an older version that does not support dropna in groupby method, then (apart from upgrading to one of the most recent versions which is always a good practice!) you could also find some workarounds such as the one outlined below:
# Fill the missing values with a particular placeholder value
# Note though that you must be careful when selecting such
# value in order to avoid any collisions with pre-existing values
# in the dataframe
# In the example below I'll use -1, but make sure to amend if needed
>>> df.fillna(-1).groupby('colB').sum()Become a member and read every story on Medium. Your membership fee directly supports me and other writers you read. You’ll also get full access to every story on Medium.
Related articles you may also like






