18 June 2022

pandas - how to group by multiple columns and get sums of multiple other columns

https://stackoverflow.com/questions/46431243/pandas-dataframe-groupby-how-to-get-sum-of-multiple-columns


Question:

I have a pandas dataframe which looks like this:

index col1   col2   col3   col4   col5
0     a      c      1      2      f 
1     a      c      1      2      f
2     a      d      1      2      f
3     b      d      1      2      g
4     b      e      1      2      g
5     b      e      1      2      g

I want to group by col1 and col2 and get the sum() of col3 and col4. col5 can be dropped since the data can not be aggregated.

Here is what the output should look like. I am interested in having both col3 and col4 in the resulting dataframe. It doesn't really matter if col1 and col2 are part of the index or not.

index col1   col2   col3   col4   
0     a      c      2      4          
1     a      d      1      2      
2     b      d      1      2      
3     b      e      2      4 


Solution:

df.groupby(['col1','col2']).agg(
     sum_col3 = ('col3','sum'),
     sum_col4 = ('col4','sum'),
     ).reset_index()



No comments:

Post a Comment