Showing posts with label pandas. Show all posts
Showing posts with label pandas. Show all posts

18 June 2022

pandas - how to group by multiple columns and get sums of multiple other columns

https://stackoverflow.com/questions/46431243/pandas-dataframe-groupby-how-to-get-sum-of-multiple-columns


Question:

I have a pandas dataframe which looks like this:

index col1   col2   col3   col4   col5
0     a      c      1      2      f 
1     a      c      1      2      f
2     a      d      1      2      f
3     b      d      1      2      g
4     b      e      1      2      g
5     b      e      1      2      g

I want to group by col1 and col2 and get the sum() of col3 and col4. col5 can be dropped since the data can not be aggregated.

Here is what the output should look like. I am interested in having both col3 and col4 in the resulting dataframe. It doesn't really matter if col1 and col2 are part of the index or not.

index col1   col2   col3   col4   
0     a      c      2      4          
1     a      d      1      2      
2     b      d      1      2      
3     b      e      2      4 


Solution:

df.groupby(['col1','col2']).agg(
     sum_col3 = ('col3','sum'),
     sum_col4 = ('col4','sum'),
     ).reset_index()



16 August 2021

export pandas dataframe to html and appy css styles

 export pandas dataframe to html and appy css styles

 

 Ku Tang Pan

I found the most precise, and frankly the easiest way of doing it is skipping the styling, to_html() etc. and converting the DF to a dictionary using the df.to_dict() method.

Specifically what gave me trouble, was displaying the styled pandas html in an outlook email, as it just wouldn't render properly with the css mess that pandas was producing.

iterate over the dict and generate the html there by simply wrapping keys/values in the tags that you need, adding classes etc. and concatenate this all into one string. Then paste this str into a prepared template with a predefined css.

For convenience I found it's useful to export the same df twice, using .to_dict() and to_dict('index') to first fill in the columns and then work your way down row by row. Alternatively just have a list of relevant column names.

dict_data = [df.to_dict(), df.to_dict('index')]

return_str = '<table><tr>'

for key in dict_data[0].keys():
    return_str = return_str + '<th class="header">' + key + '</th>'

return_str = return_str + '</tr>'

for key in dict_data[1].keys():
    return_str = return_str + '<tr><th class="index">' + key + '</th>'
    for subkey in dict_data[1][key]:
        return_str = return_str + '<td>' + dict_data[1][key][subkey] + '</td>'

return_str = return_str + '</tr></table>'

and then return_str goes into the template.

 

08 August 2021

How to convert a pandas dataframe into a python list, and a python list into a pandas dataframe?

 How to convert a pandas dataframe into a python list, and a python list into a pandas dataframe?


1. To convert a pandas dataframe into a python list

email_list = df['Email'].to_list()

or

email_list = df[['Name', 'Email']].to_list()


2. To convert a python list into a pandas dataframe

Examples

>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000