How to divide two rows from different DataFrames in Python pandas when the rows contain strings and floats


I have 2 DataFrames and need to divide a row of one by the corresponding row of the other. The rows contain both strings and numbers, so the division should skip the strings and only be done on the numeric values.

DF1

       Col1     Val11   Val12
    0  A        1       9
    1  B        3       1
    2  C        5       4
    3  D        1       3
    4  E        7       6

DF2

       Col2    Val21    Val22
    0  A       20       19
    1  B       35       11
    2  C       46       42
    3  D       31       53
    4  E       28       55

I wrote the following line of code:

df2.iloc['Percent'] = df1.iloc[4]/df2.iloc[4]

But I get the following error:

TypeError: unsupported operand type(s) for /: 'str' and 'str'
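The error comes from the label column: row 4 of both frames holds the string 'E', so the element-wise division eventually evaluates 'E' / 'E'. A minimal sketch reproducing it, assuming the sample data from the tables above (stored as integers, as shown). It divides the raw arrays with .values so that pandas does not try to align the differently named labels first:

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question
df1 = pd.DataFrame({'Col1': list('ABCDE'),
                    'Val11': [1, 3, 5, 1, 7],
                    'Val12': [9, 1, 4, 3, 6]})
df2 = pd.DataFrame({'Col2': list('ABCDE'),
                    'Val21': [20, 35, 46, 31, 28],
                    'Val22': [19, 11, 42, 53, 55]})

# A row taken with iloc from a mixed frame is an object Series that still holds the label
row1 = df1.iloc[4]   # Col1='E', Val11=7, Val12=6
row2 = df2.iloc[4]   # Col2='E', Val21=28, Val22=55

try:
    # Element-wise: 'E' / 'E', then 7 / 28, then 6 / 55; the first pair already fails
    ratio = row1.values / row2.values
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # unsupported operand type(s) for /: 'str' and 'str'
```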

The final DataFrame should look like this:

       Col2    Val21    Val22
    0  A       20       19
    1  B       35       11
    2  C       46       42
    3  D       31       53
    4  E       28       55
               0.25     0.10

Thanks in advance for the support.


You need to move all string columns into the index with set_index, and then divide:

df2 = df2.set_index('Col2')
df2.loc['Percent'] = df1.set_index('Col1').iloc[4].values / df2.iloc[4]
print (df2)

         Val21      Val22
Col2
A        20.00  19.000000
B        35.00  11.000000
C        46.00  42.000000
D        31.00  53.000000
E        28.00  55.000000
Percent   0.25   0.109091
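The .values call is what makes this work. The numerator row is labelled Val11/Val12 and the denominator row Val21/Val22, and pandas aligns Series arithmetic on labels. A small sketch comparing both forms, assuming the question's sample data stored as integers:

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': list('ABCDE'),
                    'Val11': [1, 3, 5, 1, 7],
                    'Val12': [9, 1, 4, 3, 6]})
df2 = pd.DataFrame({'Col2': list('ABCDE'),
                    'Val21': [20, 35, 46, 31, 28],
                    'Val22': [19, 11, 42, 53, 55]})

num = df1.set_index('Col1').iloc[4]   # index: Val11, Val12
den = df2.set_index('Col2').iloc[4]   # index: Val21, Val22

# Series / Series aligns on labels; no label is shared, so every entry is NaN
aligned = num / den
print(aligned)

# .values drops the numerator's labels, so the division is purely positional
positional = num.values / den
print(positional)
```

Because .values is applied only to the numerator, the result keeps the denominator's labels, which is why it can be assigned straight into df2.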

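If there are multiple string columns, starting from the original df2, select the numeric columns by name in both frames for the division. The new row is aligned on the column labels, so the string columns in the output are filled with NaN: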

df2.loc['Percent'] = df1[['Val11','Val12']].iloc[4].values /  df2[['Val21','Val22']].iloc[4]
print (df2)
        Col2  Val21      Val22
0          A  20.00  19.000000
1          B  35.00  11.000000
2          C  46.00  42.000000
3          D  31.00  53.000000
4          E  28.00  55.000000
Percent  NaN   0.25   0.109091
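To see why the string columns end up as NaN, here is a sketch with a second string column in each frame; the Note column is hypothetical and only added for illustration. Assigning a Series to a new row label with loc reindexes it to the frame's columns, so every column outside the numeric subset gets NaN:

```python
import pandas as pd

# 'Note' is a hypothetical second string column added for illustration
df1 = pd.DataFrame({'Col1': list('ABCDE'),
                    'Note': ['x'] * 5,
                    'Val11': [1, 3, 5, 1, 7],
                    'Val12': [9, 1, 4, 3, 6]})
df2 = pd.DataFrame({'Col2': list('ABCDE'),
                    'Note': ['y'] * 5,
                    'Val21': [20, 35, 46, 31, 28],
                    'Val22': [19, 11, 42, 53, 55]})

# Divide only the numeric subsets; the result is labelled Val21, Val22
ratio = df1[['Val11', 'Val12']].iloc[4].values / df2[['Val21', 'Val22']].iloc[4]

# Setting a new row from a Series aligns it on the column labels,
# so the string columns Col2 and Note are filled with NaN
df2.loc['Percent'] = ratio
print(df2)
```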

A more generic solution is to drop the string columns by name:

str_cols1 = ['Col1']
str_cols2 = ['Col2']
df2.loc['Percent'] = (df1.drop(str_cols1, axis=1).iloc[4].values /
                      df2.drop(str_cols2, axis=1).iloc[4])
print (df2)
        Col2  Val21      Val22
0          A  20.00  19.000000
1          B  35.00  11.000000
2          C  46.00  42.000000
3          D  31.00  53.000000
4          E  28.00  55.000000
Percent  NaN   0.25   0.109091
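If you need this for several rows or frame pairs, the drop-based version can be wrapped in a small function. add_ratio_row below is a hypothetical helper, not a pandas API; it assumes both frames have the same number of numeric columns in the same order, since the division is positional:

```python
import pandas as pd

def add_ratio_row(num_df, den_df, row, num_str_cols, den_str_cols, label='Percent'):
    """Hypothetical helper: divide row `row` of num_df by the same row of den_df,
    ignoring the listed string columns, and return a copy of den_df with the
    result appended under `label`."""
    num = num_df.drop(num_str_cols, axis=1).iloc[row]
    den = den_df.drop(den_str_cols, axis=1).iloc[row]
    if len(num) != len(den):
        raise ValueError('both frames need the same number of numeric columns')
    out = den_df.copy()
    # .values makes the division positional (Val11 pairs with Val21, and so on)
    out.loc[label] = num.values / den
    return out

df1 = pd.DataFrame({'Col1': list('ABCDE'),
                    'Val11': [1, 3, 5, 1, 7],
                    'Val12': [9, 1, 4, 3, 6]})
df2 = pd.DataFrame({'Col2': list('ABCDE'),
                    'Val21': [20, 35, 46, 31, 28],
                    'Val22': [19, 11, 42, 53, 55]})

result = add_ratio_row(df1, df2, 4, ['Col1'], ['Col2'])
print(result)
```

Returning a copy leaves the input df2 untouched, which makes it easier to call the helper repeatedly on the same data.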

A better solution is select_dtypes, which keeps only the numeric columns without listing the string columns by name:

df2.loc['Percent'] = (df1.select_dtypes(['number']).iloc[4].values /
                      df2.select_dtypes(['number']).iloc[4])
print (df2)
        Col2  Val21      Val22
0          A  20.00  19.000000
1          B  35.00  11.000000
2          C  46.00  42.000000
3          D  31.00  53.000000
4          E  28.00  55.000000
Percent  NaN   0.25   0.109091
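One caveat: select_dtypes decides by column dtype, not by content. A sketch, assuming a hypothetical df2 variant whose Val22 column was loaded as text, shows that such a column is skipped until it is converted:

```python
import pandas as pd

# Hypothetical variant of df2 where Val22 was read in as text (e.g. from a CSV)
df = pd.DataFrame({'Col2': list('ABCDE'),
                   'Val21': [20, 35, 46, 31, 28],
                   'Val22': ['19', '11', '42', '53', '55']})

# select_dtypes only looks at the dtype, so the text column is left out
before = list(df.select_dtypes(['number']).columns)
print(before)  # ['Val21']

# After converting the column to a numeric dtype, select_dtypes picks it up
df['Val22'] = pd.to_numeric(df['Val22'])
after = list(df.select_dtypes(['number']).columns)
print(after)   # ['Val21', 'Val22']
```

Plain pd.to_numeric raises on values it cannot parse; the EDIT below handles that case with errors='coerce'.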

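EDIT based on a comment:

If the numeric columns can also contain non-numeric values (such as the 'a' in Val22 below), use to_numeric with errors='coerce' to replace them with NaN before dividing: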

import pandas as pd

df1_numeric = df1.apply(lambda x: pd.to_numeric(x, errors='coerce'))
df2_numeric = df2.apply(lambda x: pd.to_numeric(x, errors='coerce'))

df2.loc['Percent'] = df1_numeric.iloc[4].values / df2_numeric.iloc[4]
print (df2)
        Col2  Val21     Val22
0          A  20.00        19
1          B  35.00         a
2          C  46.00        42
3          D  31.00        53
4          E  28.00        55
Percent  NaN   0.25  0.109091
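A self-contained sketch of the to_numeric approach, rebuilding the sample data with the stray 'a' in Val22 as shown in the output above. It passes pd.to_numeric to apply directly, with errors='coerce' forwarded as a keyword argument, which is equivalent to the lambda:

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': list('ABCDE'),
                    'Val11': [1, 3, 5, 1, 7],
                    'Val12': [9, 1, 4, 3, 6]})
# Val22 contains a stray string 'a', so its dtype is object
df2 = pd.DataFrame({'Col2': list('ABCDE'),
                    'Val21': [20, 35, 46, 31, 28],
                    'Val22': [19, 'a', 42, 53, 55]})

# errors='coerce' turns anything unparsable ('A'..'E', 'a') into NaN
df1_numeric = df1.apply(pd.to_numeric, errors='coerce')
df2_numeric = df2.apply(pd.to_numeric, errors='coerce')
print(df2_numeric)

# Col2 becomes NaN / NaN = NaN; Val21 = 7 / 28, Val22 = 6 / 55
ratio = df1_numeric.iloc[4].values / df2_numeric.iloc[4]
print(ratio)
```

The ratio is written back into the original df2 in the answer above, which is why its output still shows 'a' in Val22; only the copies used for the division are converted.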