Python Pandas For Loop to create and merge pivoting database tables

advertisements

Beginner Pandas/For-Loop Question Here

How do I create a For-Loop for pivot table creation and merge for all columns?

Consider this df (the real dataset is like 50 estimate columns; I simplified it down to 4 here):

import numpy as np
import pandas as pd
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
    'Quarter': [4, 4, 1, 1, 2, 2],
    'Sector': ['GPU', 'GPU', 'Gaming', 'Gaming', 'Gaming', 'Gaming'],
    'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
    'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
    'Estimate 1': [1.4, 350, 0.2, 500, 0.9, 120],
    'Estimate 2': [1.2, 375, 0.22, 505, 1.0, 120],
    'Estimate 3': [2.1, 250, 0.2, 510, 0.8, 120],
    'Estimate 4': [1.4, 360, 0, 400, 1.9, 125],}
df = pd.DataFrame(raw_data, columns = ['Year','Quarter', 'Sector','Ticker', 'Metric','Estimate 1','Estimate 2','Estimate 3', 'Estimate 4'])
print(df)

Desired Output - I'm looking for a DF like this:

    Year  Quarter  Sector  Ticker   EPS_1   REV_1   EPS_2   REV_2   EPS_3   REV_3    EPS_4   REV_4
0  2009      4       GPU   NVID      1.4     350     1.2     375     2.1     250      1.4     360
1  2010      1    Gaming   ATVI      0.2     500    0.22     505     0.2     510      0.0     510
2  2010      2    Gaming   ATVI      0.9     120     1.0     120     0.8     120      1.9     120

I can do this individually by using pd.pivot() and pd.merge, but am unsure how to structure this is a for-loop.

feature_names=('Year','Quarter','Sector','Ticker')
not_feature_names=['Metric','Estimate 1','Estimate 2','Estimate 3', 'Estimate 4']
df_pivot=df.drop(not_feature_names, axis=1)

df_pivot1 = df.pivot_table(index=feature_names,
                         columns='Metric',
                         values='Estimate 1',)
df_pivot1 = df_pivot1.reset_index().rename_axis(None, axis=1)
df_pivot1.rename(columns={'EPS': 'EPS_1', 'REV':'REV_1'}, inplace=True)
df_Full=df_pivot1.merge(df_pivot, on=(feature_names), suffixes=('_l', '_r'))
print(df_Full)

Here's where I'm at with the for-loop:

for (name, i) in zip(not_feature_names, range(1, 4)):
    df_pivot1 = df.pivot_table(index=feature_names,
                         columns='Metric',
                         values=name,)
    df_pivot1 = df_pivot1.reset_index().rename_axis(None, axis=1)
    df_pivot1.rename(columns={'EPS': ('EPS_'+i), 'REV':('REV_'+i)}, inplace=True)
    df_Full=df_pivot1.merge(df_pivot, on=(feature_names), suffixes=('_l', '_r')


A verbose method would be to melt the estimate columns and then do some string replacing and concatenation. And finally to pivot them back.

df1 = df.melt(id_vars=['Year', 'Quarter', 'Ticker','Metric'],
        value_vars=['Estimate 1', 'Estimate 2', 'Estimate 3', 'Estimate 4'])

df1['variable'] = df1.variable.str.replace('Estimate ', '')
df1['Metric'] = df1['Metric'] + '_' + df1['variable']
df1.pivot_table(index=['Year', 'Quarter', 'Ticker'], columns='Metric', values='value').reset_index()

Output

Metric  Year  Quarter Ticker  EPS_1  EPS_2  EPS_3  EPS_4  REV_1  REV_2  REV_3  \
0       2009        4   NVID    1.4   1.20    2.1    1.4  350.0  375.0  250.0
1       2010        1   ATVI    0.2   0.22    0.2    0.0  500.0  505.0  510.0
2       2010        2   ATVI    0.9   1.00    0.8    1.9  120.0  120.0  120.0   

Metric  REV_4
0       360.0
1       400.0
2       125.0