How to find the average of a column and the mean of subtracting two columns in Pig?

I am new to scripting using Pig Latin. I am stuck to write a pig script which will find the average of a column value and also to find the average of the subtracted values between two columns.

I am reading the data from a csv file having starttime and endtime columns as below:


The code that I have tried so far is as below :

file = LOAD '/project/timestamp.csv' Using PigStorage(',') AS (st:int, et:int);
start_ts = FOREACH file GENERATE st;
grouped = group start_ts by st

The ILLUSTRATE output I am getting is as below and I am not able to apply the AVG function.

| grouped     | group:int     | file:bag{:tuple(st:int,et:int)}                 |
|             |               | {(, ), (, )}                                        |

Can anybody please help me getting the average of the starttime which would be the result of (23 + 32 + 54)/3

And also some ideas on how to code the (endtime -starttime)/no. of records (i.e 3 in this case) would be of great help for me to get started.


First ensure that you are loading the data correctly.Looks like you have double quotes i.e " around your data.Load the data as chararray,replace the double quotes and then cast it to int,finally apply the AVG function for the starttime.For the avg of endtime - starttime just subtract the 2 fields and apply AVG.

A = LOAD '/project/timestamp.csv' Using PigStorage(',') AS (st:chararray, et:chararray);
B = FOREACH A GENERATE (int)REPLACE(st,'\\"','') as st,(int)REPLACE(et,'\\"','') as et;