Return the n smallest indexes per column using pandas

I have the following (simplified) dataframe: df = pd.DataFrame({'X': [1, 2, 3, 4, 5,6,7,8,9,10], 'Y': [10,20,30,40,50,-10,-20,-30,-40,-50], 'Z': [20,18,16,14,12,10,8,6,4,2]},index=list('ABCDEFGHIJ')) Which gives the following: X Y Z A 1 10 20 B 2 20
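A minimal sketch of one approach, assuming the goal is the index labels of each column's n smallest values (the exact desired output is cut off above):

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Y': [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
                   'Z': [20, 18, 16, 14, 12, 10, 8, 6, 4, 2]},
                  index=list('ABCDEFGHIJ'))

n = 2
# For each column, take the index labels of its n smallest values.
smallest = {col: df[col].nsmallest(n).index.tolist() for col in df.columns}
print(smallest)  # {'X': ['A', 'B'], 'Y': ['J', 'I'], 'Z': ['J', 'I']}
```

`Series.nsmallest` avoids a full sort, so this stays cheap on larger frames.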

Combine multiple data frames side by side and rename them

I have multiple data frames in my analysis. For example dataframe 1 where this is the number of people by activity in China Activity No of people Activity 1 100 Activity 2 200 Activity 3 300 and data frame 2 where this is the number of people by acti
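A sketch using `pd.concat` with `keys` to label each source; the second country's name and numbers are made up, since the excerpt is cut off:

```python
import pandas as pd

# Hypothetical per-country tables, indexed by activity.
china = pd.DataFrame({'No of people': [100, 200, 300]},
                     index=['Activity 1', 'Activity 2', 'Activity 3'])
india = pd.DataFrame({'No of people': [150, 250, 350]},
                     index=['Activity 1', 'Activity 2', 'Activity 3'])

# axis=1 places the frames side by side; keys adds a top column level
# naming each source, which effectively "renames" the blocks.
combined = pd.concat([china, india], axis=1, keys=['China', 'India'])
print(combined)
```

The result has a two-level column index, e.g. `('China', 'No of people')`.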

Get the average of row values greater than or equal to zero

I would like to get the average value of a row in a dataframe where I only use values greater than or equal to zero. For example: if my dataframe looked like: df = pd.DataFrame([[3,4,5], [4,5,6],[4,-10,6]]) 3 4 5 4 5 6 4 -10 6 currently if I get the
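One common approach, sketched on the example frame: mask the negatives to NaN, which `mean()` then skips:

```python
import pandas as pd

df = pd.DataFrame([[3, 4, 5], [4, 5, 6], [4, -10, 6]])

# df[df >= 0] replaces entries below zero with NaN;
# mean(axis=1) ignores NaN, so only non-negative values count.
row_means = df[df >= 0].mean(axis=1)
print(row_means.tolist())  # [4.0, 5.0, 5.0]
```

The last row averages only 4 and 6, since -10 is masked out.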

Pandas how to make a conditional selection on MultiIndex

Here is the sample data file, and I performed the following operation in ipython notebook: !curl -O http://pbpython.com/extras/sales-funnel.xlsx df = pd.read_excel('./sales-funnel.xlsx') df['Status'] = df['Status'].astype('category') df["Status"
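The exact condition is cut off above, so here is a self-contained sketch on a small stand-in for the sales-funnel data (values are made up), showing two standard ways to select conditionally on a MultiIndex:

```python
import pandas as pd

df = pd.DataFrame({'Manager': ['Debra', 'Debra', 'Fred', 'Fred'],
                   'Status': ['won', 'pending', 'won', 'declined'],
                   'Amount': [35000, 5000, 65000, 2000]})
table = df.set_index(['Manager', 'Status'])

# Boolean selection on one index level via get_level_values:
won = table[table.index.get_level_values('Status') == 'won']
print(won)

# Equivalent cross-section with xs (drops the matched level):
won_xs = table.xs('won', level='Status')
```

`get_level_values` is handy when the condition is more complex than equality (e.g. `.isin([...])`), while `xs` is terser for a single value.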

Python pandas: partial match of a string

I created a dataframe df where I have a column with the following values: category 20150115_Holiday_HK_Misc 20150115_Holiday_SG_Misc 20140116_DE_ProductFocus 20140116_UK_ProductFocus I want to create 3 new columns category | A | B | C 20150115_Holida
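The desired output is cut off, so the target columns here are an assumption: date, country code, and label as A, B, C. A sketch with `str.extract`, whose regex treats the optional `Holiday_` token as noise:

```python
import pandas as pd

df = pd.DataFrame({'category': ['20150115_Holiday_HK_Misc',
                                '20150115_Holiday_SG_Misc',
                                '20140116_DE_ProductFocus',
                                '20140116_UK_ProductFocus']})

# Capture the 8-digit date, the 2-letter country code, and the trailing
# label; '(?:Holiday_)?' skips the optional middle token.
df[['A', 'B', 'C']] = df['category'].str.extract(
    r'^(\d{8})_(?:Holiday_)?([A-Z]{2})_(\w+)$')
print(df)
```

`str.extract` with multiple capture groups returns one column per group, so the three new columns can be assigned in a single step.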

How to randomly select some rows of pandas data?

I have a pandas dataframe df which contains a column amount. For many rows, the amount is zero. I want to randomly remove 50% of the rows where the amount is zero, keeping all rows where amount is nonzero. How can I do this?
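One approach: `query` the zero-amount rows, `sample` half of them, and drop that sample by index (column values here are made up):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # only so this sketch is reproducible
df = pd.DataFrame({'amount': [0, 0, 0, 0, 5, 7, 0, 0, 3, 1]})

# Pick a random half of the zero-amount rows and drop them by index.
to_drop = df.query('amount == 0').sample(frac=0.5).index
result = df.drop(to_drop)
print(result)
```

All nonzero rows survive untouched; only the sampled zero rows are removed.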

Pandas extract rows according to complex conditions

I have this dataframe: source target weight 24517 class social 31 24356 class proletariat 29 16189 bourgeoisi class 29 24519 class societi 29 24710 class work 28 15375 bourgeoisi class 26 23724 class condit 24 24314 class polit 24 ... How can I creat

Reduce the size of data in pandas

I want to reduce my pandas data frame (df), to first 2 values in Python 2.7. Currently, my data frame is like this: >>> df test_number result Count 21946 140063 NTV 23899 21947 140063 <9.0 1556 21948 140063 <6.0 962 21949 140063 <4.5 871

Pandas: a complicated joining operation

I would like to implement a specific join operation with the following requirements: I have a data frame in the following format, where the index is datetime and I have columns from 0 to N (9 in this example) df1: 0 1 2 3 4 5 6 7 8 9 2001-01-01 2 53

Pandas add a dataframe without creating new columns

I have two dataframes that look like this: df1= A B 1 A1 B1 2 A2 B2 3 A3 B3 df2 = A C 4 A4 C4 5 A5 C5 I would like to append df2 to df1, like so: A B 1 A1 B1 2 A2 B2 3 A3 B3 4 A4 NaN 5 A5 NaN (Note: I've edited the dataframes so that not all the colu
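A sketch matching the example above: concatenate, then restrict to df1's columns so df2's extra column C is dropped and the missing B becomes NaN:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'], 'B': ['B1', 'B2', 'B3']},
                   index=[1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5'], 'C': ['C4', 'C5']}, index=[4, 5])

# concat takes the union of columns; selecting df1.columns afterwards
# discards anything df1 doesn't already have.
result = pd.concat([df1, df2])[df1.columns]
print(result)
```

An equivalent one-liner is `pd.concat([df1, df2.reindex(columns=df1.columns)])`, which avoids ever materialising column C in the combined frame.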

Pandas plotting in the Windows terminal

I have a simple pandas data frame. Trying to plot from the Windows 10 terminal session of IPython gives me this: In [4]: df = pd.DataFrame({'Y':[1, 3, 5, 7, 9], 'X':[0, 2, 4, 6, 8]}) In [5]: df Out[5]: X Y 0 0 1 1 2 3 2 4 5 3 6 7 4 8 9 In [6]: df.plo
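The actual error is cut off above, but the usual cause in a plain terminal session is that `df.plot` only builds the figure; nothing appears without a working matplotlib GUI backend and a `plt.show()` call. A hedged sketch (using the headless `Agg` backend so it runs anywhere; in a real terminal session a GUI backend such as `TkAgg` would be needed to see a window):

```python
import matplotlib
matplotlib.use('Agg')  # headless here; swap for a GUI backend interactively
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Y': [1, 3, 5, 7, 9], 'X': [0, 2, 4, 6, 8]})

ax = df.plot(x='X', y='Y')  # returns a matplotlib Axes, draws nothing yet
plt.show()                  # opens a window only with a GUI backend;
ax.figure.savefig('plot.png')  # saving to a file works regardless
```

If no GUI backend is available (common on a bare Windows terminal without Tk), saving to a file is the reliable fallback.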

Reading a csv file with a timestamp column, with pandas

When doing: import pandas x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime', names=['DateTime', 'X'], header=None, sep=';') with this data.csv file: 1449054136.83;15.31 1449054137.43;16.19 1449054138.04;19.22 1449054138.65;15.12
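`parse_dates=True` cannot interpret Unix epoch seconds like `1449054136.83`; a sketch of the usual fix is to read the column as a plain float and convert explicitly with `pd.to_datetime(..., unit='s')` (the file is inlined here to keep the example self-contained):

```python
import io
import pandas as pd

data = io.StringIO("1449054136.83;15.31\n"
                   "1449054137.43;16.19\n"
                   "1449054138.04;19.22\n")

x = pd.read_csv(data, sep=';', header=None, names=['DateTime', 'X'])

# Epoch seconds need an explicit conversion; parse_dates would leave
# them as floats (or misparse them).
x['DateTime'] = pd.to_datetime(x['DateTime'], unit='s')
x = x.set_index('DateTime')
print(x)
```

After conversion the index is a proper `DatetimeIndex` with sub-second precision preserved.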

Delete the second character of a string in DataFrame

I have a DataFrame with a column of names that include the middle initial. I need to remove the middle initial which is the second character in the string. df = pd.DataFrame({'alpha': ['1', '2', '3'], 'beta': ['JRLeparoux', 'BJHernandez,Jr.','SXBridg
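A sketch using vectorised string slicing: keep the first character and everything from the third on (the last name is truncated in the excerpt, so `'SXBridges'` below is a guess):

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['1', '2', '3'],
                   'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridges']})

# .str[:1] is the first character, .str[2:] skips the second one;
# concatenating the two drops the middle initial.
df['beta'] = df['beta'].str[:1] + df['beta'].str[2:]
print(df['beta'].tolist())  # ['JLeparoux', 'BHernandez,Jr.', 'SBridges']
```

This works row-wise without `apply`, so it stays fast on large frames.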

Grouping by 30-minute intervals in pandas

I have a sequence of one month of data and I wanted the average value at 00:00, 00:30, 01:00, ...23:30 for the whole month. If this were in an hourly basis I could simply do df.groupby(df.index.hour).mean() but I have no idea how to do this for a 30
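A sketch of one way: floor every timestamp to its 30-minute slot, then group by the resulting time of day, the direct analogue of `groupby(df.index.hour)` (the data below is synthetic, one day at 15-minute resolution):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=96, freq='15min')
df = pd.DataFrame({'value': np.arange(96.0)}, index=idx)

# floor('30min') snaps each timestamp down to :00 or :30;
# .time then keeps only the time of day, so the same slot on
# different days lands in the same group.
averages = df.groupby(df.index.floor('30min').time).mean()
print(averages.head())
```

With a month of data this yields exactly 48 rows, one per half-hour slot, averaged over all days.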

Calculate the variance of dates grouped by a common ID

I have a large table that looks like the following: +---+---------+----------+-------+---------+------------+ | | cust_id | order_id | quant | revenue | date | +---+---------+----------+-------+---------+------------+ | 0 | 103502 | 107801 | 1 | 246.
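The table is cut off before the expected output, so this is a hedged sketch under one reading of the question: the variance of each customer's order dates, per `cust_id`. Dates have no variance directly, so they are converted to a day count first (IDs and dates below are made up):

```python
import pandas as pd

df = pd.DataFrame({'cust_id': [103502, 103502, 103502, 200001, 200001],
                   'date': ['2020-01-01', '2020-01-05', '2020-01-09',
                            '2020-03-01', '2020-03-02']})
df['date'] = pd.to_datetime(df['date'])

# Express each date as days since the earliest date, then take the
# sample variance within each customer group.
days = (df['date'] - df['date'].min()).dt.days
variance = days.groupby(df['cust_id']).var()
print(variance)
```

The units of the result are squared days; `.std()` would give a spread back in days.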

Histogram of filtered pandas data

This has been driving me mad for the last hour. I can draw a histogram when I use: hist(df.GVW, bins=50, range=(0,200)) I use the following when I need to filter the dataframe for a given condition in one of the columns, for example: df[df.TYPE==
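A sketch of the usual pattern: filter first, then call `.hist()` on the resulting column (the data and the `TYPE == 11` condition below are made up, since the excerpt cuts off the real one; the headless `Agg` backend just lets the sketch run anywhere):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'GVW': rng.uniform(0, 200, 100),
                   'TYPE': rng.choice([11, 12], 100)})

# Chain the boolean filter before .hist(): the filtered frame is a
# regular DataFrame, so its columns plot like any other Series.
ax = df[df.TYPE == 11].GVW.hist(bins=50, range=(0, 200))
```

The key point is that `df[df.TYPE == 11]` returns an ordinary DataFrame, so everything that works on `df.GVW` also works on the filtered column.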

Pandas resample

I have some irregularly stamped time series data of the following form, Time Pressure Humidity Temperature 2014-02-13 09:15:00.355000 124.283173 26.926562 6119.075 2014-02-13 09:15:00.356000 118.537935 22.228906 6111.625 2014-02-13 09:15:00.357000 11
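The target frequency is cut off above, so the 1-second grid below is an assumption; the general recipe for irregular timestamps is `resample` with an aggregation:

```python
import pandas as pd

idx = pd.to_datetime(['2014-02-13 09:15:00.355',
                      '2014-02-13 09:15:00.356',
                      '2014-02-13 09:15:00.357',
                      '2014-02-13 09:15:01.200'])
df = pd.DataFrame({'Pressure': [124.283173, 118.537935, 119.0, 120.0]},
                  index=idx)

# resample bins the irregular samples onto a regular 1-second grid;
# mean() averages whatever fell into each bin.
regular = df.resample('1s').mean()
print(regular)
```

Bins with no samples come out as NaN, which `.interpolate()` or `.ffill()` can fill afterwards if a gap-free series is needed.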

How to speed up conversion of CSV dates to Pandas timestamps

I have some data in CSV files with dates and times. I would like to convert these to Pandas Timestamps quickly, but the code below is taking too long. Is there any way to speed it up? The bottleneck step is the last one. Thanks! TY1 = pd.read_csv('Da
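The slow step is cut off above, but the common culprit is `pd.to_datetime` inferring the format row by row. A hedged sketch: pass an explicit `format` string (the one below is an assumption, since the real file's layout isn't shown), which typically speeds parsing up by an order of magnitude:

```python
import pandas as pd

# Stand-in for the CSV's date column; format string is assumed.
s = pd.Series(['2015-12-02 10:22:16', '2015-12-02 10:22:17',
               '2015-12-02 10:22:18'])

# An explicit format skips per-row inference, the usual bottleneck.
fast = pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S')
print(fast.dtype)
```

If many rows repeat the same timestamp, `cache=True` (the default in recent pandas) gives a further speedup for free.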

Delete a group after pandas groupby

Is it possible to delete a group (by group name) from a groupby object in pandas? That is, after performing a groupby, delete a resulting group based on its name. Filtering a DataFrame groupwise has been discussed. And a future release of pandas may i
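A groupby object has no delete-a-group method, but iterating it yields `(name, frame)` pairs, so a sketch of one workaround is to rebuild the frame from every group except the unwanted one (data below is made up):

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'b', 'c'], 'val': [1, 2, 3, 4]})
g = df.groupby('key')

# Iterate the groupby and keep every group whose name isn't 'b'.
without_b = pd.concat(frame for name, frame in g if name != 'b')
print(without_b)
```

If the goal is simply the filtered data rather than a trimmed groupby object, filtering before grouping (`df[df['key'] != 'b'].groupby('key')`) is usually cheaper.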