How would I create this count column in R?

My dataframe:
    df1
    Col1
    A
    B
    C
    B
    D
    E
I'd like to add a second column, Col2, in which each value is 1 if its corresponding value in Col1 appears in Col1 more than once, and 0 otherwise. Hence, it would look like this:
    df2
    Col1 Col2
    A    0
    B    1
    C    0
    B    1
    D    0
    E    0

Check if two dataframes have the same set of rows

I have two dataframes, df1 and df2, which I believe have the same data, but the rows are not in the same order. How can I check that they have the same rows but perhaps in a different order? We can first order the two datasets based on the columns tmp1 <

Reduce the size of data in pandas

I want to reduce my pandas data frame (df) to the first 2 values in Python 2.7. Currently, my data frame is like this:
    >>> df
           test_number result  Count
    21946       140063    NTV  23899
    21947       140063   <9.0   1556
    21948       140063   <6.0    962
    21949       140063   <4.5    871
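The excerpt does not say what "first 2 values" means. As a minimal sketch, assuming it means the first two rows within each `test_number`, `groupby(...).head(2)` would do it. The data below is copied from the excerpt, and the name `first_two` is only illustrative.

```python
import pandas as pd

# Data copied from the excerpt above (index, test_number, result, Count).
df = pd.DataFrame(
    {
        "test_number": [140063, 140063, 140063, 140063],
        "result": ["NTV", "<9.0", "<6.0", "<4.5"],
        "Count": [23899, 1556, 962, 871],
    },
    index=[21946, 21947, 21948, 21949],
)

# Assumption: "first 2 values" means the first two rows per test_number.
# If it simply means the first two rows overall, df.head(2) is enough.
first_two = df.groupby("test_number").head(2)
print(first_two)
```

With a single `test_number`, as in the excerpt, both readings select the same two rows.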

How to replace NULL/? with 'None' or '0' in R

DF1 is
    ID CompareID Distance
    1  256       0
    1  834       0
    1  946       0
    2  629       0
    2  735       1
    2  108       1
Expected output should be DF2 as below (Condition for generating DF2 -> In DF1, For any ID if 'Distance'==1, put the corresponding 'CompareID' into 'SimilarID' column, for '

Pandas add a dataframe without creating new columns

I have two dataframes that look like this:
    df1 =
        A   B
    1   A1  B1
    2   A2  B2
    3   A3  B3
    df2 =
        A   C
    4   A4  C4
    5   A5  C5
I would like to append df2 to df1, like so:
        A   B
    1   A1  B1
    2   A2  B2
    3   A3  B3
    4   A4  NaN
    5   A5  NaN
(Note: I've edited the dataframes so that not all the colu
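One way to get the output shown is to concatenate the two frames and then keep only the columns of df1. This is a sketch rather than the accepted answer, which is not visible in the excerpt. The name `combined` is illustrative.

```python
import pandas as pd

# Frames rebuilt from the excerpt above.
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"A": ["A4", "A5"], "C": ["C4", "C5"]}, index=[4, 5])

# Stack the rows, then select only df1's columns so that C is dropped
# and B is NaN for the rows that came from df2.
combined = pd.concat([df1, df2])[df1.columns]
print(combined)
```

Selecting `df1.columns` after the concatenation keeps the column order of df1 and discards any column that exists only in df2.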

Efficiently join two data tables with a condition

One data table (let's call it A) contains the ID numbers:
    ID
    3
    5
    12
    8
    ...
and another table (let's call it B) contains the lower bound and the upper bound and the name for that ID.
    ID_lower ID_upper Name
    1        4        James
    5        7        Arthur
    8        11       Jacob
    12       13       Sarah
so

Creating a DataFrame in Scala

wordsDF = sqlContext.createDataFrame([('cat',), ('elephant',), ('rat',), ('rat',), ('cat', )], ['word']) This is a way of creating a DataFrame from a list of tuples in Python. How can I do this in Scala? I'm new to Scala and I'm facing problem in figu

Find unique rows in a data frame with NA rows?

I have a data frame like the one below. I would like to find the unique rows. But this data contains 'NA'. If all the values in a row with an NA value are the same as in other rows (like rows: 1,2,5), I want to ignore it, but if they are not the same (like rows :

Pandas DataFrame with MultiIndex: exclude level values

I have a multi-indexed pandas dataframe like the following one. import numpy as np import pandas as pd arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']), n

Delete the second character of a string in DataFrame

I have a DataFrame with a column of names that include the middle initial. I need to remove the middle initial, which is the second character in the string. df = pd.DataFrame({'alpha': ['1', '2', '3'], 'beta': ['JRLeparoux', 'BJHernandez,Jr.','SXBridg
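A minimal sketch of one approach, using the `.str` slicing accessor to keep the first character and everything from the third character onward. Only the two complete names from the excerpt are used, because the third is cut off in the source.

```python
import pandas as pd

# Only the two complete rows from the excerpt; the third name is truncated there.
df = pd.DataFrame({"alpha": ["1", "2"], "beta": ["JRLeparoux", "BJHernandez,Jr."]})

# Keep character 0, skip character 1 (the middle initial), keep the rest.
df["beta"] = df["beta"].str[:1] + df["beta"].str[2:]
print(df)
```

This assumes every name has the middle initial in the second position, as the question states. A name without one would lose a real letter.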

Filtering grouped data in R

I was wondering if anyone can help with grouping the data below as I'm trying to use the subset function to filter out volumes below a certain threshold but given that the data represents groups of objects, this creates the problem of removing certai

R programming: save three-dimensional outputs after a loop

I am new to R and I would like to save the outputs after the loop: for (i in 1:5) { for (d in 1:10) { fonction1 fonction2 fonction3 } } At the end I would like to have 1 list -> contains 5 lists -> contains 1*10 data frame -> contains certain number

Determine the cause of `identical()` returning FALSE

I have two data.frames that I expect to be the same, but identical() returns FALSE. As background, one DF comes from the Iris data ARFF file, while the other comes from a .rdata file, if that changes anything. All the elements in x == y are TRUE, the class is

Do R objects of data.frame and data.table have the same type?

I am still very new to R and recently came across something whose meaning I am not sure of. Do data.frame and data.table have the same type? Can an object have multiple types? After converting "cars" from data.frame to data.table, I obviously can't apply

How to name the columns in a data frame?

I am stuck on what ought to be fairly obvious, but... I've got a dataframe that I created by importing a CSV with no headers. I can't seem to figure out how to name my columns now. I've found lots of instructions for creating new dataframes or importing

Remove duplicate column combinations from a data frame in R

I want to remove duplicate combinations of sessionid, qf and qn from the following data sessionid qf qn city 1 9cf571c8faa67cad2aa9ff41f3a26e38 cat biddix fresno 2 e30f853d4e54604fd62858badb68113a caleb amos 3 2ad41134cc285bcc06892fd68a471cd7 daniel

Checking several conditions

I have a data frame and want to know if a certain string is present. I want to know if any of the values in df[,1] contain anything from inscompany. df = data.frame(company=c("KMart", "Shelter"), var2=c(5,7)) if( df[,1] == inscompany

R table by date

I've got transactional data from a SQL query which I turn into a data frame. The first column of the df contains UNIX timestamps (format="%Y/%d/%m %H:%M") which I would like to use to create a graphics plot using par to display 1 unique lineplot