I would like to create a numpy array from an iterable, which yields tuples of values, such as a database query.
    data = db.execute('SELECT col1, col2, col3, col4 FROM data')
    A = np.array(list(data))
Is there a faster way of doing so, without converting the iterable to a list first?
I am not an experienced user of numpy, but here is a possible solution for the general question:
    >>> i = iter([(1, 11), (2, 22)])  # a sample iterable of tuples
    >>> i
    <listiterator at 0x5b2de30>
    >>> rec_array = np.fromiter(i, dtype='i4,i4')  # mind the dtype
    >>> rec_array  # rec_array is a record array
    array([(1, 11), (2, 22)],
          dtype=[('f0', '<i4'), ('f1', '<i4')])
    >>> rec_array['f0'], rec_array[0]  # each field has a default name
    (array([1, 2]), (1, 11))
    >>> a = rec_array.view(np.int32).reshape(-1, 2)  # let's create a view
    >>> a
    array([[ 1, 11],
           [ 2, 22]])
    >>> rec_array[0][1] = 23
    >>> a  # a is a view, not a copy!
    array([[ 1, 23],
           [ 2, 22]])
I assume that all columns are of the same type, otherwise rec_array is already what you want.
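For the case where the columns do not share a type, here is a minimal sketch of using the record array directly. The generator `rows`, the field names `id` and `value`, and their types are made up for illustration; they stand in for whatever your query actually yields.

```python
import numpy as np

def rows():
    # Hypothetical stand-in for a query result: yields (id, value) tuples.
    yield (1, 0.5)
    yield (2, 1.5)
    yield (3, 2.5)

# Assumed field names and types, chosen only for this example.
row_dtype = np.dtype([("id", "i4"), ("value", "f8")])

# count is optional; when the number of rows is known in advance,
# passing it lets NumPy allocate the output array once.
records = np.fromiter(rows(), dtype=row_dtype, count=3)

# Each column is then available by field name.
ids = records["id"]
values = records["value"]
```

Naming the fields in the dtype replaces the default `f0`, `f1` names, which tends to be easier to read when the columns mean different things.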
Concerning your particular case, I do not completely understand what db is in your example. If it is a cursor object, then you can just call its fetchall method and get a list of tuples. In most cases, the database library does not want to keep a partially read query result around while your code processes each row. That is, by the time the execute method returns, all the data is already stored in a list, so there is hardly any cost to using fetchall instead of iterating.
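As a rough sketch of the fetchall route, assuming db behaves like a DB-API cursor: the example below uses an in-memory sqlite3 database only because it ships with Python, and the table contents are invented. Whether rows are fully buffered by the time execute returns depends on the driver, so treat this as an illustration of the call pattern rather than a statement about your database.

```python
import sqlite3
import numpy as np

# In-memory database with made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, 11), (2, 22), (3, 33)],
)

cursor = conn.execute("SELECT col1, col2 FROM data ORDER BY col1")
rows = cursor.fetchall()   # a list of tuples, one per row
A = np.array(rows)         # 2-D array with one row per tuple
conn.close()
```

If every column has the same type, `np.array(rows)` yields an ordinary two-dimensional array directly, with no intermediate record array needed.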