Is zip() the most memory-efficient way to combine arrays in numpy?

I use numpy and have two arrays, which are read with genfromtxt.

They each have the shape (10000,) according to np.shape().

I want to combine these two vectors into a single array with the shape (10000, 2). For now I use:

x = zip(x1,x2)

but I am not sure whether there is a numpy function that does this better or more efficiently. I don't think concatenate does what I expect (or I'm using it wrong).


There is numpy.column_stack:

>>> a = numpy.arange(10)
>>> b = numpy.arange(1, 11)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> numpy.column_stack((a, b))
array([[ 0,  1],
       [ 1,  2],
       [ 2,  3],
       [ 3,  4],
       [ 4,  5],
       [ 5,  6],
       [ 6,  7],
       [ 7,  8],
       [ 8,  9],
       [ 9, 10]])
>>> numpy.column_stack((a, b)).shape
(10, 2)
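
As a rough sketch of how this maps onto the question, the snippet below applies both approaches to two vectors of shape (10000,). The arrays x1 and x2 are made-up stand-ins for the data read with genfromtxt, and via_zip and via_stack are illustrative names. Note that in Python 3 zip returns an iterator, so it has to be wrapped in list() before it can be turned into an array.

```python
import numpy as np

# Hypothetical stand-ins for the two vectors read with genfromtxt.
x1 = np.linspace(0.0, 1.0, 10000)
x2 = np.linspace(1.0, 2.0, 10000)

# zip pairs the elements up as Python tuples; np.array then has to
# convert that list of tuples back into a 2-D array.
via_zip = np.array(list(zip(x1, x2)))

# column_stack builds the 2-D array directly from the two 1-D arrays.
via_stack = np.column_stack((x1, x2))

print(via_zip.shape, via_stack.shape)
print(np.array_equal(via_zip, via_stack))
```

Both routes should give the same (10000, 2) array; the difference is that the zip route goes through an intermediate list of Python tuples.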

I can't guarantee that column_stack is any better than zip in terms of memory usage, but underneath it all it appears to rely on numpy.concatenate (which is implemented in C), so that's at least encouraging:

>>> import inspect
>>> print(inspect.getsource(numpy.column_stack))
def column_stack(tup):
    """
    Stack 1-D arrays as columns into a 2-D array.

    Take a sequence of 1-D arrays and stack them as columns
    to make a single 2-D array. 2-D arrays are stacked as-is,
    just like with `hstack`.  1-D arrays are turned into 2-D columns
    first.

    Parameters
    ----------
    tup : sequence of 1-D or 2-D arrays.
        Arrays to stack. All of them must have the same first dimension.

    Returns
    -------
    stacked : 2-D array
        The array formed by stacking the given arrays.

    See Also
    --------
    hstack, vstack, concatenate

    Notes
    -----
    This function is equivalent to ``np.vstack(tup).T``.

    Examples
    --------
    >>> a = np.array((1,2,3))
    >>> b = np.array((2,3,4))
    >>> np.column_stack((a,b))
    array([[1, 2],
           [2, 3],
           [3, 4]])

    """
    arrays = []
    for v in tup:
        arr = array(v, copy=False, subok=True)
        if arr.ndim < 2:
            arr = array(arr, copy=False, subok=True, ndmin=2).T
        arrays.append(arr)
    return _nx.concatenate(arrays, 1)
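
To get a rough feel for the memory side of the question, one option is to compare the size of the stacked array's data buffer with the size of the list of tuples that zip produces. This is only a sketch under assumptions: the input arrays are made up, the names array_bytes and pairs_bytes are illustrative, and sys.getsizeof reports CPython-specific object sizes that will vary between interpreters and platforms.

```python
import sys
import numpy as np

# Hypothetical float64 vectors of the same length as in the question.
x1 = np.arange(10000, dtype=np.float64)
x2 = np.arange(10000, dtype=np.float64)

stacked = np.column_stack((x1, x2))
pairs = list(zip(x1, x2))

# Bytes in the array's data buffer: rows * columns * 8 bytes per float64.
array_bytes = stacked.nbytes

# Rough lower bound for the zip result: the list object plus each tuple
# object, not counting the scalar objects stored inside the tuples.
pairs_bytes = sys.getsizeof(pairs) + sum(sys.getsizeof(p) for p in pairs)

print(array_bytes, pairs_bytes)
```

Even with the scalars left out of the count, the list of tuples should come out noticeably larger than the array buffer, since every row carries its own tuple object.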