Reading a file with string and float columns with loadtxt


I need to read the data set available at this page with Python.

The page specifies the data type of each column very precisely. How can I use loadtxt (a numpy function) to read this dataset? I tried passing the data types through the dtype option, but it didn't work.

The tables on the site you link to are very different from each other, and each one has different types in different columns.

You need to define a record type for each table.
A record type lets you declare strings, integers and floats in the same array. It is defined and used as in this example:

>>> from numpy import array, dtype, str_, int32, float32
>>> recordtype = dtype([('name', str_, 20), ('age', int32), ('weight', float32)])
>>> people = array([('Joaquin', 51, 60.0), ('Cat', 18, 8.6)], dtype=recordtype)
>>> people
array([('Joaquin', 51, 60.0), ('Cat', 18, 8.600000381469727)], dtype=[('name', '<U20'), ('age', '<i4'), ('weight', '<f4')])

On the other hand, some rows contain entries such as '...' that break the consistency of the column data. So if you need to read directly from the file, you will need to pass a converter function through the converters parameter of loadtxt.
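As a rough sketch of the converter approach, the snippet below reads a small made-up table. The sample data, the column layout, the to_float helper and the choice to map '...' to NaN are all assumptions for illustration, not taken from the linked page.

```python
import io
import numpy as np

# Hypothetical sample: one heading line, then name, count, value.
# '...' stands in for a missing value, as in the tables discussed above.
sample = io.StringIO(
    "name count value\n"
    "alpha 3 1.5\n"
    "beta 7 ...\n"
)

def to_float(field):
    # Depending on the NumPy version, a converter may receive bytes or str.
    if isinstance(field, bytes):
        field = field.decode()
    field = field.strip()
    # Assumption: treat '...' as a missing value and store NaN.
    return np.nan if field == "..." else float(field)

record_type = np.dtype([("name", np.str_, 20),
                        ("count", np.int32),
                        ("value", np.float64)])

# converters maps a column index to the function applied to that column.
table = np.loadtxt(sample, dtype=record_type,
                   converters={2: to_float}, skiprows=1)
print(table["name"], table["value"])
```

Each column that can contain '...' would need its own entry in the converters dictionary.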

Alternatively, since loadtxt also accepts a generator as input, you could clean the lines in a generator and feed the cleaned lines to loadtxt.
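A minimal sketch of the generator approach could look like this. The raw_lines list stands in for the lines of the real file, and the cleaned helper and the replacement of '...' with 'nan' are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical lines; in practice these would come from an open file object.
raw_lines = [
    "name count value",
    "alpha 3 1.5",
    "beta 7 ...",
    "gamma 2 0.25",
]

def cleaned(lines):
    # Yield each line with the '...' placeholder replaced by 'nan',
    # which loadtxt can parse as a float.
    for line in lines:
        yield line.replace("...", "nan")

record_type = np.dtype([("name", np.str_, 20),
                        ("count", np.int32),
                        ("value", np.float64)])

# skiprows=1 drops the heading line produced by the generator.
table = np.loadtxt(cleaned(raw_lines), dtype=record_type, skiprows=1)
print(table)
```

The same generator could instead skip the offending lines entirely if dropping those rows is acceptable.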

Finally, you should also set the skiprows parameter to skip the table headings.