Transforming an HTML table into XML with attributes: selecting headers and data


I have a simple HTML input file that contains a table. The column headers of the table are defined in row 2, and the data follows in rows 3 onward.

So I'm picking up the data like this:

<xsl:template match="HTML">
    <xsl:apply-templates select="//TABLE/TR[position() > 2]"/>
</xsl:template>

<xsl:template match="TR">
    <xsl:apply-templates select="TD"/>
</xsl:template>

<xsl:template match="TD">
    <xsl:variable name="pos"><xsl:value-of select="position()"/></xsl:variable>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:value-of select="/HTML//TABLE/TR[2]/TD[$pos]"/>
</xsl:template>

(This last template is a debug version; ultimately I want to use the header information to generate dynamic attribute names.)

What I'm struggling with is getting the $pos variable to index into TR[2] of the document: it always seems to evaluate to '1'. Originally I just tried using 'position()' as the index, but that doesn't work for me.

I know (if I do an 'xsl:value-of') that $pos is changing correctly, but within the predicate it seems to collapse into a 1.

What do I need to do here?

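The problem is here:

<xsl:variable name="pos"><xsl:value-of select="position()"/></xsl:variable>
<xsl:value-of select="/HTML//TABLE/TR[2]/TD[$pos]"/>
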
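In XPath, someElement[$x] is treated as shorthand for someElement[position() = $x] only when $x is known to be a number.

In XSLT 1.0 / XPath 1.0 there is only rudimentary, weak typing: the type of a variable cannot be declared and generally isn't known in advance. Here $pos is defined by its content rather than by a select attribute, so it holds a result tree fragment, not a number, and a non-numeric predicate value is converted to a boolean.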

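This is why this XPath expression:

/HTML//TABLE/TR[2]/TD[$pos]

is interpreted as:

/HTML//TABLE/TR[2]/TD[boolean($pos)]

Since the boolean value of $pos is always true here, this selects all TD elements that are children of the second TR child of any TABLE element that is a descendant of the top element of the XML document. In XSLT 1.0, xsl:value-of then outputs only the first of the selected nodes, which is why the result looks as if $pos were always 1.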
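
The difference between the two kinds of variable can be seen in a small side-by-side sketch. It assumes the input is well-formed XML with upper-case HTML, TABLE, TR and TD elements and no TBODY wrapper, as in the question; the variable names posRtf and posNum are made up for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="/HTML//TABLE/TR[position() &gt; 2]"/>
  </xsl:template>

  <xsl:template match="TR">
    <xsl:apply-templates select="TD"/>
  </xsl:template>

  <xsl:template match="TD">
    <!-- Defined by content: a result tree fragment, not a number -->
    <xsl:variable name="posRtf"><xsl:value-of select="position()"/></xsl:variable>
    <!-- Defined by select: holds the number returned by position() -->
    <xsl:variable name="posNum" select="position()"/>

    <!-- Predicate is converted to boolean, so every header TD is selected
         and value-of prints only the first one -->
    <xsl:value-of select="/HTML//TABLE/TR[2]/TD[$posRtf]"/>
    <xsl:text> | </xsl:text>
    <!-- Numeric predicate: selects the header TD at the same position -->
    <xsl:value-of select="/HTML//TABLE/TR[2]/TD[$posNum]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For each data cell, the left-hand column should repeat the first header, while the right-hand column should follow the cell's own column.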


In XPath 1.0 use either the full unabbreviated expression:

/HTML//TABLE/TR[2]/TD[position() = $pos]

or use the shorter:

/HTML//TABLE/TR[2]/TD[number($pos)]

In XPath 2.0 (XSLT 2.0), explicitly specify the type of the variable:

<xsl:variable name="pos" as="xs:integer" select="position()"/>

and then it is known to be an xs:integer and can be used directly in:

/HTML//TABLE/TR[2]/TD[$pos]
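
For XSLT 1.0, a complete stylesheet that reaches the stated goal (attribute names taken from the header row) might look like the following sketch. It uses a third option: defining the variable with a select attribute so that it holds a number. The output element names rows and row are hypothetical, and the sketch assumes a single table, no TBODY wrapper, and header texts that become valid, unique XML names once spaces are replaced by underscores.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Header cells: the TD children of the second TR (assumes one table) -->
  <xsl:variable name="headers" select="/HTML//TABLE/TR[2]/TD"/>

  <xsl:template match="/">
    <rows>
      <!-- Data rows are every TR after the header row -->
      <xsl:apply-templates select="/HTML//TABLE/TR[position() &gt; 2]"/>
    </rows>
  </xsl:template>

  <xsl:template match="TR">
    <row>
      <xsl:apply-templates select="TD"/>
    </row>
  </xsl:template>

  <xsl:template match="TD">
    <!-- select= keeps the value a number, so [$pos] means [position() = $pos] -->
    <xsl:variable name="pos" select="position()"/>
    <!-- Assumption: the header text yields a valid attribute name -->
    <xsl:attribute name="{translate(normalize-space($headers[$pos]), ' ', '_')}">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

Each data row then becomes one row element whose attributes are named after the corresponding header cells. An empty or otherwise invalid header text would make xsl:attribute fail, so real input may need extra cleaning of the names.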