How do you find the string length of an element's start tag or end tag?

I'm trying to write a jQuery or plain JavaScript function (whichever is more readable) that returns the length of an element's start tag or end tag in an HTML document.

For example,

<p>Hello.</p>

would return 3 and 4 for the starting and ending tag lengths. Adding attributes,

<span class="red">Warning!</span>

would return 18 and 7 for the starting and ending tag lengths. Finally,

<img src="foobar.png"/>

would return 23 and 0 (or -1) for the starting and ending tag lengths.

I'm looking for a canonical, guaranteed-to-work-according-to-spec solution, so I'm trying to use DOM methods rather than manual text manipulations. For example, I would like the solution to work even for weird cases like

<p>spaces infiltrating the ending tag</ p >

and

<img alt="unended singleton tags" src="foobar.png">

and such. That is, my hope is that as long as we use proper DOM methods, we should be able to find the number of characters between < and > no matter how weird things get, even

<div data-tag="<div>">HTML-like strings within attributes</div>

I have looked at the jQuery API (especially the Manipulation section, including DOM Insertion and General Attributes subsections), but I don't see anything that would help.

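Currently, the best idea I have, given an element node, is

// Assumes the end tag is exactly "</" + tagName + ">" (tag name plus 3 characters).
var lengthOfEndTag = node.tagName.length + 3;

// Whatever remains of outerHTML after the contents and the end tag is the start tag.
var lengthOfStartTag = node.outerHTML.length
                     - node.innerHTML.length
                     - lengthOfEndTag;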

but of course I don't want to make such an assumption for the end tag.

(Finally, I'm familiar with regular expressions—but trying to avoid them if at all possible.)


EDIT

@Pointy and @squint helped me understand that it's not possible to see </ p >, for example, because the original HTML source is discarded once the DOM is built. That's fine. The adjusted objective is to find the lengths of the start and end tags as they would be serialized in outerHTML.
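
One way to approach the adjusted objective is sketched below. It is only a sketch under stated assumptions: the function name tagLengths is made up for illustration, and it assumes that the HTML serialization writes the end tag as "</" + localName + ">" and omits it entirely for void elements such as img. The sample objects are plain stand-ins for DOM elements so that the snippet runs outside a browser.

```javascript
// Sketch: measure the start and end tags from the serialized markup.
// `node` only needs localName, outerHTML and innerHTML, so a real Element works too.
function tagLengths(node) {
    var outer = node.outerHTML;
    var inner = node.innerHTML;
    // Assumption: the end tag, when present, is "</" + local name + ">".
    var endTag = '</' + node.localName + '>';
    // Void elements such as <img> are serialized without an end tag.
    var hasEndTag = outer.slice(-endTag.length) === endTag;
    var endLength = hasEndTag ? endTag.length : 0;
    return {
        start: outer.length - inner.length - endLength,
        end: endLength
    };
}

// Plain objects standing in for DOM elements (hypothetical sample data).
var p = {localName: 'p', outerHTML: '<p>Hello.</p>', innerHTML: 'Hello.'};
var img = {localName: 'img', outerHTML: '<img src="foobar.png">', innerHTML: ''};
console.log(tagLengths(p));   // { start: 3, end: 4 }
console.log(tagLengths(img)); // { start: 22, end: 0 }
```

The img result is 22 rather than 23 because the sample outerHTML has no trailing slash, which is how browsers serialize <img src="foobar.png"/>.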


An alternative is to run XMLSerializer's serializeToString on a shallow clone of the node (which keeps the attributes, including id, but drops the children), so that innerHTML never has to be inspected, and then split the result on "><":

var tags = (function () {
    var x = new XMLSerializer(); // created once and reused across calls
    return function tags(elm) {
        var s, a, n, o = {open: null, close: null};
        if (elm.nodeType !== 1) throw new TypeError('Expected an Element node');
        n = elm.cloneNode(false); // shallow clone: keeps attributes (including id), drops children
        s = x.serializeToString(n); // serialize the empty clone
        a = s.split('><'); // boundary between the start tag and the end tag
        if (a.length > 1) { // an end tag is present
            o.close = '<' + a.pop();
            o.open = a.join('><') + '>'; // rejoin in case "><" appeared earlier
        }
        else o.open = a[0]; // no end tag
        return o;
    };
}()); // invoked immediately so the serializer is created only once

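After running this, you can read the .length of the open and close properties:

tags(document.body); // {open: "<body class="question-page">", close: "</body>"}

Note that XMLSerializer produces XML rather than HTML serialization, so the strings can differ from outerHTML: browsers that follow the current spec typically add xmlns="http://www.w3.org/1999/xhtml" to the start tag of an HTML element and write void elements as <img ... />.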

What if an attribute's value has >< in it? XMLSerializer escapes this to &gt;&lt;, so it does not affect the split.
What if there is no end tag? Then close is null.
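
To see the split step in isolation, here is a minimal sketch that applies the same logic to strings. The name splitTags is hypothetical, and the inputs are hand-written stand-ins for serializer output, not strings captured from a browser.

```javascript
// Sketch of the split step on its own, applied to an already serialized empty element.
function splitTags(serialized) {
    var parts = serialized.split('><');
    if (parts.length === 1) {
        // No "><" boundary: the element was written without a separate end tag.
        return {open: parts[0], close: null};
    }
    // The last piece is the end tag minus its leading "<".
    var close = '<' + parts.pop();
    // Rejoin the rest so a stray "><" inside the start tag is restored.
    return {open: parts.join('><') + '>', close: close};
}

// Hand-written sample inputs (assumed shapes of serializer output).
var escaped = '<div data-tag="&gt;&lt;"></div>';
var selfClosed = '<img src="foobar.png" />';
console.log(splitTags(escaped));    // open: '<div data-tag="&gt;&lt;">', close: '</div>'
console.log(splitTags(selfClosed)); // open: '<img src="foobar.png" />', close: null
```

Because the pieces before the last one are rejoined with "><", even an unescaped "><" inside the start tag would be put back where it was, which is what the "just in case" join in the function above is for.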