XQuery: How to count the number of times a value occurs in sequence

advertisements

I know that the function count can be used to count the number of elements in a given sequence, like this:

count(result/actors/actor)

in this XML:

<result>
    <actors>
        <actor id="00000015">Anderson, Jeff</actor>
        <actor id="00000030">Bishop, Kevin</actor>
        <actor id="0000000f">Bonet, Lisa</actor>
        <actor id="916503207">Parillaud, Anne</actor>
        <actor id="916503208">Pitt, Brad</actor>
        <actor id="916503209">Freeman, Morgan</actor>
        <actor id="916503211">Domingo, Placido</actor>
        <actor id="916503210">Sharif, Omar</actor>
        <actor id="1337">Doqumenteriet2011</actor>
    </actors>
</result>

But what if I want to know how many times a value occurs in a given sequence?

For example if I would like to know how many movies each actor (actorRef) appeared in in the following XML:

<videos>
    <video id="id1235AA0">
        <title>The Fugitive</title>
        <actorRef>00000003</actorRef>
        <actorRef>00000006</actorRef>
    </video>
    <video id="id1244100">
        <title>Enemy of the State</title>
        <actorRef>00000009</actorRef>
        <actorRef>0000000c</actorRef>
        <actorRef>0000000f</actorRef>
        <actorRef>00000012</actorRef>
    </video>
    <video id="id124E230">
        <title>Clerks</title>
        <actorRef>00000015</actorRef>
        <actorRef>00000018</actorRef>
        <actorRef>0000001b</actorRef>
    </video>

I can easily produce a list of all the appearing actors, and even have them appear as many times in my produced sequence as in the XML:

result/videos//actorRef

but I am not able to do anything similar to what for example COUNT() and GROUP BY do together in SQL to get a list of the actors by count of their multiplicity in the sequence produced by the above line of XQuery.

How can I produce this list?

PS: The end goal is to find the actors that appeared the most movies.


This is the kind of question that isn't good for a document store when you are just storing the list of actors in videos. I'd suggest also storing the lists of videos that an actor is part of. Then you'd just have to query for the actor that has the most videos elements.

All that said, you can do it with the data you have it just isn't going to be all that fast. You first need to get a distance list of actors. Then query for each actor filter the videos that have that actor and do a count. and then order by count.

let $actors := fn:distinct-values($results/videos/video/actorRef)

for $actor in $actors
let $count := fn:count($results/videos/video[actorRef = $actor])
Order by $count
return ($actor, $count)