& Ldquo; Logically slower & rdquo; The algorithm turns out to be faster, but why?

advertisements

I've implemented two different algorithms which do essentially the same, check for visibility from one node to another in a tree of nodes, with the rules being simple - a node is only visible to another node if it precedes it along the same branch.

The first method goes down the tree from child to parent, skipping other potential children in the parent to get the tree indices for both nodes and uses some basic logic to determine if there is visibility. I decided to go for this one first because I already had the methods for the node indices which I needed for something else and I assumed it to be potentially faster.

bool isVisibleTo(Node * accessor) {
  QList<uint> accessedI = getIndex();
  QList<uint> accessorI = accessor->getIndex();
  if (accessedI.size() > accessorI.size()) {
    return false;
  } else if (accessedI.size() == accessorI.size()) {
    for (int i = 0; i < accessedI.size() - 1; ++i) {
      if (accessedI.at(i) != accessorI.at(i)) {
        return false;
      }
    }
    if (accessedI.last() > accessorI.last()) {
      return false;
    }
  }
  for (int i = 0; i < accessorI.size() - (accessorI.size() - accessedI.size()); ++i) {
    if (accessedI.at(i) > accessorI.at(i)) {
      return false;
    }
  }
  return true;
}

The second one traverses the tree completely, every child down to the parent and so on, going through significantly more nodes and I can only assume memory pages and cache lines.

bool isVisibleTo2(Node * accessor) {
  Node * node = accessor;
  while (node) {
    if (node == this)
      return true;
    if (node->_parent) {
      uint i = node->_parent->_children.indexOf(node);
      while (i) {
        if (node->_parent->_children.at(--i) == this) {
          return true;
        }
      }
    }
    node = node->_parent;
  }
  return false;
}

I expected this to be the slower algorithm for big trees. But it turned out to be 10-20 times faster for small trees and as the tree size increased it stuck at a consistent 4x better in the last few test, the final of which took about 20 minutes and involved 10 million nodes in the tree (granted most of the time was the node allocation, the actual visibility check was several seconds).

So what are those performance figures due to? Considering that they provide identical results (checked that thoroughly - there is no work saved by the second method) and the first method involves fewer memory hops and I assume is much more cache friendly and also it can just check the depth and do a much shorter evaluation? Granted it does 2 traversals rather than one, but they are directly child to parent, skipping the rest of the children along the way. And yes, I do realize the second method does not need to go all the way down, but still...

Edit: I switched to -O3 compilation, but the figures did not change. I also tried to change the list of getIndex to a vector but it actually caused a substantial performance drop, since the indices need to be inserted in reverse order, e.g. prepended, which is very inefficient for a vector.

Edit 2: Did a quick test with a vector once again, this time I scrapped prepending and went of a regular insert and a reverse operation before returning, this made the vector solution slightly faster, from 8 times slower than the full traversal method to "only" 6 times slower. I suspected that the QList allocations might be the primary culprit for the low performance, but as it seems, there is something more to it.


If I understand you correctly the getIndex() function, which you call in the first case does basically the same walk over all tree, which isVisibleTo2() does. But isVisibleTo1() has additional to getIndex operations, therefore it is slower.