Optimizing HashSet performance (Scala | Java)


I'm writing an A.I. to solve a "Maze of Life" puzzle. Storing explored states in a HashSet slows everything down; it's actually faster to run the search without a set of explored states at all. I'm fairly confident my node (state storage) class implements equals and hashCode correctly, since tests show a HashSet doesn't add duplicate states. I may need to rework the hashCode function, but I believe what's slowing it down is the HashSet rehashing and resizing.

I've tried setting the initial capacity to a very large number, but it's still extremely slow:

 val initCapacity = java.lang.Math.pow(initialGrid.width*initialGrid.height,3).intValue()
 val frontier = new QuickQueue[Node](initCapacity)

Here is the quick queue code:

class QuickQueue[T](capacity: Int) {

  val hashSet = new HashSet[T](capacity)
  val queue = new Queue[T]

  // methods below
}

For more info, here is the hash function. I store the grid values as bytes in two arrays and access them using tuples:

override def hashCode(): Int = {
  var sum = Math.pow(grid.goalCoords._1, grid.goalCoords._2).toInt
  for (y <- 0 until grid.height) {
     for (x <- 0 until grid.width) {
        sum += Math.pow(grid((x, y)).doubleValue(), x.toDouble).toInt
     }
     sum += Math.pow(sum, y).toInt
  }
  return sum
}

Any suggestions on how to set up a HashSet that won't slow things down? Or maybe another way to remember explored states?

P.S. I'm using java.util.HashSet, and even with the initial capacity set, it takes 80 seconds vs. under 7 seconds without the set.


Okay, for a start, please replace

override def hashCode(): Int =

with

override lazy val hashCode: Int =

so you don't recompute (grid.height * grid.width) floating-point powers every time you need to access the hash code. That alone should speed things up by an enormous amount.
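In other words, keep exactly the same body and just cache the result (safe as long as a Node's grid never changes once the state has been stored, which should hold for search states):

override lazy val hashCode: Int = {
  // Same computation as before, but evaluated once per Node and cached,
  // instead of being redone on every HashSet lookup.
  var sum = Math.pow(grid.goalCoords._1, grid.goalCoords._2).toInt
  for (y <- 0 until grid.height) {
    for (x <- 0 until grid.width) {
      sum += Math.pow(grid((x, y)).doubleValue(), x.toDouble).toInt
    }
    sum += Math.pow(sum, y).toInt
  }
  sum
}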

Then, unless you somehow rely upon close cells having close hash codes, don't re-invent the wheel. Use scala.util.hashing.MurmurHash3.seqHash or somesuch to calculate your hash. This should speed your hash up by another factor of 20 or so. (Still keep the lazy val.)
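Concretely, since your state already lives in two byte arrays, something along these lines should do. This is only a sketch: cellsA and cellsB are stand-in names for your two Array[Byte] fields, and MurmurHash3.bytesHash is the byte-array flavour of the same family as seqHash.

import scala.util.hashing.MurmurHash3

// Hash the raw state bytes directly: no floating-point math, no overflow,
// and good bit mixing. Feeding the first hash in as the seed of the second
// makes the result depend on both arrays and on their order.
override lazy val hashCode: Int =
  MurmurHash3.bytesHash(grid.cellsB, MurmurHash3.bytesHash(grid.cellsA))

If goalCoords can differ between otherwise-identical grids, mix it in as well, e.g. by passing grid.goalCoords.hashCode as the seed of the inner call.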

Then you only have overhead from the required set operations. Right now, unless you have a lot of 0x0 grids, you are spending the overwhelming majority of your time waiting for math.pow to give you a result (and risking everything collapsing to Double.PositiveInfinity or 0.0, depending on how big the values are, which creates hash collisions that slow things down still further).
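To see how easily that saturates, and why saturation means collisions (illustrative numbers, not taken from an actual grid):

scala> math.pow(200, 200)
res0: Double = Infinity

scala> math.pow(200, 200).toInt   // every overflowing term collapses to the same Int
res1: Int = 2147483647

Once terms start collapsing like that, many distinct states end up sharing hash codes, which is exactly the collision problem described above.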