Arrays sorted randomly: compare search performance

I'm writing a program that tests and compares the performance of Multi-Key Sequential search and Interpolation binary search, and I'd like some advice:

What is the best way to sort a randomly generated array of integers, or even to generate it already sorted (if that makes any sense), in the given context?

I looked into some sorting techniques, but since the emphasis here is on searching (not sorting) performance, all of the advanced sorts seem rather complicated to use in just one utility method. Considering that the array has to be larger than 10^6 elements (for testing purposes), Modified/Bubble, Selection or Insertion sorts are not an option.

An additional constraint is that all of the array elements must be unique.

Now, my initial idea was to split the interval [INT_MIN, INT_MAX] into n intervals (n being the array length) and then add a random integer from 0 to 2^32/n (rounded down) to the beginning of every interval.
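A minimal sketch of this bucket idea, assuming a 32-bit int and 64-bit intermediate arithmetic. The function name make_stratified_sorted is hypothetical. One detail is an assumption rather than part of the description above: the offset is drawn from 0 to width - 1 instead of 0 to width, because an offset equal to the full width could collide with the start of the next interval and break uniqueness.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper: picks one random value inside each of n equal-width
// buckets covering [INT_MIN, INT_MAX]. The result is sorted and unique by
// construction, and it is built in O(n).
// Assumption: int is narrower than 64 bits, so the range fits in std::int64_t.
std::vector<int> make_stratified_sorted(std::size_t n, std::mt19937& gen)
{
    const std::int64_t lo = std::numeric_limits<int>::min();
    const std::int64_t hi = std::numeric_limits<int>::max();
    const std::int64_t range = hi - lo + 1;  // 2^32 for a 32-bit int

    if (n == 0 || static_cast<std::uint64_t>(n) > static_cast<std::uint64_t>(range))
        throw std::invalid_argument("n must be between 1 and the size of the int range");

    // Bucket width, rounded down; at least 1 because n <= range.
    const std::int64_t width = range / static_cast<std::int64_t>(n);

    // Offsets stay strictly inside the bucket so neighbours never collide.
    std::uniform_int_distribution<std::int64_t> offset(0, width - 1);

    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t start = lo + static_cast<std::int64_t>(i) * width;
        out.push_back(static_cast<int>(start + offset(gen)));
    }
    return out;
}
```

Calling make_stratified_sorted(2000000, gen) with a seeded std::mt19937 would give a test array of two million elements. Note that such an array is close to evenly spaced by design, which is exactly the property that interpolation benefits from.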

The problem is this:

I presume that, as n rises closer to 2^32, like mine does, Interpolation search begins to give better and better results, as its interpolation gets more accurate.

However:

If I rely solely on pseudo-random number generators (like rand()), their dispersion characteristics dictate the same tendency for a generated-then-sorted array: Interpolation gets better at pinpointing the most likely location as the size gets closer to the int limit. Uniformity/dispersion characteristics get lost as n rises toward INT_MAX, so, due to the stated limitations, Interpolation seems to always win.
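For reference, the probe step this reasoning depends on can be sketched as follows. This is a generic textbook-style interpolation search, not the code from my program. The names SearchResult and interpolation_search are hypothetical, and the probe counter is only there so the two searches could be compared by number of probes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: whether the key was found, where, and how many
// probes (loop iterations) the search needed.
struct SearchResult {
    bool found;
    std::size_t index;
    std::size_t probes;
};

// Interpolation search over a sorted vector of ints.
SearchResult interpolation_search(const std::vector<int>& a, int key)
{
    SearchResult r{false, 0, 0};
    if (a.empty()) return r;

    std::size_t lo = 0;
    std::size_t hi = a.size() - 1;

    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        ++r.probes;

        if (a[lo] == a[hi]) {  // all remaining values equal: avoid dividing by zero
            if (a[lo] == key) { r.found = true; r.index = lo; }
            return r;
        }

        // Estimate the position by linear interpolation. The differences are
        // taken in 64 bits so that values near INT_MIN/INT_MAX cannot overflow.
        const std::int64_t num = static_cast<std::int64_t>(key) - a[lo];
        const std::int64_t den = static_cast<std::int64_t>(a[hi]) - a[lo];
        const double fraction = static_cast<double>(num) / static_cast<double>(den);
        const std::size_t pos =
            lo + static_cast<std::size_t>(fraction * static_cast<double>(hi - lo));

        if (a[pos] == key) {
            r.found = true;
            r.index = pos;
            return r;
        }
        if (a[pos] < key) {
            lo = pos + 1;
        } else {
            hi = pos - 1;  // safe: a[pos] > key >= a[lo] implies pos > lo
        }
    }
    return r;
}
```

The more evenly the values are spaced, the closer the first estimate lands to the target, which is why near-uniform data favors this search.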

Feel free to discuss, criticize and clarify this question as you see fit, but I'm rather desperate for an answer, because the test seems to be rigged in Interpolation's favor either way and I want to analyze both searches fairly. In short: I want to be convinced that my initial idea doesn't tilt the scales in Interpolation's favor even further, and I want to use it because it's O(n).


So you want to generate an "array" of N unique random numbers that must be in sorted order? This sounds like a perfect use for a std::set. Elements inserted into a set are kept sorted automatically, and a set can only contain unique elements, so it also takes care of checking whether a random number has already been generated.

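#include <random>
#include <set>

std::set<int> random_numbers;   // kept sorted, holds no duplicates
std::random_device rd;          // used only to seed the generator
std::mt19937 mt(rd());
// number_of_random_numbers_needed is the desired array length
while (random_numbers.size() < number_of_random_numbers_needed)
{
    random_numbers.insert(mt());  // a repeated value is simply not inserted
}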

Then you can convert the set to something else like a std::vector or std::array if you don't want to keep it as a set.
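As a rough sketch of that conversion, the generation and the copy into a std::vector could be wrapped in one function. The name make_sorted_unique is hypothetical, and storing the values as std::uint32_t (the range std::mt19937 produces) is an assumption made here to avoid a signed conversion. The requested count must not exceed 2^32, otherwise the loop can never finish.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: collects `count` distinct random values in a std::set,
// then copies them into a vector. Iterating a set visits its elements in
// ascending order, so the vector comes out sorted.
// Assumption: count <= 2^32 (the number of distinct values mt19937 can return).
std::vector<std::uint32_t> make_sorted_unique(std::size_t count, std::mt19937& gen)
{
    std::set<std::uint32_t> seen;
    while (seen.size() < count) {
        // insert() leaves the set unchanged when the value is already present
        seen.insert(static_cast<std::uint32_t>(gen()));
    }
    return std::vector<std::uint32_t>(seen.begin(), seen.end());
}
```

Each insertion into the set costs O(log n), so this approach is O(n log n) overall rather than O(n).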