Algorithm - How to select a number from each column in a table so that their sum is as close as possible to a particular value

advertisements

I have an m x n matrix of real numbers. I want to choose a single value from each column such that the sum of my selected values is as close as possible to a pre-specified total.

I am not an experienced programmer (although I have an experienced friend who will help). I would like to achieve this using Matlab, Mathematica or c++ (MySQL if necessary). The code only needs to run a few times, once every few days - it does not necessarily need to be optimised. I will have 16 columns and about 12 rows.


Normally I would suggest dynamic programming, but there are a few features of this situation suggesting an alternative approach. First, the performance demands are light; this program will be run only a couple times, and it doesn't sound as though a running time on the order of hours would be a problem. Second, the matrix is fairly small. Third, the matrix contains real numbers, so it would be necessary to round and then do a somewhat sophisticated search to ensure that the optimal possibility was not missed.

Instead, I'm going to suggest the following semi-brute-force approach. 12**16 ~ 1.8e17, the total number of possible choices, is too many, but 12**9 ~ 5.2e9 is doable with brute force, and 12**7 ~ 3.6e7 fits comfortably in memory. Compute all possible choices for the first seven columns. Sort these possibilities by total. For each possible choice for the last nine columns, use an efficient search algorithm to find the best mate among the first seven. (If you have a lot of memory, you could try eight and eight.)

I would attempt a first implementation in C++, using std::sort and std::lower_bound from the <algorithm> standard header. Measure it; if it's too slow, then try an in-memory B+-tree (does Boost have one?).


I spent some more time thinking about how to implement what I wrote above in the simplest way possible. Here's an approach that will work well for a 12 by 16 matrix on a 64-bit machine with roughly 4 GB of memory.

The number of choices for the first eight columns is 12**8. Each choice is represented by a 4-byte integer between 0 and 12**8 - 1. To decode a choice index i, the row for the first column is given by i % 12. Update i /= 12;. The row for the second column now is given by i % 12, et cetera.

A vector holding all choices requires roughly 12**8 * 4 bytes, or about 1.6 GB. Two such vectors require 3.2 GB. Prepare one for the first eight columns and one for the last eight. Sort them by sum of the entries that they indicate. Use saddleback search to find the best combination. (Initialize an iterator into the first vector and a reverse iterator into the second. While neither iterator is at its end, compare the current combination against the current best and update the current best if necessary. If the current combination sums to than the target, increment the first iterator. If the sum is greater than the target, increment the second iterator.)

I would estimate that this requires less than 50 lines of C++.