Call to a recursive function and Python queuing


I have the sketch of my code as follows:

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        func2(a,x)
        func2(b,x)
    return x

x = []
y = func2(c, x)

The problem, as you might have figured out from the code, is that I would like func2(b, x) to be computed in parallel with func2(a, x) whenever condition is true, i.e. before b is replaced by a new b from func2(a, x). But according to my algorithm this clearly cannot happen, due to the new b's.

I do think such a problem might be a perfect fit for a parallel computing approach, but I have not used one before and my knowledge of it is quite limited. I did try the suggestion from "How to do parallel programming in Python", though, and I got the same result as with the sketch above.


Caveat: threading might not be parallel enough for you (see the note on the Global Interpreter Lock in https://docs.python.org/2/library/threading.html), so you might have to use the multiprocessing library instead (https://docs.python.org/2/library/multiprocessing.html).

...So I've cheated/been-lazy & used a thread/process-neutral term "job". You'll need to pick either threading or multiprocessing everywhere that I use "job".

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        a_job = None
        if (number_active_jobs() >= NUM_CPUS):
            # do a and b sequentially
            func2(a, x)
        else:
            a_job = fork_job(func2, a, x)
        func2(b,x)
        if a_job is not None:
            join(a_job)

x = []
func2(c, x)
# all results are now in x (don't need y)
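
For concreteness, here is one way the job helpers could be spelled for the threading case; fork_job, join, and number_active_jobs are the sketch's own names, not stdlib functions:

import threading
import multiprocessing

NUM_CPUS = multiprocessing.cpu_count()

def number_active_jobs():
    # Subtract 1 so the main thread doesn't count against the budget.
    return threading.active_count() - 1

def fork_job(fn, *args):
    # Run fn(*args) in a background thread; return the thread so the
    # caller can join it later.
    t = threading.Thread(target=fn, args=args)
    t.start()
    return t

def join(job):
    job.join()

Appending to the shared list x from several threads is safe in CPython (list.append happens under the GIL), but per the caveat above, threads will only give you real parallelism if func1 spends its time in I/O or in C code that releases the GIL.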

...that will be best if you need a,b pairs to finish together for some reason. If you're willing to let the scheduler go nuts, you could "job" them all & then join at the end:

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        if (number_active_jobs() >= NUM_CPUS):
            # do a and b sequentially
            func2(a, x)
        else:
            all_jobs.append(fork_job(func2, a, x))
        # TODO: the same job-or-sequential for func2(b,x)

all_jobs = []
x = []
func2(c, x)
for j in all_jobs:
    join(j)
# all results are now in x (don't need y)
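
For completeness, the TODO is just the mirror image of the a branch:

        if (number_active_jobs() >= NUM_CPUS):
            func2(b, x)
        else:
            all_jobs.append(fork_job(func2, b, x))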

The NUM_CPUS check could be done with threading.activeCount() instead of a full-blown worker thread pool (see "python - how to get the number of active threads started by specific class?").

But with multiprocessing you'd have more work to do, with a JoinableQueue and a fixed-size pool of workers.
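
Here is a minimal sketch of that, assuming condition can be written as a predicate on c, and with func1 and the worker defined at module top level so child processes can use them:

import multiprocessing as mp

def worker(q, results):
    # Each worker repeatedly pulls a value, expands it with func1,
    # records the pair, and enqueues the two children for the other workers.
    while True:
        c = q.get()
        if condition(c):            # hypothetical: condition as a function of c
            a, b = func1(c)
            results.append((a, b))  # manager list, shared across processes
            q.put(a)
            q.put(b)
        q.task_done()

if __name__ == '__main__':
    q = mp.JoinableQueue()
    manager = mp.Manager()
    results = manager.list()
    for _ in range(mp.cpu_count()):     # fixed-size pool of workers
        p = mp.Process(target=worker, args=(q, results))
        p.daemon = True                 # workers exit with the main process
        p.start()
    q.put(c)    # seed the queue with the initial input
    q.join()    # returns once every queued item has been processed
    x = list(results)

q.join() works because every put increments the queue's unfinished-task counter and every task_done decrements it, so it only unblocks once the whole recursion tree has been drained.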