Functional inputs and recursion in python


This is a moronic question. Here goes:

I have two lists, candidates_input and constraint_input. The function below finds the winning candidate in candidates_input for a given ordering of constraints in constraint_input by eliminating candidates that can't be the winner until only one is left. So I'm removing items from both input lists -- losing candidates and constraints that have already told all they can tell, then moving on to the next constraint.

I don't want to actually modify the original input lists, though, because I'll need them again. But inserting something like this at the beginning of the function:

remaining_candidates = candidates_input[:]
remaining_constraints = constraint_input[:]

And then substituting those new names for the old ones within the function seems to break the recursion. What am I doing wrong?

def OT_eval(candidates_input, constraint_input): #chooses an optimal candidate given candidates and constraint ranking
    constraint = constraint_input[0]             #highest ranked constraint
    violations_list = [(len(re.findall(constraint, candidate)), candidate) for candidate in candidates_input]
    Creating list of (num violations, candidate) duples for each candidate, then sorting on num violations
    for pair in violations_list:                 #checking each candidate against  known optimal candidate
        if pair[0] > violations_list[0][0]:      #num violations > minimal violations
            candidates_input.remove(pair[1])     #removing suboptimal candidate
    if len(candidates_input) > 1 and len(constraint_input) > 0: #if further constraints needed,
        constraint_input.remove(constraint)                     #remove current constraint and recurse
        OT_eval(candidates_input, constraint_input)
    elif len(candidates_input) == 1:
        optimal_candidate = candidates_input[0]
        print "Optimal Candidate:  ", optimal_candidate
        return optimal_candidate
    elif len(candidates_input) > 1 and len(constraint_input) == 0: #more than one candidate left, but no more constraints
        raise ValueError("Optimal candidate cannot be determined: check constraints")

The reason it "works" without making a copy is that when you make the recursive call, you don't return anything from it. Each recursive call re-filters the original data, and once you've returned from everything, the original data has been completely re-filtered. If you make copies, then successively more-filtered copies are created, but the original data is unchanged, and there is no way to access the more-filtered copies from the calling scope.

This is fixed by returning the result of the recursive call. At the point where you make the initial call, you would capture the return value (and perhaps reassign it to the original variable). Of course, for things to work properly, you'd need to return the same kind of thing everywhere that you return. So you'd return a (candidates_input, constraint_input) tuple so that you have that data, and let the call site interpret the result. Your code is confused; responsibility is not properly separated. There are two tasks here: filtering the data, and determining what the filtered data means.

When you write recursive code, you want it to be in a functional style, generally. That means: don't modify things in place, but instead create the modified versions. For consistency and neatness, you should be doing this even for sub-steps.

def OT_eval(candidates_input, constraint_input):
    # It's much cleaner to handle base cases for recursion **first**.
    if not (constraint_input and candidates_input):
        # We ran out of one or the other. BTW, your original code doesn't
        # consider the case of an overly constrained situation.
        return (candidates_input, constraint_input)

    constraint = constraint_input[0]
    # At first I was going to replace the call to `.sort()` by using
    # the free function `sorted` to maintain the "functional" theme.
    # However, you don't really need to sort the list, as far as I can
    # tell; you just need to find the minimum and use it as a basis for
    # comparison.
    violations = [
        (len(re.findall(constraint, candidate)), candidate)
        for candidate in candidates_input

    # Now we create "all candidates with more than the minimum violations"
    minimal_violations = min(violations)[0]
    violators = [
        for violation_count, candidate in violations
        if violation_count > minimal_violations
    # And hence "all candidates without that many violations"
    good_candidates = [
        for candidate in input_candidates
        if candidate not in violators

    # And now we can recurse.
    return OT_eval(good_candidates, constraint_input[1:])

# Then, outside, we do the actual result processing and output:

final_candidates, final_constraints = OT_eval(candidates, constraints)

if final_constraints:
    print "No candidates survived the selection process."
elif len(final_candidates) != 1:
    print "Couldn't determine a winner."
    print "The winner is:", final_candidates[0]

Of course, now that the body of the function is cleaned up, you should be able to see trivially how to convert to iteration. Also, there's a needless complication here: since we're determining the number of violations for every candidate, it makes no sense to determine all the violators and filter them out: we could just determine all the good candidates directly via the opposite condition (left as an exercise).