Reasons for & ldquo; Symmetric & rdquo; functions


Trying out this new stackoverflow thing, as suggested :) This is not really haskell specific, but it's clearest in haskell.

Here's a pattern that comes up every once and a while: a function takes two arguments which it treats symmetrically. mappends frequently have this property. An example:

-- | Merge sorted lists of ranges.
merge :: (Ord n) => [(n, n)] -> [(n, n)] -> [(n, n)]
merge [] r2 = r2
merge r1 [] = r1
merge [email protected]((s1, e1) : rest1) [email protected]((s2, e2) : rest2)
    | e1 < s2 = (s1, e1) : merge rest1 r2
    | e2 < s1 = (s2, e2) : merge r1 rest2
    | s1 >= s2 && e1 <= e2 = merge rest1 r2 -- 1 within 2
    | s2 >= s1 && e2 <= e1 = merge r1 rest2 -- 2 within 1
    | e1 > e2 = merge (merged : rest1) rest2
    | otherwise = merge rest1 (merged : rest2)
    where merged = (min s1 s2, max e1 e2)

Notice that the treatment of 'r1' and 'r2' is symmetrical. There are really only 4 cases: merge with a null yields the non-null one, not overlapping yields the range unchanged, one contained in the other tosses the subsumed range, and overlapping creates a merged range and tries to merge it with the rest.

However, each case has a mirrored variant so there winds up being 8, even though the mirror 4 can be derived mechanically. Not only is there twice as much room to make mistakes, due to the symmetry the mistakes won't be caught by the typechecker. Is there a name for this pattern? A way to factor out the repetition? I suppose I can try to define it for a list and then write 'mappend a b = mconcat [a, b]', but the problem is rather harder for me to think of in the general form (for example, it hurts my head to try to think of which list to put the merged interval back on). It's so much easier to define mappend and then get mconcat out of that. Maybe there's a better way to think of the list version to avoid the head hurting?

What I think I want to do is "focus" on one case, so I can write in terms of "this" and "that". Not only is this easier to think of than two equally priviledged 'r1' and 'r2', the that->this case should be implicit from the this->that one.

The trick is that you're fusing two seperate steps. The first step is just merging lists. The second is merging intervals such that they do not overlap. Factor out the two steps and everthing simplifies.

mergeList ([email protected](s1,_):xs) ([email protected](s2,_):ys) = case compare s1 s2 of
      LT -> x : merge xs (y:ys)
      GT -> y : merge (x:xs) ys
      EQ -> x : y : merge xs ys
mergeList xs ys = xs ++ ys

mergeRuns ([email protected](s1,e1):x'@(s2,e2):xs)
    | e1 < s2   = x : mergeRuns (x':xs) -- x is less than and nonoverlapping
    | otherwise = mergeRuns ((s1, max e1 e2) : xs) -- there is overlap
mergeRuns x = x

merge xs ys = mergeRuns $ mergeList xs ys


If you add some inline pragmas, ghc should take care of generating somewhat more fused code for you. Otherwise, you could fuse them by hand to derive a bulkier but more efficient implementation. Our, you could just leave it be since it should be pretty efficient as is anyway.

Another approach would be to write a function mergeCons :: (n,n) -> [(n,n)] -> [(n,n)] (which is really just a variation of mergeRuns) and then substitute that for standard cons in the mergeList function. That would make reasoning about inlining somewhat easier. Here's some code demonstrating that solution (again untested):

mergeCons [email protected](s1,e1) (x'@(s2,e2):xs)
    | e1 < s2   = x : (x':xs) -- x is less than and nonoverlapping
    | otherwise = (s1, max e1 e2) `mergeCons` xs -- there is overlap
mergeCons x [] = [x]

merge' ([email protected](s1,_):xs) ([email protected](s2,_):ys) = case compare s1 s2 of
      LT -> x `mergeCons` merge xs (y:ys)
      GT -> y `mergeCons` merge (x:xs) ys
      EQ -> x `mergeCons` y `mergeCons` merge xs ys
merge' xs ys = xs ++ ys