How to compare two Collections for & ldquo; Equivalence & rdquo; Based on fields of different Java classes?

Given any two classes, e.g. ClassA and ClassB below:

class ClassA {
    private int intA;
    private String strA;
    private boolean boolA;
    // Constructor
    public ClassA (int intA, String strA, boolean boolA) {
        this.intA = intA; this.strA = strA; this.boolA = boolA;
    } // Getters and setters etc. below...
}

class ClassB {
    private int intB;
    private String strB;
    private boolean boolB;
    // Constructor
    public ClassB (int intB, String strB, boolean boolB) {
        this.intB = intB; this.strB = strB; this.boolB = boolB;
    } // Getters and setters etc. below...
}

And any two different Collection types, one with ClassA elements and the other with ClassB elements, e.g:

List<Object> myList = Arrays.asList(new ClassA(1, "A", true),
                                    new ClassA(2, "B", true));
Set<Object> mySet = new HashSet<Object>(
                      Arrays.asList(new ClassB(1, "A", false),
                                    new ClassB(2, "B", false)));

What's the simplest way of telling whether the two Collections are "equivalent"(*) in terms of a specified subset of fields?

(*) The word "equivalent" is used rather then "equal" since this is contextual - i.e. such "equivalence" may be defined differently in another context.

Worked example from above: Suppose we specify that intA and strA should match with intB and strB respectively (but the boolA / boolB values can be ignored). This would make the two collection objects defined above be considered equivalent - but if an element were added to or removed from one of the collections then they no longer would be.

Preferred solution: The method used should be generic for any Collection type. Ideally Java 7 as am confined to using this (but Java 8 may be of additional interest to others). Happy to use Guava or Apache Commons but would prefer not to use more obscure external libraries.


Here's a Java 8 version using lambdas and higher-order functions. It's probably possible to convert this to Java 7 using anonymous inner classes instead of lambdas. (I believe most IDEs have a refactoring operation that does this.) I'll leave that as an exercise for interested readers.

There are actually two distinct problems here:

  1. Given two objects of different types, evaluate them by examining respective fields of each. This differs from "equals" and "compare" operations which are already defined by the JDK library APIs, so I'll use the term "equivalent" instead.

  2. Given two collections containing elements of those types, determine if they are "equals" for some definition of that term. This is actually quite subtle; see the discussion below.

1. Equivalence

Given two objects of types T and U we want to determine whether they're equivalent. The result is a boolean. This can be represented by a function of type BiPredicate<T,U>. But we can't necessarily examine the objects directly; instead, we need to extract respective fields from each object and evaluate the results of extraction against each other. If the field extracted from T is of type TR and the field extracted from U is of type UR, then the extractors are represented by the function types

Function<T, TR>
Function<U, UR>

Now we have extracted results of type TR and UR. We could just call equals() on them, but that's unnecessarily restrictive. Instead, we can provide another equivalence function that will be called to evaluate these two results against each other. That's a BiPredicate<TR,UR>.

Given all this, we can write a higher-order function that takes all of these functions and produces and equivalence function for us (wildcards included for completeness):

static <T,U,TR,UR> BiPredicate<T,U> equiv(Function<? super T, TR> tf,
                                          Function<? super U, UR> uf,
                                          BiPredicate<? super TR, ? super UR> pred) {
    return (t, u) -> pred.test(tf.apply(t), uf.apply(u));
}

It's probably a common case for the results of field extraction to be evaluated using equals(), so we can provide an overload for that:

static <T,U> BiPredicate<T,U> equiv(Function<? super T, ?> tf,
                                    Function<? super U, ?> uf) {
    return (t, u) -> equiv(tf, uf, Object::equals).test(t, u);
}

I could have provided another type variable R as the result type of both functions, to ensure they're the same type, but it turns out this isn't necessary. Since equals() is defined on Object and it takes an Object argument, we don't actually care what the function return types are, hence the wildcards.

Here's how to use this to evaluate the OP's example classes using just the string fields:

ClassA a = ... ;
ClassB b = ... ;
if (equiv(ClassA::getStrA, ClassB::getStrB).test(a, b)) {
    // they're equivalent
}

As a variation, we might also want a primitive specialization in order to avoid unnecessary boxing:

static <T,U> BiPredicate<T,U> equivInt(ToIntFunction<? super T> tf,
                                       ToIntFunction<? super U> uf) {
    return (t, u) -> tf.applyAsInt(t) == uf.applyAsInt(u);
}

This lets us construct equivalence functions based on a single field. What if we want to evaluate equivalence based on multiple fields? We can combine an arbitrary number of BiPredicates by chaining the and() method. Here's how to create a function that evaluates equivalence using the int and String fields of the classes from the OP's example. For this, it's probably best to store the function in a variable separately from using it, though this can probably all be inlined (which I think will make it unreadable):

BiPredicate<ClassA, ClassB> abEquiv =
    equivInt(ClassA::getIntA, ClassB::getIntB)
        .and(equiv(ClassA::getStrA, ClassB::getStrB));

if (abEquiv.test(a, b)) {
    // they're equivalent
}

As a final example, it's quite powerful to be able to provide an equivalence function for the field extraction results when creating an equivalence function for two classes. For example, suppose we want to extract two String fields and consider them equivalent if the extracted strings are equals, ignoring case. The following code results in true:

equiv(ClassA::getStrA, ClassB::getStrB, String::equalsIgnoreCase)
    .test(new ClassA(2, "foo", true),
          new ClassB(3, "FOO", false))

2. Collection “Equality”

The second part is to evaluate whether two collections are "equals" in some sense. The problem is that in the Collections Framework, the notion of equality for is defined such that a List can only be equal to another List, and a Set can only be equal to another Set. It follows that a Collection of some other type can never be equal to either a List or a Set. See the specification of Collection.equals() for some discussion of this point.

This is clearly at odds with what the OP wants. As suggested by the OP, we don't really want "equality," but we want some other property for which we need to provide a definition. Based on the OP's examples, and some suggestions in other answers by Przemek Gumula and janos, it seems like we want the elements in the two collections to somehow be in one-for-one correspondence. I'll call this a bijection which might not be mathematically precise, but it seems close enough. Furthermore, the correspondence between each pair of elements should be equivalence as defined above.

Computing this is a bit subtle, since we have our own equivalence relation. We can't use many of the built-in Collections operations, since they all use equals(). My first attempt was this:

// INCORRECT
static <T,U> boolean isBijection(Collection<T> c1,
                                 Collection<U> c2,
                                 BiPredicate<? super T, ? super U> pred) {
    return c1.size() == c2.size() &&
           c1.stream().allMatch(t -> c2.stream()
                                       .anyMatch(u -> pred.test(t, u)));
}

(This is essentially the same as given by Przemek Gumula.) This has problems, which boil down to the possibility of more than one element in the one collection corresponding to a single element in the other collection, leaving elements unmatched. This gives strange results if given two multisets, using equality as the equivalence function:

{a x 2, b}    // essentially {a, a, b}
{a, b x 2}    // essentially {a, b, b}

This function considers these two multisets to be a bijection, which clearly isn't the case. Another problem occurs if the equivalence function allows many-to-one matching:

Set<String> set1 = new HashSet<>(Arrays.asList("foo", "FOO", "bar"));
Set<String> set2 = new HashSet<>(Arrays.asList("fOo", "bar", "quux"));

isBijection(set1, set2, equiv(s -> s, s -> s, String::equalsIgnoreCase))

The result is true, but if the sets are given in the opposite order, the result is false. That's clearly wrong.

An alternative algorithm is to create a temporary structure and remove elements as they're matched. The structure has to account for duplicates, so we need to decrement the count and only remove the element when the count reaches zero. Fortunately, various Java 8 features make this pretty simple. This is quite similar to the algorithm used in the answer from janos, though I've extracted the equivalence function into a method parameter. Alas, since my equivalence function can have nested equivalence functions, it means I can't probe the map (which is defined by equality). Instead, I have to search the map's keys, which means the algorithm is O(N^2). Oh well.

The code, however, is pretty simple. First, the frequency map is generated from the second collection using groupingBy. Then, the elements of the first collection are iterated, and the frequency map's keys are searched for an equivalent. If one is found, its occurrence count is decremented. Note the return value of null from the remapping function passed to Map.compute(). This has the side effect of removing the entry, not setting the mapping to null. It's a bit of an API hack, but it's quite effective.

For every element in the first collection, an equivalent element in the second collection must be found, otherwise it bails out. After all elements of the first collection have been processed, all elements from the frequency map should also have been processed, so it's simply tested for being empty.

Here's the code:

static <T,U> boolean isBijection(Collection<T> c1,
                                 Collection<U> c2,
                                 BiPredicate<? super T, ? super U> pred) {
    Map<U, Long> freq = c2.stream()
                          .collect(Collectors.groupingBy(u -> u, Collectors.counting()));
    for (T t : c1) {
        Optional<U> ou = freq.keySet()
                             .stream()
                             .filter(u -> pred.test(t, u))
                             .findAny();
        if (ou.isPresent()) {
            freq.compute(ou.get(), (u, c) -> c == 1L ? null : c - 1L);
        } else {
            return false;
        }
    }

    return freq.isEmpty();
}

It's not entirely clear whether this definition is the correct one. But it seems intuitively to be what people want. It's fragile, though. If the equivalence function isn't symmetric, isBijection will fail. There are also some degrees of freedom aren't accounted for. For example, suppose the collections are

{a, b}
{x, y}

And a is equivalent to both x and y, but b is only equivalent to x. If a is matched to x, the result of isBijection is false. But if a were matched to y, the result would be true.

Putting it Together

Here's the OP's example, coded up using the equiv(), equivInt(), and isBijection functions:

List<ClassA> myList = Arrays.asList(new ClassA(1, "A", true),
                                    new ClassA(2, "B", true));

Set<ClassB> mySet = new HashSet<>(Arrays.asList(new ClassB(1, "A", false),
                                                new ClassB(2, "B", false)));

BiPredicate<ClassA, ClassB> abEquiv =
    equivInt(ClassA::getIntA, ClassB::getIntB)
        .and(equiv(ClassA::getStrA, ClassB::getStrB));

isBijection(myList, mySet, abEquiv)

The result of this is true.