Prints a minimum set of packages required to install the package


I have to create a data structure that represents the dependencies between packages.

I thought I could simply use a graph, but the problem is that some packages can depend on one of several "optional" packages, and we have to choose which of these optional packages is the most convenient to install (basically, from these optional choices, we need to install the best one).

For example, suppose I have the following situation:

  1. package1:
  2. package2: package1
  3. package3: package1, package2
  4. package4: package1 | package3
  5. package5: package1, package2 | package3

This situation means that:

  1. package 1 has no dependencies.
  2. package 2 depends on package 1 (we need to install package 1).
  3. package 3 depends on package 1 and 2 (we need to install both).
  4. package 4 depends on package 1 or 3 (we can install either 1 or 3, but we need to pick the best choice, meaning the one that makes package 4 depend on fewer packages).
  5. package 5 depends on package 1, and it also depends on either package 2 or 3 (again, we need to choose the best one).

Now, the problem is clearly when we can choose between different packages.

How do we choose them?

Why, for example, should package 4 depend on package 1 instead of 3?

We could check which packages package 1 and package 3 depend on, but what if we have 10000 choices and need just the single best one? That would take thousands of loops and become far too complicated. There might be something simple, but I don't know what.

Trying to choose which one is better to install seems to lead to a recursive algorithm somehow, and this already blows my mind.

You could create a directed graph with two types of nodes:

  • package nodes, whose children are combined using AND logic
  • OR nodes, whose children are combined using OR logic

All the arcs starting from a node represent its dependencies.

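For your example this would create the following graph (the figure is not reproduced here): package2 points to package1; package3 points to package1 and package2; package4 points to an OR node whose children are package1 and package3; package5 points to package1 and to an OR node whose children are package2 and package3.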

If you have to evaluate the minimum set of dependencies for a package, you can visit the graph using a depth-first search:

  • when you visit a package node you sum the number of dependencies of its children
  • when you visit an OR node you consider the minimum number of dependencies of its children
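
The counting rule above can be sketched roughly as follows. This is only an illustration under stated assumptions: the GRAPH dictionary and the names count and count_or are made up for this sketch, and each package is assumed to cost 1 for itself plus the cost of everything it requires.

```python
# Hypothetical encoding of the example graph: each package maps to a list of
# requirements, and each requirement is a list of alternatives (an OR node).
# A requirement with a single alternative is a plain dependency.
GRAPH = {
    "package1": [],
    "package2": [["package1"]],
    "package3": [["package1"], ["package2"]],
    "package4": [["package1", "package3"]],
    "package5": [["package1"], ["package2", "package3"]],
}

def count(package):
    """Package node: 1 for the package itself plus the sum over its requirements."""
    return 1 + sum(count_or(alternatives) for alternatives in GRAPH[package])

def count_or(alternatives):
    """OR node: the cheapest of its alternatives."""
    return min(count(name) for name in alternatives)

for name in GRAPH:
    print(name, count(name))
```

Note that a plain sum counts shared packages more than once: package3 scores 4 here although only 3 distinct packages are involved (package1 is reached twice). Collecting sets of packages instead of numbers avoids this double counting.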

Example code

Here is an example of how to implement this in Python (I've used Python as its syntax is almost like pseudo-code; you should be able to get the gist of it):

#!/usr/bin/env python3

class Package:
    # Constructor
    def __init__(self, name, dependencies):
        self.name = name
        self.dependencies = set(dependencies)
    # Needed to print the package
    def __str__(self):
        return self.name
    # Helper: the package itself plus its optimal dependencies, i.e.
    # everything to install when another node requires this package
    def getRequiredPackages(self):
        return {self} | self.getOptimalDependencies()
    # This method returns the optimal set of dependencies for this package
    # (the package itself is not included)
    def getOptimalDependencies(self):
        optimalDependencies = set()
        for dependency in self.dependencies:
            optimalDependencies |= dependency.getRequiredPackages()
        return optimalDependencies

class Or:
    # Constructor
    def __init__(self, dependencies):
        self.dependencies = set(dependencies)
    # This method returns the smallest set of packages that satisfies
    # one of the alternatives combined with this 'OR'
    def getRequiredPackages(self):
        optimalDependencies = None
        for dependency in self.dependencies:
            alternativeDependencies = dependency.getRequiredPackages()
            # None (rather than an empty set) marks "no alternative seen yet"
            if optimalDependencies is None or len(alternativeDependencies) < len(optimalDependencies):
                optimalDependencies = alternativeDependencies
        return optimalDependencies if optimalDependencies is not None else set()

You can then create the packages of your example as:

package1 = Package("package1", [])
package2 = Package("package2", [package1])
package3 = Package("package3", [package1, package2])
package4 = Package("package4", [Or([package1, package3])])
package5 = Package("package5", [package1, Or([package2, package3])])
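
If the dependencies are only available as text in the notation used in the question, a small parser could build the structure first. The sketch below is an assumption-laden illustration: parse_dependencies and SPEC are hypothetical names, and "," is assumed to bind more loosely than "|", as in the question's reading of package5. It returns plain lists rather than Package and Or objects; every inner list with more than one name corresponds to an OR node.

```python
def parse_dependencies(text):
    """Parse lines like 'package5: package1, package2 | package3'.

    Returns a dict mapping each package name to a list of requirements;
    every requirement is a list of alternative package names.
    """
    graph = {}
    for line in text.strip().splitlines():
        name, _, rest = line.partition(":")
        requirements = []
        for requirement in rest.split(","):
            # Split one requirement into its '|' alternatives, dropping blanks
            alternatives = [alt.strip() for alt in requirement.split("|") if alt.strip()]
            if alternatives:
                requirements.append(alternatives)
        graph[name.strip()] = requirements
    return graph

SPEC = """
package1:
package2: package1
package3: package1, package2
package4: package1 | package3
package5: package1, package2 | package3
"""

graph = parse_dependencies(SPEC)
print(graph["package5"])  # [['package1'], ['package2', 'package3']]
```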

To get the list of optimal dependencies for package5 you can then call package5.getOptimalDependencies().

If we print it with:

print(','.join(map(str, package5.getOptimalDependencies())))

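we get package1 and package2, separated by a comma (the order may vary, since a set is unordered): the OR node picks package2 over package3 because package2 brings in fewer packages.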

If you have dependency cycles, you have to add a check for them.
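
One possible form of such a check is to remember which packages are on the current search path and to refuse to enter one of them again. The sketch below is only an illustration: resolve and GRAPH are hypothetical names, the graph is a plain dictionary (package name to list of requirements, each a list of alternatives), and a package that can only be satisfied through a cycle is assumed to be uninstallable, which a real package manager might handle differently.

```python
def resolve(name, graph, visiting=frozenset()):
    """Return a small set of packages needed to install `name`, itself included.

    Returns None when `name` can only be satisfied through a dependency cycle.
    """
    if name in visiting:              # already on the current path: cycle
        return None
    visiting = visiting | {name}
    result = {name}
    for alternatives in graph[name]:
        best = None
        for alternative in alternatives:
            candidate = resolve(alternative, graph, visiting)
            if candidate is not None and (best is None or len(candidate) < len(best)):
                best = candidate
        if best is None:              # every alternative loops back
            return None
        result |= best
    return result

GRAPH = {
    "a": [["b", "c"]],   # a needs b OR c
    "b": [["a"]],        # b needs a, which can still be satisfied through c
    "c": [],
    "d": [["e"]],        # d and e require each other: an unavoidable cycle
    "e": [["d"]],
}
print(sorted(resolve("a", GRAPH)))  # ['a', 'c']
print(sorted(resolve("b", GRAPH)))  # ['a', 'b', 'c']
print(resolve("d", GRAPH))          # None
```

An alternative that loops back is simply skipped, so a cycle only makes a package unresolvable when no other alternative is left.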