What is the data structure behind the Clojure sets?


I recently listened to Rich Hickey's interview on Software Engineering Radio. During the interview Rich mentioned that Clojure's collections are implemented as trees. I'm hoping to implement persistent data structures in another language, and would like to understand how sets and Clojure's other persistent data structures are implemented.

What would the tree look like at each point in the following scenario?

  1. Create the set {1 2 3}

  2. Create the union of {1 2 3} and {4}

  3. Create the difference of {1 2 3 4} and {1}

I'd like to understand how the three sets generated ({1 2 3}, {1 2 3 4}, and {2 3 4}) share structure, and how "deletions" are handled.

I'd also like to know the maximum number of branches that a node may have. Rich mentioned in the interview that the trees are shallow, so presumably the branching factor is greater than two.

You probably need to read the work of Phil Bagwell. His research into data structures is the base of Clojure, Haskell and Scala persistent data structures.

There is this talk by Phil at Clojure/Conj: http://www.youtube.com/watch?v=K2NYwP90bNs

There are also some papers:

You can also read Purely Functional Data Structures by Chris Okasaki. This blog post talks about the book: http://okasaki.blogspot.com.br/2008/02/ten-years-of-purely-functional-data.html