Scala TreeSet: mutable vs immutable

Posted in: Technical Track

I was exploring immutable RedBlackTree tree implementation in Scala (2.12.4) when I noticed something that wasn’t clear to me:

  private[this] def upd[A, B, B1 >: B](tree: Tree[A, B], k: A, v: B1, overwrite: Boolean)(implicit ordering: Ordering[A]): Tree[A, B1] = if (tree eq null) {
    RedTree(k, v, null, null)
  } else {
    val cmp = ordering.compare(k, tree.key)
    if (cmp < 0) balanceLeft(isBlackTree(tree), tree.key, tree.value, upd(tree.left, k, v, overwrite), tree.right)
    else if (cmp > 0) balanceRight(isBlackTree(tree), tree.key, tree.value, tree.left, upd(tree.right, k, v, overwrite))
    else if (overwrite || k != tree.key) mkTree(isBlackTree(tree), k, v, tree.left, tree.right)
    else tree
  }

Comparing keys via compare is followed by explicit key not “equals” condition.

I compared similar parts of mutable RedBlackTree implementation and haven’t found anything similar. So I started to look for the collections that may work differently for the mutable and immutable version. So TreeSet, overwrite is passed as false compared to true TreeMap: this makes sense, for tree map, inserting the same key will replace the value, while this is not needed for TreeSet. Although key equality is still checked, it’s still not clear why.

Simple test show that mutable and immutable TreeSet work really differently in the following example.
Let’s define class extending Ordered with two fields that compare distinguishes only the first one:

case class Dummy(x: Int, y: Int) extends Ordered[Dummy] {
  override def compare(that: Dummy): Int =
    Ordering[Int].compare(this.x, that.x)
}

Now try to insert the different one by “equals” instances into mutable and immutable versions of sets:

val mutableTree = scala.collection.mutable.TreeSet.empty[Dummy]
mutableTree.add(Dummy(0, 0))
mutableTree.add(Dummy(0, 1))
mutableTree
var immutableTree = scala.collection.immutable.TreeSet.empty[Dummy]
immutableTree += Dummy(0, 0)
immutableTree += Dummy(0, 1)
immutableTree

So were the results (2.12.4):

Welcome to Scala 2.12.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.
scala> case class Dummy(x: Int, y: Int) extends Ordered[Dummy] {
     |   override def compare(that: Dummy): Int = Ordering[Int].compare(this.x, that.x)
     | }
defined class Dummy
scala> val mutableTree = scala.collection.mutable.TreeSet.empty[Dummy]
mutableTree: scala.collection.mutable.TreeSet[Dummy] = TreeSet()
scala> mutableTree.add(Dummy(0, 0))
res0: Boolean = true
scala> mutableTree.add(Dummy(0, 1))
res1: Boolean = false
scala> mutableTree
res2: scala.collection.mutable.TreeSet[Dummy] = TreeSet(Dummy(0,0))
scala> var immutableTree = scala.collection.immutable.TreeSet.empty[Dummy]
immutableTree: scala.collection.immutable.TreeSet[Dummy] = TreeSet()
scala> immutableTree += Dummy(0, 0)
scala> immutableTree += Dummy(0, 1)
scala> immutableTree
res5: scala.collection.immutable.TreeSet[Dummy] = TreeSet(Dummy(0,1))

As we can see mutable TreeSet has not got Dummy(0, 1) while the immutable TreeSet did.

More interesting, is what will happen if we try to remove an element:

scala> mutableTree.remove(Dummy(0, 1))
res6: Boolean = true
scala> mutableTree
res7: scala.collection.mutable.TreeSet[Dummy] = TreeSet()

Mutable version worked exactly based on compare.
For Immutable version:

scala> immutableTree -= Dummy(0, 0)
scala> immutableTree
res9: scala.collection.immutable.TreeSet[Dummy] = TreeSet()

Nice, immutable did the same (it has Dummy(0, 1) and removing Dummy(0, 0) works).
If we look at immutable RedBlackTree implementation for the del function, we can see that there is simple:

    val cmp = ordering.compare(k, tree.key)
    if (cmp < 0) delLeft
    else if (cmp > 0) delRight
    else append(tree.left, tree.right)

and there is no specific case for cmp == 0 && k != tree.key.

This is not a bug as this violates Ordered contract

It is important that the equals method for an instance of Ordered[A] be consistent with the compare method. However, due to limitations inherent in the type erasure semantics, there is no reasonable way to provide a default implementation of equality for instances of Ordered[A]. Therefore, if you need to be able to use equality on an instance of Ordered[A] you must provide it yourself either when inheriting or instantiating.

I was still interested in finding out why this condition was added, so I went to git and found an original commit. It was exactly what I had expected for an overwrite parameter but there was no clear explanation why key compare was added:

Here we had an issue that RedBlack does not work the same way
for sets – which are not supposed to replace an element if
it is the same (wrt equals) and maps – which should replace
the corresponding values.

Adding an overwrite parameter which decides whether to overwrite
added keys if they are the same in the ordering.

And also worth noting is that HashSet has nothing to do with ordering so they end up having both values inserted:

scala> var immutableHash = scala.collection.immutable.HashSet.empty[Dummy]
immutableHash: scala.collection.immutable.HashSet[Dummy] = Set()
scala> immutableHash += Dummy(0, 0)
scala> immutableHash += Dummy(0, 1)
scala> immutableHash
res11: scala.collection.immutable.HashSet[Dummy] = Set(Dummy(0,0), Dummy(0,1))
email

Interested in working with Valentin? Schedule a tech call.

About the Author

Valentin is a specialist in Big Data and Cloud solutions. He has extensive expertise in Cloudera Hadoop Distribution, Google Cloud Platform and skilled in building scalable performance critical distributed systems and data visualization systems.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *