Java TreeMap and HashMap Incompatibilities

I will often use a TreeMap in place of a HashMap because, while TreeMap is theoretically slower (O(log N) vs. O(1) lookups), it is often much more memory efficient, especially for larger maps.

The problem is that you can’t just swap in a TreeMap without modifying code, because TreeMap can’t store null keys.

This code:

Map<String, String> map = new HashMap<>();
map.put( null, "bar" );
System.out.printf( "%s\n", map.get( null ) );

works fine and prints ‘bar’.

This code:

Map<String, String> map = new TreeMap<>();
map.put( null, "bar" );
System.out.printf( "%s\n", map.get( null ) );

… will throw a NullPointerException on the put.

This happens because TreeMap has to compare keys to order them, and its default natural-ordering comparison throws a NullPointerException on a null key. Why not support it?

I can pass in my own Comparator, but then I need to remember to do this for every instance of TreeMap.
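As a sketch of that workaround (assuming Java 8+), Comparator.nullsFirst can wrap the natural ordering so a TreeMap tolerates a null key:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class NullKeyTreeMap {
    public static void main(String[] args) {
        // nullsFirst orders a null key before all non-null keys,
        // so the comparator never dereferences null
        Map<String, String> map =
            new TreeMap<>(Comparator.nullsFirst(Comparator.<String>naturalOrder()));
        map.put(null, "bar");
        System.out.printf("%s\n", map.get(null)); // prints 'bar'
    }
}
```

The downside, as noted above, is that every TreeMap construction site has to remember to do this.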

Come to think of it, I’ve often seen Map implementations slow down otherwise decent code. HashMap’s rehash behavior, which doubles the table to 2N on each resize, can blow your memory footprint out of the water and crash your JVM.
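To be fair, the doubling rehashes can be avoided if you know the size up front: HashMap only resizes once its entry count exceeds capacity × loadFactor, so presizing sidesteps every intermediate 2N copy. A sketch (0.75f is HashMap’s documented default load factor):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expected = 1_000_000;
        float loadFactor = 0.75f;
        // Choose a capacity large enough that 'expected' entries
        // never trigger a resize (resize happens when size > capacity * loadFactor)
        int capacity = (int) (expected / loadFactor) + 1;
        Map<Integer, String> map = new HashMap<>(capacity, loadFactor);
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size()); // 1000000
    }
}
```

This doesn’t help when the final size is unknown, which is exactly the case where the rehash blow-up bites.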

If one were to use Guice to inject the dependency on the right Map implementation, 3rd-party libraries would be able to swap their Map implementations out at runtime.
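The underlying idea is ordinary constructor injection: code depends only on the Map interface and the binding decides the concrete implementation. A hand-rolled sketch (the WordCounter class is hypothetical; in Guice the Supplier would be a binding or Provider instead):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Depends only on the Map interface; the injected factory
// picks the implementation at construction time
class WordCounter {
    private final Map<String, Integer> counts;

    WordCounter(Supplier<Map<String, Integer>> mapFactory) {
        this.counts = mapFactory.get();
    }

    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }
}

public class InjectionDemo {
    public static void main(String[] args) {
        // Swap HashMap::new for TreeMap::new without touching WordCounter
        WordCounter wc = new WordCounter(HashMap::new);
        wc.add("foo");
        wc.add("foo");
        System.out.println(wc.count("foo")); // 2
    }
}
```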

Further, if memory is your biggest problem, I think you might want to punt on the Java internal Map implementations and use FlatMap (even though it has some limitations).

  1. The maps of the ‘fastutil’ project are also worth a look, especially if your keys or values are uniformly some primitive type. (They offer adjustable load *and* growth factors, to control overhead and expansion.)

  2. While at the Google scalability conference I was reminded of ConcurrentHashMap…

    This, with support for Trove-style open addressing, would be sweet!

    After investigating again, it turns out that Trove with a low load factor has pretty good storage efficiency.

    I’m sure this trades some CPU, because you’re going to get frequent key-lookup collisions on false positives, but no more than my FlatMap proposal or a binary tree.

    So I might just use Trove for read-only hashtables… if I need read/write with high contention, I might look at porting the concurrent stuff into Trove.
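One caveat worth noting for the ConcurrentHashMap idea: it has the same restriction this post started with — it permits neither null keys nor null values, so the swap-in problem applies there too. A quick sketch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentDemo {
    public static void main(String[] args) {
        Map<String, String> map = new ConcurrentHashMap<>();
        map.put("foo", "bar");
        System.out.println(map.get("foo")); // bar

        try {
            // ConcurrentHashMap also rejects null keys
            map.put(null, "bar");
        } catch (NullPointerException expected) {
            System.out.println("null keys rejected here too");
        }
    }
}
```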
