On Hashtables and Java Storage Efficiency

I did some more work tonight on hash tables and Java storage efficiency.

Turns out that an int actually does store entries in 4 bytes.

I created an int[] array with 2M ints. This does in fact use 4 bytes per entry.

Which is basically what you’d expect.

Creating an array of 2M Integer objects uses 20 bytes per object.

So 1/2 the problem with java.util.TreeMap is that it uses auto-boxing which means each int is wrapped in an Integer so a value that would normally only take 4 bytes now takes 20. Ouch.

That’s a 5x waste of space right there.

It turns out the best winner here was Trove with their TLongIntHashMap.

This guy uses only 27 bytes per entry. This is MUCH better than the original 100 bytes per entry I was seeing with java.util.HashMap.

However, this is still a far cry from the ideal 12 bytes per entry that I could accomplish via the FastMap idea I had earlier.

There’s a Int->Int map in Colt but I won’t be able to use this for the project I’m working on – though I do expect to grab some time to run some tests.


  1. Gary

    This all seems pretty pointless. If you’re requiring a performant application to store 2M objects in a hash table or an array there is something extremely wrong with your architecture.

  2. Ismael Juma

    I don’t know if it was intended, but the reference to Colt’s Int->Int map seems to imply that it provides something that Trove doesn’t have. As far as I can see it’s similar to Trove’s TIntIntHashMap though.

    Ismael

  3. Ismael,

    You might be right, but I should to test Colt to make sure my benchmarks are complete.

    Kevin

  4. Hi Kevin,
    You may want to check my blog about how to calculate the memory consumption for java objects here :
    https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/5163.

    Note that the rules are relatively complicated.

    These rules are only valid for SUN JVM, because the Java spec doesn’t define an object layout.

    Regards,
    Markus






%d bloggers like this: