Minimal Perfect Hashing and my FlatMap Java Proposal
A few weeks ago I blogged about my 10x space saving proposal for storing maps as space efficient primitives:
In my situation I’m nearly seeing 10x overhead. This 350MB memory image of my data structure can be represented as just 35MB on disk. This would be more than fine for my application.
Which is when I had an idea – why am I using a pre-indexed data structure at all? Why am I keeping millions of copies of java.lang.String and java.lang.Integer instances?
Here’s what I’m going to do. I’m going to write an in-memory memory mapped file implementation of java.util.Map. Using the new NIO support for mmap files I’m going to write out a file with all the keys sorted and followed by its data.
However, I screwed up.
I implemented it with a binary search for key values which is O(log(N)) which is certainly acceptable. However, it just dawned on me that I should have used minimal perfect hashing. MPHFs are O(1) which means I can use my same proposal with less of a time/space tradeoff.
It’s not really a pressing issue. I mean I have CPU to burn but I like keeping my code elegant and fast.