Netflix Rankings are Broken but their Solution is a Bad One

Netflix is seriously thinking outside the box here and is offering $1M to whoever can improve their ranking engine (thread on Tailrank).

I love the fact that they’re trying to approach the problem from a different perspective but I just can’t help but think this will fail.

They’ve announced that they will give you access to 100M customer records which you can then use to tune and power your new ranking algorithms.

Let’s do the math.

100M records * 2 KB per record yields about 200 GB of data. The 2 KB figure is a guess, and I think a fairly conservative one. The real number could be lower, but I wouldn’t be surprised if the records were much larger than this.

If you assume they can get 4x compression out of this dataset you’re still looking at 50 GB, which you’ll probably have to query at runtime. This probably means a memory-based cluster, and to do that you’ll need at least $15-20k to get started.
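For anyone who wants to check the arithmetic or plug in their own guesses, here is a minimal sketch of the estimate above. The record count and compression ratio come straight from the post; reading "2k" as 2,000 bytes and using decimal gigabytes are assumptions made for the sake of round numbers.

```python
# Back-of-envelope sizing for the dataset, using the post's figures.
RECORDS = 100_000_000      # 100M records (from the post)
BYTES_PER_RECORD = 2_000   # "2k" per record, read as 2,000 bytes (assumption)
COMPRESSION_RATIO = 4      # 4x compression (from the post)


def dataset_size_gb(records: int, bytes_per_record: int) -> float:
    """Return the uncompressed dataset size in decimal gigabytes."""
    return records * bytes_per_record / 1e9


raw_gb = dataset_size_gb(RECORDS, BYTES_PER_RECORD)
compressed_gb = raw_gb / COMPRESSION_RATIO

print(f"{raw_gb:.0f} GB raw, {compressed_gb:.0f} GB compressed")
```

Changing `BYTES_PER_RECORD` is the quickest way to see how sensitive the whole estimate is to that one guess.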

What Netflix should do is allow teams to pitch them their proposals and fund their research, with a winning prize of $1M. Basically they’re trying to get their research for free, and on the off chance that someone improves their ranking systems they get their new tech on the cheap ($1M is nothing).

  1. I noticed this too and something about it seemed fishy to me, but not just on the technical side, on the business side too.

    Speaking as someone who has managed more than one developer program, developer contests drive me crazy. They just seem so cheap, exploitative and thoughtless. Almost like shooting yourself up with steroids and then not remembering to go to the gym. Why not provide some platform infrastructure to enable people to build businesses on top of all that Netflix data? That’s more interesting than even $1M.

  2. Kenan

    What if they used Amazon’s EC2 service? Would it still be that expensive to form a cluster of servers?

  3. The download file is 697,552,015 bytes long according to the /download page.

  4. Andre is right on the file size. It’s important to note that this is the tar.gz’ed size.
