Facebook Search

Niall pointed me to this post about Facebook’s search:

Facebook search results are sorted by an approximation of social graph distance. People closer to you in the graph—your friends and people in your networks—are likely to be more relevant to you and thus are ranked higher. We also use this concept of “social proximity” to order results within applications like groups and events. Facebook search’s key differentiator is that search results are unique to every user because they are based on a individual’s place in the social graph.

This is how the first generation of Rojo’s search worked. We supported global search, but we also had a feature that computed your search network from your feeds and related weblogs and searched within that subset (not many people stumbled upon this feature, btw).

So if you were subscribed to feeds A and B then we would compute a subgraph around A and B (potentially including C and D) and then search JUST within those feeds.
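To make that concrete, here is a minimal sketch of the idea in Python. It is not Rojo's actual code: the `links` dictionary, the `search_network` name, and the one-hop default are all assumptions made for illustration.

```python
from collections import deque

def search_network(links, subscribed, max_hops=1):
    """Return the subscribed feeds plus every feed within max_hops links of them.

    `links` is a hypothetical adjacency map: feed -> feeds it links to.
    """
    network = set(subscribed)
    frontier = deque((feed, 0) for feed in subscribed)
    while frontier:
        feed, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in links.get(feed, ()):
            if neighbor not in network:
                network.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return network

# A links to C, B links to D, and C links on to E.
links = {"A": ["C"], "B": ["D"], "C": ["E"], "D": []}
network = search_network(links, ["A", "B"])
```

With one hop the network is A, B, C and D. E only shows up if you raise `max_hops` to 2.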

Sorting by graph distance didn’t work because we didn’t keep everything in memory. Execution would have taken far too long.
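For reference, sorting by graph distance looks roughly like the sketch below when the whole graph fits in memory. This is a toy illustration, not Facebook's or Rojo's implementation; the `graph` dictionary and both function names are made up for the example. Every neighbor lookup here is a cheap dictionary access, which is exactly the part that becomes a disk seek when the graph is not in memory.

```python
from collections import deque

def graph_distances(graph, start):
    """Breadth-first search: hop count from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def rank_by_distance(graph, start, hits):
    """Order search hits by distance from `start`; unreachable hits go last."""
    dist = graph_distances(graph, start)
    return sorted(hits, key=lambda hit: dist.get(hit, float("inf")))

# Hypothetical graph: "me" knows ann and bob, ann knows cat, cat knows dan.
graph = {"me": ["ann", "bob"], "ann": ["cat"], "bob": [], "cat": ["dan"]}
ranked = rank_by_distance(graph, "me", ["dan", "zed", "ann", "cat"])
```

Here `ranked` puts ann first (one hop), then cat, then dan, with zed last because nothing connects to it.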

Lucene supported a filter for search results which was actually fairly fast so we took that route instead.
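The filter approach boils down to something like the following. This is a plain-Python stand-in, not Lucene's API: the `index` layout (term to a list of document id and feed pairs) and the `filtered_search` name are assumptions for the sake of the example.

```python
def filtered_search(index, term, allowed_feeds):
    """Return ids of documents matching `term` whose feed is in `allowed_feeds`.

    `index` is a hypothetical inverted index: term -> [(doc_id, feed), ...].
    """
    return [doc_id for doc_id, feed in index.get(term, []) if feed in allowed_feeds]

index = {"facebook": [(1, "A"), (2, "E"), (3, "C")]}
hits = filtered_search(index, "facebook", {"A", "B", "C", "D"})
```

Document 2 is dropped because feed E is outside the allowed set. The point is that membership in a precomputed set is a constant-time check per hit, so no graph traversal happens at query time.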

It seems like Facebook’s search infra would work in this situation. I was considering a similar system as well for Rojo but it was just too early (this was 2-3 years ago).

Interesting that they rely on memory. I think more and more scalable compute infrastructures are going to cheat and ditch disk (or memory-buffered disk) in favor of all-in-memory data structures or SSDs.

Google’s distributed compute infrastructure will still pay off but I think less so as we transition in the next few years.

On a side note, I’d LOVE to crawl Facebook, but their data model requires approval before a profile is shown.


  1. “Google’s distributed compute infrastructure will still pay off but I think less so as we transition in the next few years.”

    Transition to what?

    I started to write a longer reply here but posted it to my blog instead. Check it, yo.

  2. Transition to SSDs…. sorry.

    Flash SSDs that can do 4-5k IOPS (including write) could really be interesting and give rise to a lot more applications that scale a lot easier.





