Technorati’s Numbers are Wrong

Dave posts another state of the blogosphere with some interesting stats. The one I found shocking was the claim that there are 50 million blogs.

These numbers are overly optimistic and dangerous. There are not 50 million blogs. Blogs are great and all but too much hype is a bad thing.

There might have been 50 million blogs that have ever been created but there aren’t 50 million blogs in active use.

Let’s use Technorati’s own numbers to clear up the confusion.

They claim there are “about 1.6 Million postings per day” which if we do the math (assuming this number is correct) yields 0.032 posts per blog per day. Not very impressive.
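The division is easy to check. Here is a minimal sketch in Python that uses only the two figures quoted from Technorati; the variable names are mine.

```python
# Figures quoted from Technorati's report.
posts_per_day = 1_600_000
total_blogs = 50_000_000

# Average number of posts per indexed blog per day.
posts_per_blog_per_day = posts_per_day / total_blogs
print(f"{posts_per_blog_per_day:.3f} posts per blog per day")

# Same number seen the other way: days between posts for the average blog.
days_between_posts = total_blogs / posts_per_day
print(f"about one post every {days_between_posts:.0f} days")
```

Put differently, the average blog in the index posts roughly once a month.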

I don’t know about you, but I post about 2-3 times per day on average. Let’s be generous and say that a blog has to post at least once per day to be considered active.

This means that on any given day Technorati only has 1.6M active blogs which is a LOT smaller than the 50 million number that Technorati claims.

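Let’s assume some even more pessimistic posting rates. Let’s say that half of the 1.6M daily posts come from blogs that post once per day and the other half come from blogs that post once per week. That is 0.8M daily posters plus 0.8M × 7 = 5.6M weekly posters, which still only yields 6.4M blogs. Still far short of the 50M that Technorati claims.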
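The 6.4M figure can be reproduced if the 50/50 split is read as applying to the 1.6M daily posts (half written by once-a-day blogs, half by once-a-week blogs). The sketch below shows that reading next to the alternative where the split applies to the blogs themselves. The function names and the 50/50 defaults are assumptions for illustration, not anything Technorati publishes.

```python
POSTS_PER_DAY = 1_600_000  # Technorati's figure

def active_blogs_split_by_posts(posts_per_day, share_from_daily=0.5):
    """Assume a share of the daily POSTS comes from once-a-day blogs
    and the rest comes from once-a-week blogs."""
    daily_blogs = posts_per_day * share_from_daily              # 1 post/day each
    weekly_blogs = posts_per_day * (1 - share_from_daily) * 7   # 1/7 post/day each
    return daily_blogs + weekly_blogs

def active_blogs_split_by_blogs(posts_per_day, share_daily=0.5):
    """Assume a share of the active BLOGS posts daily and the rest weekly."""
    mean_rate = share_daily * 1 + (1 - share_daily) * (1 / 7)   # posts/blog/day
    return posts_per_day / mean_rate

by_posts = active_blogs_split_by_posts(POSTS_PER_DAY)
by_blogs = active_blogs_split_by_blogs(POSTS_PER_DAY)
print(f"split by posts: {by_posts / 1e6:.1f}M active blogs")
print(f"split by blogs: {by_blogs / 1e6:.1f}M active blogs")
```

Either way the estimate stays in the single-digit millions, nowhere near 50M.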

What’s really telling is the fact that while their number of weblogs is growing exponentially their number of posts per day is growing linearly:

[Two charts from Technorati’s report: number of weblogs tracked, and posts per day]

This means that the average posts per weblog is falling not growing.

This leads us to two conclusions:

1. The number of active blogs is on a linear growth scale not an exponential scale
2. Technorati has a high number of inactive weblogs in their index.

I just took a rough look at their data: in June 2005 their mean post/weblog ratio was 0.0833. In June 2006 it was 0.04, which means it’s been cut by more than 50%.
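For anyone who wants to check the size of that drop, here is a quick sketch using the two ratios above. They are my rough readings of Technorati’s charts, not exact published values.

```python
# Rough readings of the mean posts-per-weblog-per-day ratio.
ratio_june_2005 = 0.0833
ratio_june_2006 = 0.04

# Relative drop over the year.
drop = (ratio_june_2005 - ratio_june_2006) / ratio_june_2005
print(f"ratio fell by {drop:.0%}")

# The same ratios expressed as average days between posts per weblog.
print(f"June 2005: one post every {1 / ratio_june_2005:.0f} days")
print(f"June 2006: one post every {1 / ratio_june_2006:.0f} days")
```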

I expect their mean post/weblog ratio will continue to fall. I’ve been saying this for a while now (in person and on this blog) but Technorati needs to prune their index.

I hope Technorati realizes that this post was intended as constructive criticism. I think that they’d be doing the blogosphere a big service by reporting more accurate data.

Update:

The only way to determine the number of actual blogs within Technorati’s index would be to gather a statistically relevant sample and determine posting frequencies and then apply that to their 1.6M posts per day figure.
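As a rough sketch of what that calculation could look like: take a sample of blogs, measure each one’s posting rate, average the rates of the blogs that post at all, and divide the 1.6M figure by that average. The sample below is entirely made up for illustration, and the function name and `min_rate` cutoff are my own assumptions.

```python
from statistics import mean

def estimate_active_blogs(posts_per_day, sampled_rates, min_rate=0.0):
    """Estimate the number of active blogs from sampled posting rates.

    sampled_rates: posts per day for each sampled blog.
    min_rate: blogs at or below this rate are treated as inactive.
    """
    active_rates = [r for r in sampled_rates if r > min_rate]
    if not active_rates:
        raise ValueError("no active blogs in sample")
    # Total daily posts divided by the mean rate of an active blog.
    return posts_per_day / mean(active_rates)

# Hypothetical sample (NOT real data): posts per day for ten blogs.
sample = [0.0, 0.0, 0.0, 0.0, 0.0, 1 / 30, 1 / 7, 1 / 7, 0.5, 1.0]
estimate = estimate_active_blogs(1_600_000, sample)
print(f"~{estimate / 1e6:.1f}M active blogs under this made-up sample")
```

A real estimate would need a properly random sample of the index and a posting window long enough to catch infrequent posters.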

Update 2:

Since Technorati’s active blogs are on a linear scale and their total number of blogs is exponential this means that their total number of inactive blogs is also exponential (ouch).
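To see why, subtract a straight line from an exponential curve. The sketch below uses invented starting values and growth rates (none of them come from Technorati) purely to show the shape: the inactive share of the index keeps climbing.

```python
def total_blogs(month, start=1_000_000, doubling_months=6):
    """Hypothetical exponential growth of the whole index."""
    return start * 2 ** (month / doubling_months)

def active_blogs(month, start=500_000, added_per_month=50_000):
    """Hypothetical linear growth of the active blogs."""
    return start + added_per_month * month

def inactive_share(month):
    """Fraction of the index that is inactive at a given month."""
    total = total_blogs(month)
    return (total - active_blogs(month)) / total

for month in (0, 12, 24, 36):
    print(f"month {month:2d}: total={total_blogs(month):,.0f} "
          f"active={active_blogs(month):,.0f} "
          f"inactive share={inactive_share(month):.0%}")
```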

Update 3:

I like this comment on Data Mining:

Firstly, the count of the size of the blogosphere makes no sense. When one looks at population statistics, one doesn’t count all the dead people. Why do the same for blogs?

Awesome! Great point… Touché!


  1. I don’t know about you but I post about 2-3 posts per day on average. Lets be generous and say that a blog has to post at least once per day to be considered active.

    According to bloglines you’ve made 100 posts in the last 120 days.

    Your definition of “active blog” is far from generous.

    Just a minor point that does not invalidate your interesting conclusion that the number of active blogs is growing linearly.

  2. Hey Mike.

    Fair enough…. I should have said that “when I’m in active blog mode” I post about 2-3 times per day but I sometimes go heads down which yields 2-4 days without posts.

    I also added a 1 week interval in there for active blogging which I think is fair.

    100 posts per 120 days yields 0.833 posts per day, whereas 1 post per 7 days yields 0.143 posts per day, so my rate is still much higher than the lowest number I gave as a generous post rate :)

    Kevin

  3. Why do you think blogs cease to exist if the posting rate drops?
    People make blogs for lots of reasons, to record an event, to share holiday recollections.
    I’ve made 659 posts to my blog in the nearly 5 years I’ve had it, whereas my son has made 20 in the 4 he’s had his, but I treasure them all.
    The thing is, Tailrank cares only about the Short Head of blog, and Technorati covers the Long Tail, even though the names may imply otherwise.
    By your criteria, all other forms of publishing are abandoned.

  4. Hey Kevin,

    “Why do you think blogs cease to exist if the posting rate drops?”

    I’m not… I’m saying that it is misleading to cite 50M blogs and note that this number is growing exponentially without also noting that the number of active blogs is much smaller and growing linearly.

    “The thing is, Tailrank cares only about the Short Head of blog, and Technorati covers the Long Tail, even though the names may imply otherwise. By your criteria, all other forms of publishing are abandoned.”

    I wouldn’t say we care about the short head… We care more about the active blogs (which might be short head I guess).

    I think one conclusion could be that there are 6.4M active blogs and I would love to see Tailrank index ALL of those.

    I think for what Technorati is doing this makes a lot of sense. I’d just like to see a differentiation on the number of ACTIVE blogs
    :)

    Hope this clarifies. Thanks.

    Kevin

  5. FWIW, Bloglines sees about 5 to 7 million new blog posts a day, and we have nowhere near 50 million blogs in our ‘index’. Track what people actually read, and there are only a couple million active blogs.

    Paul Querna
    Bloglines Engineer

  6. Paul. Bloglines is seeing 5-7M posts per day but is it counting the same post published in both Atom and RSS?

    What about search feeds from IceRocket, Technorati, Feedster, etc.

    What does the number look like if this data is removed (assuming it isn’t already – maybe it is)?

    “Track what people actually read, and there are only a couple million active blogs.”

    Not everyone uses Bloglines ;)

    That said… are these active blogs or active feeds? I wonder if anyone from Rojo or Bloglines would care to comment…

  7. To measure by RSS feeds is to ignore most personal blogs.

    Still, Kevin B, I love your points here.

    Note that Technorati is not actually counting the number of blogs in existence. It is counting *the number of blogs it indexed.* So when Technorati added MySpace blogs, bam, the index doubled.

  8. What these numbers show us is also the fact that now it is really simple to start a blog but it is MUCH harder to keep it active (i.e. not everyone is a blogger…)

  9. I set up a new blog for someone a few days ago and noticed that Technorati ranked it at about 1.8 million, which fits in rather well with “This means that on any given day Technorati only has 1.6M active blogs”

  10. “Not everyone uses Bloglines ;)”

    Not everyone uses Technorati either! I am *very* suspicious of their international/language figures because I think their coverage of different countries is very variable.

    For example: I live in Taiwan, and most people haven’t heard of Technorati here (and wretch.cc which is the biggest blog host doesn’t ping it). I think the same is true in China – only a small fraction of Chinese blogs are represented in Technorati

    On the ‘active blog’ discussion – it would be fairly easy for Technorati to research this, wouldn’t it? I’d be interested in the distribution. Incidentally, I don’t think ‘average number of posts/day’ is a sensible way of deciding if a blog is active or not – ‘time since last post’ is much better. For example: How many blogs haven’t been updated for 6 months?

  11. I too had a similar concern. I looked at LiveJournal stats and it seems that only 17% are active!
    More here

  12. Back when I edited Blogcount.com this was a common question.

    First, Coverage. Technorati has never claimed to index every blog. They can’t. It’s a big and growing Internet. And relying on the pingmesh for blog and post discovery misses out big time.

    There are millions of blogs, by some counts nearly one third, created for private circulation behind passwords or firewalls. Technorati can’t see intranet blogs or blogs reserved just for friends or family.

    And while I’m convinced the T-team is working on it, international coverage is not what it should be, in part because blog discovery and page analysis remains an art. Some reports from around 2004 compared spiders and showed that even slightly different search strategies led to Technorati having only about a one-third duplication rate with other spiders; this means that the total number of visible blogs might be two to three times the number T reports.

    Second, Activity. When considering activity you have to look at many different styles of blogging behavior. Some is seasonal. Students in much of the western industrialized northern hemisphere take off for three months in the summer. Europeans take four to six weeks in the summer. So saying a blog is dead may be premature even without new posts in the last 90 days. Look also at the many writing patterns. Some people write once a month in a newsletter or essay form.

    Then there are border cases. Are Nokia lifelogs really blogs? How about flickr rolls? Or machine generated blogs that show system status changes or earthquakes?

  13. When I set up my blog several months ago, my Technorati ranking after my first post was in the 1.6 millions.

    When I got my first link from another blog, the ranking jumped to 800,000. I guess that means that only 1.6 million blogs are referenced by other blogs.

    But I don’t think we can infer the number of blogs or the number of active blogs from that.

  14. Try looking at incoming links as an index of blog activity. A blog with no links is dead, a blog with one link is just about alive. (Posting once a day is a pretty stiff yardstick of activity!!!).

    So, one link gives you a Technorati rank of about 1 million. Hence there are about one million ‘active’ blogs out there.

  15. I know that personally I’ve started 10+ blogs only two of which I currently maintain. It’s just so easy to do, it’s like “why not?” as soon as you have an idea for one.

    So I think it’s fair to say most of those 50 million blogs are dead, even if defining “active blog” is tricky.

  16. “I don’t know about you but I post about 2-3 posts per day on average. Lets be generous and say that a blog has to post at least once per day to be considered active.” – This is quite a wrong assumption. Most blogs, including some in the Technorati top 100, post once or twice a week. And most professionals blog only once a week. Barring ‘mashup’ blogs, most blogs with some original content will have fewer posts than your one-post-a-day math assumes.

    Check the blog ‘Creating Passionate Users’, my favourite one, http://technorati.com/blogs/http://headrush.typepad.com/creating_passionate_users/, you get the picture.

  17. In my opinion, posting more than once a week makes you a more active blogger than most. Many bloggers work in cycles of activity and inactivity, sometimes going as many as two or three months with few if any posts before roaring back with a flurry of updates when they have time. While some pro bloggers do treat the activity as a job, for many people it is just a hobby or diversion.

    I do think Technorati’s numbers are inflated, and as a blog finder service I have personally never found them useful. I get better results from Google if I’m looking for blog writers on a particular topic. I respect what they’re trying to do, but I don’t consider them an authority on blog numbers.

    I’d also suggest that many bloggers have created more than one blog. I have created accounts with probably around a dozen different blogging services, and changed my main blogging service four times. Many of those blogs were tracked by Technorati, but I didn’t remove them once I stopped posting there. I’d say that I probably do experiment with new services more than the average person, but even some of my friends and family who are not particularly interested in signing up for new services have switched their blog provider or attempted and failed to keep a regular blog three or four times before they found one they liked. I wouldn’t be surprised if many of these attempts are indexed by Technorati too.

  18. The analysis above is interesting but it doesn’t mean the numbers are wrong.

    In a similar area, take Instant Message numbers. Skype can tell me the downloads, accounts and currently online. The other services just quote a vague 200M or such like without saying what it is. Does that mean that 200M is “wrong” or that Skype’s 6M online is somehow more “right”?

  19. Despite Technorati’s best efforts not to index splogs, splogs are included in that 50 million count.

    Dave Sifry acknowledges this with: “Surely some of these new blogs in Technorati’s index are Spam blogs or ‘splogs’.”

  20. I think Phil hit the nail on the head. The problem is really about blog discovery. While it’s great to leave a blog trail that becomes your legacy, most readers today have no time to look for good blogs. While we do go to Google to search for stuff which may yield blog content, I would guess we rarely go to Technorati simply to see who’s blogging. I think bloggers themselves need to regulate the blogosphere; only then will the solution scale. For example, how would Technorati deal with a massive, growing blog population that is not English speaking?

  21. Dangerous? DANGEROUS?!

  22. Michael Sevilla

    Hi Kevin,
    Good posting.

    Another factor to consider is the impact of splogs on the overall blog growth. We all know it is a huge and growing problem.

    I’d like to see the size of the blogosphere defined as actual blogs, respective of the other comments posted here as well.

  23. Great. Then get the bloggers in Taiwan and China to use Autopinger to ping for them and get to Technorati.

  24. /pd

    Does content/page with RSS enabled signify a blog ??

  25. There are definitely over 50 million blogs that have been created. There are more than that number just hosted on MySpace or Windows Live Spaces.

    Active blogs is a bit harder to calculate especially since the definition of ‘active’ varies depending on whom you ask. By your definition my blog is ‘inactive’ since I don’t blog every day.

  26. I like the way you interpret the figures. It seems reasonable, applying human factors to cold numbers.

    I think the expected post/blog ratio slide is also logical. I’ve always feared that blogs would become the next tech bubble; that people hypnotized by stories of thousand dollar profits would pump money in, causing artificial demand that inevitably leads to a crash.

    But I think the slide indicates that people have begun to concentrate on quality over quantity, thus keeping the bubble from bursting.

    Am I right? Or am I missing the point completely?

  27. I also agree with what you and most of your commenters are saying, though I fear I am too late to the party to add much.

    Concerning ‘active blogs’ I can offer some personal observations, I recently went on exchange as did a number of my classmates, many of us started blogs. I don’t think any of the blogs lasted as long as the exchange (~four months) with the exception of mine, which was started before the exchange term anyway…

    New service = new blog. I’ve noticed a lot of people who have jumped from LiveJournal to Blogger to MySpace and now, if Kevin’s recent postings are correct, to Vox… The point is that’s four abandoned blogs. It is conceivable the number of blogs is greater than the number of bloggers by a factor of two or more.

    Finally, I too question Technorati’s measurement of foreign language blogs. I recently read how there will be 60 million blogs in China by the end of the year. It seems every time I turn around there is a new blogging service. Supposedly Baidu is creating one, and I directly asked them about that only a few months ago when I interviewed them and it wasn’t in their plans then…

    I think Technorati has to make decisions about which services they track and which they track most frequently. They can’t possibly respond to every single ping the instant they receive it, especially considering issues like spamming or scraper sites.

  28. I think the point here is that both data points matter. The # of blogs initiated shows how many people get connected, the # of posts show how many people are participating in the conversation.

    I hear too many bloggers advocating frequent postings as the measure of your blogger credentials. Personally, I would rather bloggers post less frequently and post more meaningful content. Too much is overload and not consumable. The busiest people typically have the most insight to share and may only have time to do that once a week.

  29. Sorry for the delay in comment approval guys. I had a computer glitch this morning…… anyway…

    Ben….. your suggestion to use recent inbound links as a measure of activity is
    an interesting one.

    Murali…. you say “This is quite a wrong assumption. Most blogs, including some
    in the Technorati top 100, post once or twice a week.”

    I factored this into my equation above. This still only yields 6.4M active blogs (read the full post) ;)

    Julian…

    “The other services just quote a vague 200M or such like without saying what it
    is. Does that mean that 200M is “wrong” or that Skype’s 6M online is somehow
    more “right”?”

    OK…… maybe I should have titled it “Technorati’s Numbers are Misleading”

    Dare…

    “By your definition my blog is ‘inactive’ since I don’t blog every day.”

    No. I revised this to include bloggers who only post once per week. This still
    only yields 6.4M active blogs and at best 11.2M (1.6M × 7) if ALL of the active
    bloggers posted once per week. Far shy of the 50M number.

  30. The European blogosphere
    http://www.eu.socialtext.net/loicwiki/index.cgi?summary_page

    China’s New Obsession with Blogs and How Companies Can Benefit
    “The total number of blogs in China will grow over 200% from 37 million in 2005 to nearly 120 million by the end of 2006.”
    http://china.seekingalpha.com/article/13336

  31. Yep, you’re right, Technorati’s numbers are wrong, they grossly under track the number of actual blogs out there, as you’ll note from some of the other comments already here.

    As for dead vs alive blogs, we don’t track web pages that way, so why should blogs be different? It’s still a blog even if it’s not being updated.

  32. “we don’t track web pages that way, why should blogs be different”

    Because blogs are primarily about real-time conversation not about history. The most useful number is one that reveals how many people are currently authoring their ideas.

  33. There are a couple of things I don’t buy here – firstly your characterisation of what an ‘active blogger’ might be (I would consider someone who wrote a weblog once a month to be active, and would consider the potentially large number of those users to be valuable), and secondly your estimation techniques seem a bit troubling to me.

    On the first point, it seems to me that Technorati are trying to point towards the value of the community that exists. If you consider that only a weblog that posts once a week or above could be considered valuable, then I’d have to disagree quite strongly. If there turned out to be ten million webloggers who posted once a month I wouldn’t be in the slightest bit surprised, and I would consider them to be a valuable and potentially marketable, engaged proportion of the community. Many of the people I know who write once a month do so in a much more considered way than those who write several times a day.

    The other thing I’m fuzzy on is the way you indicate a normal posting rate and kind of break it into these two categories. Clearly that’s a fudge, I don’t think you would pretend otherwise, but I’d be really interested to know what the curve was like on this stuff – the number of people who post 5+ posts a day, versus 4 posts a day, versus 3, 2, 1, 0.5, 0.3, 0.2 etc. Your assumption seems to be that the vast majority of people that use weblogs would post somewhere between once a week and once a day.

    If I’ve got my maths right, your model works on the principle that one person publishing seven times a week plus one person publishing once a week, would create an average posting rate for an active weblogger of four posts a week. That actually seems to me to be an extraordinarily high number for many people – particularly considering that they may only post at work or during week days. It seems to me that while there’s probably a threshold under which writing a weblog ceases to be enormously rewarding, there are probably many fewer people writing one post a day than one post every two or three days, and probably fewer of those than people who write once a week or twice a month.

    My guess would be that the curve here for posting rates would probably resemble a normal distribution skewed a bit towards low posters, or at least have a flat centre somewhere between one post every couple of weeks and one post every couple of days.

    My suspicion is that if you did the maths with curves like that you’d end up with some very different figures and conclusions. Which is not to say of course that the Technorati figures are accurate, or that there aren’t dead weblogs in the stats. Just that I think your interpretation of the results is pretty likely to be inaccurate and tendentious.

  34. DAN RATHER

    True. But then are there really 8 billion web pages? Some blogs are time based, for particular events, or collections of stories or steps for many things. Not all blogs are actively posted to but have good information or were a good journal of events.

  35. Good posting, and interesting discussions! To add some empirical insight on the frequency of blogging, here are some numbers from a large-scale survey (N=4402 bloggers) of the German-speaking blogosphere:

    How often do you usually publish postings in your blog?
    Several times a day: 11.4%
    About once a day: 21.7%
    A couple of times/week: 39.2%
    A couple of times/month: 22.6%
    Once a month or less: 5.1%

    The N=4402 are people who consider themselves “active bloggers”; there was a filter question somewhere before where people could state if they used to run a blog but quit – these ex-bloggers have not been included in the above-given statistics.

  36. “I don’t know about you but I post about 2-3 posts per day on average. Lets be generous and say that a blog has to post at least once per day to be considered active.”

    I disagree. Thomas Mahon’s blog [http://englishcut.com] is updated maybe 2-3 times a month and it’s VERY active, if you think of it as a new business finder.

    Not everybody has something new to say to the world every day. Some people prefer choosing their words more carefully.

  37. It would be nice to know how big a pond I’m swimming in. I found your arguments persuasive. Even if the math isn’t dead on, I really think it is unlikely that there are 50+ million blogs that are active by any definition that makes sense. 3-5 posts per week for me.

    Easy Rog

  1. 1 Webfeed Decentralized

    One Million More Legitimate Posts

    I find it amazing that 70% of the pings that Technorati receives are from known spam sources, yet they are able to drop them before they’re indexed.
    After looking at this graph, it seems that there are about one million more legitimate…

  2. 2 TechBlog

    Updated: Wow, that’s a big blogosphere you got there

    Every three months, Dave Sifry of Technorati drags out his abacus and counts up the number of blogs in a “State of the Blogosphere” report. This quarter’s report is out, and here’s the good stuff: Technorati is now tracking over…

  3. 3 Data Mining

    Blogosphere Statistics Proposal

    A logical continuation of my comments on Sifry’s State of the Blogosphere post is to make some proposals regarding what would be acceptable observations to make about the blogosphere. Firstly, we can consider the things that we would like to

  4. 4 linkage

    http://www.abstractdynamics.org/linkage/archives/008201.html

    Kevin Burton’s Feed Blog: Technorati’s Numbers are Wrong…

  5. 5 http://crabapple.cc

    Technorati’s Numbers are Wrong

    I was thinking about this last night, too. As there are zillions of new blogs showing up, there are zillions of old blogs going dormant.

  6. 6 1000 Flowers Bloom

    Blogs Not Doubling Every 6 Months?

    I came across an excellent post by Kevin Burton about his analysis that blogs are not doubling every 6 months and there aren’t really 50 million active blogs while I was reading Ethan Stock’s blog. This runs contrary to the

  7. 7 ProPr

    State of the Blogosphere – It’s the trend that is important, not the snapshot

    In the wake of David Sifry’s latest State of the Blogosphere post, a number of thoughtful commentators are challenging Sifry’s estimate of 50 million blogs. A principal line of argument revolves around whether the Technorati numbers creat…

  8. 8 Thatedeguy

    How many blogs are there really?

    Dave Sifry’s State of the Blogosphere recently noted that there were 50 million blogs and counting. Kevin Burton thinks that using that number is overly optimistic and dangerous.
    Frankly, I think that both Dave and K…

  9. 9 kottke.org remaindered links

    Kevin Burton looks at the Technorati “data” and discovers that since the number of daily postings is growing linearly

    http://www.feedblog.org/2006/08/technoratis_num.html

  10. 10 Dead2.0

    Report: Entire world to blog within 12 months

    Yes folks, it’s true. A couple of days ago the CEO of Technorati posted on his blog that over 50 million blogs exist (Technorati new slogan: one site to rate them all, one site to find them, one site to track them all, and in the darkness bind …

  11. 11 keso

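    Yesterday’s news – Baidu claims its click-fraud defenses are very robust

    Yesterday’s news – Baidu claims its click-fraud defenses are very robust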

  12. 12 Andrew R H Girdwood

    Blog Wars

    There are a lot of holes in people’s maths. I just happen to think that his own maths are wrong.

  13. 13 Clicked

    Drinks on a plane

    Of the non-news coverage of today’s events, I found myself appreciating the commentary at BoingBoing….

  14. 14 Clicked

    Drinks on a plane

    Of the non-news coverage of today’s events, I found myself appreciating the commentary at BoingBoing. Their liveblogging the news of the new carry-on item list included a mention that Transformers are explicitly allowed. That’s no joke, they’re on th…

  15. 15 le blog de groupe Reflect

    What if there were a long-tail effect on the growth of the blogosphere?

    With the latest release of Technorati’s indicator of the growth of the blogosphere, a limit has been reached, prompting legitimate frustration. It is not enough to count the number of blogs; one must also take into account their…





