Looks like Meltwater is fighting the good fight against the Associated Press.

According to TechCrunch:

The Associated Press today filed suit against Meltwater News for copyright infringement and misappropriation of its breaking news content. The complaint, filed in a New York federal court, alleges that Meltwater U.S. Holding Inc. and its Meltwater News Service, a news clipping service, have been illegally selling unlicensed AP content that competes directly with AP and its customers.

I’m obviously amazingly sympathetic since I started a company that deals with content syndication issues.

However, it looks like Meltwater is fighting back:

A series of rulings in UK courts of late is showing that UK copyright law is sadly out of sync with today’s society and renders the innocent acts of millions of UK citizens illegal.
According to UK courts, tweeting a headline or emailing a colleague a link to an online news article is a breach of copyright if it is done without a copyright licence. The simple act of browsing the Internet is deemed a potential copyright infringement unless licensed. More details can be found in my commentary on recent rulings in the UK High Court and the UK Court of appeal.

The bad news is that business users still need to sign a license and pay royalties for receiving links and headlines to news that they themselves can find freely available online. This applies to clients of all news monitoring services in the UK including Google. During the court hearings, the NLA stated that they are mandated next to license UK business users of Google News and that they intend to do so.

This is a very bad precedent and I think borders on unequal protection.

Why are news aggregators and social media monitoring companies treated differently from people collecting links manually in a web browser?

At what point do you cross the line? If you write a little Python script to index the web in the UK, do you need a license from the AP?
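To show how low that bar is, here is a minimal sketch of the kind of script in question. It only pulls links and their headline text out of a page that has already been fetched. The `LinkCollector` class, the `collect_links` helper, the sample HTML and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (url, anchor text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> tag we are currently inside
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html):
    """Return every (url, headline) pair found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real script would fetch this over HTTP.
SAMPLE = """
<html><body>
  <a href="http://example.com/story-1">First headline</a>
  <p>Some text with <a href="http://example.com/story-2">a second headline</a>.</p>
</body></html>
"""

for url, headline in collect_links(SAMPLE):
    print(headline, "->", url)
```

That is a link and a headline per story, which is exactly what a person would copy by hand from a browser.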

We have always insisted on fair use. Our customers aren’t permitted to publish full content as part of our AUP and must link back to the source (just like Google).

I’m pleased to announce Peregrine 0.5.0 – a new map reduce framework optimized for iterative and pipelined map reduce jobs.

This originally started off with some internal work at Spinn3r to build a fast and efficient PageRank implementation. We realized that what we wanted was a MR runtime optimized for this type of work, which differs radically from the traditional Hadoop design.

Peregrine implements a partitioned distributed filesystem where key/value pairs are routed to defined partitions. This enables work to be joined against previous iterations or different units of work by the same key on the same local system.
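As a rough illustration of why deterministic key placement makes joins cheap, here is a small sketch. It is not Peregrine's code or API: the function names, the partition count and the use of CRC32 as the routing hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value, purely for illustration


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Deterministically map a key to a partition id.

    crc32 is used instead of Python's built-in hash() because hash() of a
    string changes between interpreter runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records, num_partitions=NUM_PARTITIONS):
    """Group (key, value) pairs by the partition their key maps to."""
    partitions = defaultdict(dict)
    for key, value in records:
        partitions[partition_for(key, num_partitions)][key] = value
    return partitions


def local_join(left, right, num_partitions=NUM_PARTITIONS):
    """Join two routed datasets one partition at a time, never crossing partitions."""
    joined = {}
    for pid in range(num_partitions):
        left_part = left.get(pid, {})
        right_part = right.get(pid, {})
        for key in left_part.keys() & right_part.keys():
            joined[key] = (left_part[key], right_part[key])
    return joined


# Made-up data: rank values from a previous iteration and outbound link counts.
ranks = route([("a.com", 0.5), ("b.com", 0.3), ("c.com", 0.2)])
outlinks = route([("a.com", 10), ("b.com", 4), ("c.com", 7)])
print(local_join(ranks, outlinks))
```

Because both datasets were routed with the same function, matching keys always sit in the same partition, so each partition can be joined without looking at any other.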

Peregrine is optimized for ETL jobs where the primary data storage system is an external database such as Cassandra, HBase, MySQL, etc. Jobs are then run as Extract, Transform and Load stages, with intermediate data being stored in the Peregrine FS.

We enable features such as Map/Reduce/Merge as well as some additional functionality like ExtractMap and ReduceLoad (in ETL parlance).

A key innovation here is a partitioning layout algorithm that can support fast many to many recovery similar to HDFS but still support partitioned operation with deterministic key placement.

We’ve also tried to optimize for single instance performance and use modern IO primitives as much as possible. This includes NOT shying away from operating system specific features such as mlock, fadvise, fallocate, etc.

There is still a bit more work I want to do before I am ready to benchmark it against Hadoop. Instead of implementing a synthetic benchmark we wanted to get a production ready version first which would allow people to port existing applications and see what the before / after performance numbers looked like in the real world.

For more information please see:

http://peregrine_mapreduce.bitbucket.org/

As well as our design documentation:

http://peregrine_mapreduce.bitbucket.org/design/

This weekend I climbed Mount Shasta which has been a dream of mine for a while now.

I wanted to wait until I hit my fitness goals so I was in peak physical condition.

The plan was to drive up to Shasta City (5 hours), and then crash in the parking lot until the morning.

Then you take a 5 hour hike up to Helen Lake.

This killed me because I really needed snow shoes which I didn’t bring. Next time. BRING SNOW SHOES.

Interestingly enough they’re hit and miss. On the way up they were valuable because of perfect snow conditions but on the way down they were actually worse because the snow was like soup.

The biggest problem I had (in hindsight) was that food consumption is almost impossible on the mountain because doing ANYTHING other than hiking is insanely dangerous. One mistake and you’re dead.

During the second day I didn’t eat enough food and it almost prevented me from hitting the summit. I had to literally collapse 2-3 times and just do nothing for 20-30 minutes while I ate and drank water.

My blood sugar was so low that I basically couldn’t even lift my feet. Combined with the altitude and lack of oxygen, this meant that every three steps required 30 seconds of rest.

In the future I need to upgrade my equipment with the following:

– better boots with better cushioning on the heels

– water bladder for inside my jacket and dextrose/sugar/electrolyte solution. Inside the jacket is important because it will keep it warm and prevent it from freezing.

– snow shoes

– gaiters to prevent water from getting in my shoes

kevin_on_summit-1.jpg

5829347905_443428a4ac_o.jpg

What’s up with this? This makes me crazy :-P

Google releases the Android Open Accessory API but fails to ship ANT+ support?

ONLY USB and no Bluetooth for now.

This is amazingly LAME.

Sony can ship ANT+ for Android but Google can’t?

The demo they gave on the screen, with an Android game monitoring the pace of the bike could have actually been done directly with existing ANT+ open standard wireless.

In fact, my new Trek Madone 5.5 ACTUALLY HAS AN INTEGRATED ANT+ ALREADY.

I could download the existing game and throw my bike on a trainer and actually play the game via existing hardware using an open wireless standard with existing technology.

So now my bike needs to have a USB port? Lame.

For Google to adopt a new API but ignore existing open wireless standards is amazingly lame.

I did a bunch of research tonight on using oximeters and heart rate monitors for helping to diagnose sleep apnea.

I think I might have mild sleep apnea. I was diagnosed before but the sleep lab was so pathetic that I just completely wrote off the results and never really went back.

It turns out that, from what I can gather, polysomnographs are insanely expensive (like $2k per night). These are the sleep studies they run with an EEG, ECG, oximeter, etc.

Further, you can’t do one at home, and you can’t keep running sleep studies night after night.

A Zeo, oximeter, and heart rate monitor could be used to build a cheap polysomnograph. I have most of the hardware already but if you started from scratch you could build a decent one for like $750.

Not FDA approved by any means but once you’re diagnosed you can use this setup to test various sleep experiments (along with subjective quality of life measurements).

The biggest problem is that the oximeters are all targeted for active monitoring. They do no data recording.

It’s about $500 for one that does data recording and they don’t export to anything but a PC running Windows.

I’d love a way to hack something together to build a simple external data recorder.

One thing I want to play with is the ANT+ Python module Kyle Machulis is writing in OpenYou.

I can use this to avoid having to first upload my data to Garmin and THEN analyze the data after the event. It will make it much easier to analyze and I could in theory even analyze the data in real time.

This would give me heart rate data as well as blood oxygen. Of course, most of the oximeters also include pulse rate so I might not need to wear a strap any more.

I really hate it when mainstream media covers lame science.

This study recently done at the University of Pennsylvania is interesting but not really helpful.

For starters, it was VERY small… dozens of subjects:

In what was the longest sleep-restriction study of its kind, Dinges and his lead author, Hans Van Dongen, assigned dozens of subjects to three different groups for their 2003 study: some slept four hours, others six hours and others, for the lucky control group, eight hours — for two weeks in the lab.

This is the longest study of its kind? For two weeks? I’m confused.

Then there is this infographic:

Screen shot 2011-04-17 at 4.08.12 PM.png

What was the source of this data? Self reported? I don’t even know where to begin with the flaws of self reported assessment.

Not surprisingly, those who had eight hours of sleep hardly had any attention lapses and no cognitive declines over the 14 days of the study. What was interesting was that those in the four- and six-hour groups had P.V.T. results that declined steadily with almost each passing day.

So subjects that needed 8 hours of sleep performed poorly when constrained to fewer hours of sleep? And we’re surprised by these findings?

And of course the NY Times doesn’t link to the actual study nor does it appear that the study is online from a Google search.

Part of the problem is that the research community still doesn’t publish online.

However, when the NY Times publishes articles of such poor quality I’m not exactly encouraged to pay for them.

Also, why are they writing a story about a study done in 2003? That’s 8 years ago!

The biggest problem I have with this article is that the core idea of sleep optimization is to get the same quality of life but with less sleep.

Telling people to just cold turkey start sleeping less isn’t going to have reasonable results.

You might as well tell random people to start running a marathon and then act surprised when they hurt themselves.

I have some initial conclusions from my ~2 weeks of sleep experiments.

– A pitch black room does increase my perceived sleep quality. I remember my dreams more and seem to have more dreams in general.

– The blue blocker sunglasses don’t seem to yield any meaningful result for me. This might be because of my current level of caffeine.

– Sleeping alone doesn’t seem to impact my quality of sleep either way.

– Sleeping with the Zeo itself interrupts my sleep. A bit of a heisenbug …

– Measuring my heart rate while I sleep also interrupts my sleep. However, it yielded some interesting results. While I sleep, I can see 3-5 peaks where my heart rate will temporarily jump from 43bpm to 75bpm… I assume these coincided with REM dreaming. I can’t get my heart rate monitor and the Zeo to sync up without exporting the data and re-importing it into a new system.

Here’s a run from while I was sleeping the other night.

I can’t perform extensive analysis on this just yet as I haven’t exported all the points.

Screen shot 2011-04-17 at 3.58.54 PM.png
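Once the points are exported, counting those jumps should be simple. Here is a minimal sketch, assuming the export is just a list of bpm readings in time order. The `find_spikes` name, the 60bpm threshold and the sample readings are my own assumptions, not real data from the run above.

```python
def find_spikes(samples, threshold=60):
    """Return (start_index, end_index, peak_bpm) for each run of readings above threshold."""
    spikes = []
    start = None   # index where the current spike began, or None if not in one
    peak = 0       # highest reading seen inside the current spike
    for i, bpm in enumerate(samples):
        if bpm > threshold:
            if start is None:
                start, peak = i, bpm
            else:
                peak = max(peak, bpm)
        elif start is not None:
            # Reading dropped back under the threshold, so the spike is over.
            spikes.append((start, i - 1, peak))
            start = None
    if start is not None:
        # The series ended while still inside a spike.
        spikes.append((start, len(samples) - 1, peak))
    return spikes


# Invented readings: resting around 43bpm with two jumps.
night = [43, 44, 43, 70, 75, 72, 45, 43, 66, 44, 43]
for start, end, peak in find_spikes(night):
    print(f"spike from sample {start} to {end}, peak {peak}bpm")
```

Lining the spike indexes up against the Zeo's sleep stages would still require both exports to share a clock, which is the part I don't have yet.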

My next big change is to quit caffeine again (I’m down to only 30mg) and then try to sleep without caffeine and wake up when I feel rested. I think I can migrate to bi-phasic sleep where I have a 5-6 hour core and then a 20 minute nap in the afternoon.

Part of this is confounded by my rigorous athletic training, which requires sleep for recovery.

Wow. This is really slick. The new Sony Ericsson Xperia phones have ANT+ integrated directly into the phone hardware.

This is huge as it means that more support for ANT+ will hopefully be forthcoming and other vendors

 

I am pleased to announce that the much anticipated ANT API for Android has now been released. The Sony Ericsson Xperia™ X8, Xperia™ X10 mini and Xperia™ X10 mini pro will be among the first commercially available Android phones to support ANT. The good folks at Sony Ericsson have indicated that support will be added to more devices in the near future.  Applications will be able to utilize this API in the announced devices as well as in all future devices supporting ANT.

[From ANT API for Android™ released | Developer World]

 

_developerworld_files_2011_02_ANTplus_logo.jpg

Here’s the data I have so far on tracking down my ideal sleep patterns.

– Too much caffeine is bad. Causes me to feel horrible in the morning, oversleep, etc.
– Too little caffeine is bad too. Causes me to wake up in the middle of the night with the inability to get back to sleep.

Screen shot 2011-04-08 at 11.59.22 AM.png

I think the reasons why are identical. My body feels lethargic when I start to go through withdrawal symptoms.

If I have too much, the withdrawal symptoms cause me to oversleep and feel horrible.

If I have too little, my natural sleeping patterns emerge and cause me to wake up after only 5-6 hours of sleep.

The problem is that in the past 5-6 hours hasn’t been enough.

It turns out that about 40mg is the right dosage, so I have a digital scale and weigh out my caffeine pills at 130mg every morning (they have an internal buffer which needs to be accounted for).

Yet more problems emerge.

I require more sleep when I’m training (cycling, weight lifting) more often. Right now I require 8-9 hours of sleep.

So here is my next major hypothesis that I want to test.

Go down to ZERO caffeine.

Try to sleep 4-6 hours at night.

Sleep 30 minutes in the afternoon when the team is out to lunch.

Wear a heart rate monitor while I sleep to keep track of my resting heart rate (RHR). I think that it will show elevated RHR while my body is recovering.

In the past I’ve been able to do this… I was sleeping 5 hours at night and waking up feeling rested, but the problem is that in the afternoon I felt tired.

A nap would solve this.

So in theory my sleep would go down from 9 hours per night, to 6 hours per night.

A three hour savings – or 1.5 months extra a year of waking time. Amazing.
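The arithmetic behind that figure, as a quick sketch. The only assumptions are a 365 day year and counting the saved hours in full 24 hour days; the function name is made up.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # about 30.4 days


def extra_waking_time(hours_saved_per_night):
    """Return (hours, days, months) of waking time gained over one year."""
    hours = hours_saved_per_night * DAYS_PER_YEAR
    days = hours / HOURS_PER_DAY
    months = days / DAYS_PER_MONTH
    return hours, days, months


# Going from 9 hours per night down to 6.
hours, days, months = extra_waking_time(9 - 6)
print(f"{hours} hours = {days:.1f} days = {months:.2f} months per year")
```

Three hours a night works out to 1,095 hours a year, which is about 45.6 full days, or 1.5 months.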

This is pretty nice. Google released Snappy (previously known as Zippy) as Open Source:

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

This means that along with open-vcdiff it is possible to use the full Google compression tool chain.
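Snappy itself isn't in the Python standard library, so this sketch only measures the baseline the quote compares against: zlib in its fastest mode (level 1) next to its best compression (level 9). The `measure` helper and the sample input are made up, and the numbers it prints depend entirely on your data and machine.

```python
import time
import zlib


def measure(data, level):
    """Compress data at the given zlib level; return (size ratio, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check so we know the timing is for a valid result.
    if zlib.decompress(compressed) != data:
        raise ValueError("round trip failed")
    return len(compressed) / len(data), elapsed


# Made-up, fairly repetitive sample input; real data will behave differently.
sample = b"".join(b"row %d: some log-like text that repeats\n" % i for i in range(20000))

for level in (1, 9):  # 1 = zlib's fastest mode, 9 = its best compression
    ratio, seconds = measure(sample, level)
    print(f"level {level}: {ratio:.1%} of original size in {seconds * 1000:.1f} ms")
```

The same harness could time a Snappy binding in place of `zlib.compress` to check the speed versus size trade-off described above on your own inputs.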

This is my 3rd day sleeping in an environment which is 100% pitch black with no interruptions (no sound).

My bike rides have seen a marked improvement in their subjective ride quality. I also feel a bit more rested in the afternoons.

What’s more amazing is that my dreams are now incredibly vivid!

For the last three nights I’ve had very profound dreams. These really were inspiring and I found myself thinking about them well into the afternoon.

I haven’t dreamt like this in years!

If this continues, I’m going to restart the habit of keeping a dream journal (perhaps via audio).

Additionally, my rides have been improving. My subjective ride quality on my morning 15 mile ride was about an 8 out of 10 … which is really good as lately I’ve felt I’ve been overtraining.

I hypothesize that the quality of REM is significantly improving which explains the vibrant dreams and also explains the aided recovery in my fitness.

I was able to experiment with this by not replacing my blinds but instead just taping black trash bags over the window. It’s not pretty and only temporary. I’ll buy more expensive blinds in the future but this was a cheap $20 experiment.

I also found that I wasn’t able to fully black out the whole room and there are some spots which let in some mild light in the morning. Interestingly, the last 2 days I woke up just after sunrise so I think if I make sure no light is let in EVEN during sunrise then I can further optimize my sleep experience.

So far this has been full of WIN. Very excited to see what happens over the next few weeks.

My initial goals were to sleep less total hours but I’ll settle for this as it will also have a significant impact on my quality of life.

Another quick thought. The Zeo has turned out to be of no help. My ZQ scores are the same if not lower. On Sunday night I woke up 5 times. The Zeo says I only woke up once. My current thinking about the Zeo is that it’s an expensive pseudoscientific toy… Even the gradual wake-up function hasn’t helped me.

More cool JDK7 features.

Interesting review of some JDK 7 features:

I think the main one I like is the try-with-resources hack:

private String example() throws IOException {
    // reader is closed automatically when the try block exits, even on an exception
    try (BufferedReader reader = new BufferedReader(...)) {
        return reader.readLine();
    }
}

One of my big goals has been to try to get my sleep down to 5 hours of sleep per night.

A few years ago I was able to hit a zen mode for about 1 month where I was only sleeping 5 hours per night.

I’ve since been trying to duplicate that.

The theory is that if I can REM more efficiently, and I’m not disturbed during my sleep, then I can compact my sleep into a smaller time range.

The impact on my life could be immense, with a significant amount of time added to my life per day/week/month/year.

Adding 1-2 hours a night is like 2-4 weeks of extra waking hours per year!

_api_images_zeo_device.jpg

So inspired by this post at the Quantified Self Boston meetup, I’ve decided to go heads down on another iteration of sleep optimization.

One of the things I’m struggling with is that due to my athletic training, sleep is required for repair of muscle tissue.

It’s hard to argue for shaving an hour off as my body just flat out needs the sleep (vs someone with a sedentary lifestyle).

I still think that I can shave some time off though.

Step 1, no interruptions

One of the major optimizations that I’m trying to make is reducing the number of trips to the bathroom at night by reducing water intake after 7pm.

So far I haven’t had much luck. I think I need at least 3-4 hours of no water intake before I go to sleep to allow me to sleep through the night without any interruptions.

The iPhone is now off for all alerts. The only thing that can interrupt me is work with a special reserved emergency phone number.

The cats are also downstairs in their room (which is now really nice and comfortable so I don’t feel guilty about keeping them there).

Step 2, NO light in the bedroom.

I made this change last night by blocking out all the light from my misc devices.

I also taped black trash bags to the windows to block out all light.

Long term (after a week) I plan on replacing the blinds with ones that block out all light if this turns out to be a major optimization.

This was my first night running with these changes.

So far the results have been really interesting.

My dreams last night were VERY vivid. I had a long dream about traveling to China, and losing my wallet. I also remember my dreams which is rare for me.

In my dream I took a flight to Singapore but they purchased my ticket wrong and somehow I ended up on a circular flight BACK to Singapore which took 34 hours. In my dream I slept the whole flight (I N C E P T I O N) and then woke at the remote end BACK in Singapore really pissed off and demanded a refund! (ha).

I also had a dream that I was fighting a war with monsters that had invaded earth. Some other alien civilization came and was providing us weapons to fight them. Yeah, I’m a scifi nerd.

Right after waking up I felt OK… Felt very rested. It’s 1pm and I feel a bit lethargic though. I slept 8 hours so this might be overtraining, as I need to take my day off from training soon. I might listen to my body and just take a nap anyway.

Step 3, No blue light before I sleep

I’ve understood that this has been a problem with my body for some time. If I expose myself to too much light my body produces less melatonin and I have trouble falling asleep.

Apparently, there’s a hack which was discovered circa 2001 which is to just block out blue light.

I’ve already bought these glasses and will be running an experiment tonight to see what happens.

I rigorously track my food and fitness with as many tools as possible.

However, right now it’s a HORRIBLE mishmash of technologies that don’t talk to each other.

Here’s what I currently use, or potentially use, and I want them ALL wirelessly communicating with each other without having to manually enter the data across websites.

– Runkeeper

– MyNetDiary

– My digital Withings wifi scale (which I just bought but am not using yet)

– My Garmin heart rate monitor (FR60) and Garmin Connect

– Polar CS500 bike computer (haven’t bought this yet)

– Zeo sleep monitoring system

– Garmin heart rate data uploaded to Runkeeper

– My Omron blood pressure monitor

My scale should broadcast my weight to all these systems.

All the data should be available to my doctor if necessary.

Most of these things require manual data uploads. This is HELL for me…. I hate it. It’s a non-starter now.

The Zeo, Garmin, etc.? I don’t use them anymore. I’m starting to not use the Omron for my blood pressure (I have mild hypotension due to my athletic lifestyle which is annoying)… And the CS500 didn’t even get bought because it’s a non-starter for me to have to manually sync the data with a USB dongle’d computer.

I have a laptop. I’m not going to keep USB dongles in it all the time.

Runkeeper is kind of annoying. If it had a cadence sensor I could put it on my bike and have Runkeeper automatically start/stop my rides once it noticed the cadence sensor data.

That would ROCK. Right now it’s somewhat annoying to have to click that stop/start button but I haven’t stopped yet. The support of a cadence sensor would fix that for me and I’d be in heaven with Runkeeper.

Either that or automatically start once they see my heart rate above a certain level. The problem is that right now I have to insert the Fisica utility which is annoying …
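The auto start/stop logic I have in mind is not complicated. Here is a minimal sketch, assuming the sensor delivers one cadence reading (rpm) per sample. The `detect_rides` name, the `idle_limit` parameter and the sample readings are all hypothetical; this is not anything Runkeeper actually exposes.

```python
def detect_rides(cadence, idle_limit=3):
    """Return (start, end) sample indexes for rides inferred from cadence data.

    A ride starts at the first non-zero cadence sample and ends at the last
    non-zero sample before `idle_limit` consecutive zero samples.
    """
    rides = []
    start = None        # index where the current ride began
    last_moving = None  # last index with non-zero cadence
    idle = 0            # consecutive zero-cadence samples seen so far
    for i, rpm in enumerate(cadence):
        if rpm > 0:
            if start is None:
                start = i
            last_moving = i
            idle = 0
        elif start is not None:
            idle += 1
            if idle >= idle_limit:
                # Stopped pedalling for long enough: close the ride.
                rides.append((start, last_moving))
                start = None
                idle = 0
    if start is not None:
        # Data ended mid-ride.
        rides.append((start, last_moving))
    return rides


# Invented readings: a ride with a brief coast, a long stop, then a second ride.
readings = [0, 0, 80, 85, 0, 90, 0, 0, 0, 0, 70, 75]
print(detect_rides(readings))
```

The idle limit is what keeps a short coast or a stop light from ending the ride. The heart rate variant would be the same loop with "above a certain bpm" in place of "cadence above zero".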

I’m about 2-4 years ahead of the tech curve here I think. This type of quantification is rare so I think I’m seeing some of the early adopter problem.

I’m a pioneer with arrows in his back.

My Quantified Self

I wanted to write this post as I’ve been addicted to quantifying my life since I started to become addicted to fitness over the last 18 months.

I wanted to drop a bunch of weight and get more fit as I became more addicted to mountain climbing and backpacking (and spending more time in Yosemite).

Why shave just 10 lbs off your pack when you can shave 20-50 lbs off your body.

If you think about it, even 5 lbs is amazingly annoying. If someone asked you to carry around 5 pounds of weight for a year you would look at them like they’re crazy. However, tons of people will put on 15 pounds slowly over the course of two years and not even think about it.

What do I log? Here is the list of apps that I use and have somewhat stuck to over the past couple of years.

Runkeeper

I record ALL my rides with Runkeeper. I’m addicted to cycling. Here’s my ride from today.

201103191651.jpg

This is pretty much my normal ride. The calories burned is more like 550-600 though. Not sure why it’s off.

Here’s my profile:

201103191653.jpg

82 THOUSAND calories. 1640 miles.

By way of comparison, 82k calories is 45lbs of sugar!
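For anyone checking the math, here is the conversion as a quick sketch. It assumes the usual round figure of 4 kcal per gram of sugar; the function name is made up.

```python
KCAL_PER_GRAM_SUGAR = 4.0   # assumed round figure for sugar
GRAMS_PER_POUND = 453.592


def kcal_to_pounds_of_sugar(kcal):
    """Convert a calorie total into the equivalent weight of sugar in pounds."""
    grams = kcal / KCAL_PER_GRAM_SUGAR
    return grams / GRAMS_PER_POUND


print(round(kcal_to_pounds_of_sugar(82_000), 1))
```

82,000 kcal is 20,500 grams of sugar, which comes out to a little over 45 lbs.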

Without Runkeeper I would have NO way to know that.

I love this app!

It really needs better integration with my heart rate monitor but for now this is ok.

MyNetDiary

I use MyNetDiary to track all of my food. It’s a great app.

I have a pretty strict diet. It’s easier than it sounds once you incorporate dietary help into your daily life.

I have a bunch of tricks to do this. I keep plenty of safe food at the office. I have additional tea in my bags. I often make my own food in bulk since this is easier than having to compute calories.

201103191656.jpg

This is a typical day with MyNetDiary.

I even control my caffeine intake. It turns out that with my ADD too much caffeine is bad but too little is also bad. 33mg is the ideal amount.

Every morning I eat pretty much the same thing. My protein shake is actually very delicious. It basically tastes like a milkshake.

I’ve also recently added fruit/veggie smoothies for lunch. They’re tremendous and only take 10 minutes to make at the office.

Zeo

I have mixed feelings about the Zeo. It has helped me optimize my sleep a bit but the strap is WAY too big and it doesn’t have wifi.

It’s also like 2 years old and they haven’t shipped an updated version.

IMO the company is over funded and mismanaged.

I hope they get their shit together.

Vitamins

Controlling my vitamins has also helped out a LOT … I’ve identified a number of performance related deficiencies. Potassium being one that I identified a long time ago and have long since corrected for.

iFitness

I weight train 3x per week using the Starting Strength system and log the data using iFitness.

I wish there was a web based version of this app or at least a way for me to upload the data onto the web.

Blood pressure

I have a home blood pressure monitor.

Because of my aggressive athletic training I have hypotension (low blood pressure) and it varies based on salt intake, water, etc.

I can use the blood pressure monitor to keep tabs on this so I can figure out when to have more water, salt, etc.

It also has a significant impact on over training detection. When I have low blood pressure it’s often because I’m training too hard and need to down a TON of water.

I have one at work and one at home so I can really quickly measure my blood pressure.

Future improvements

Everything needs wifi. The Zeo is useless without transparent sync. I’ve only ever uploaded the data twice.

Runkeeper is a NO brainer. I have NO problem pushing the button before my rides. They have all my data because it’s easy to upload.

I wish I had a way to record heart rate with a simple / small device.

I wear my heart rate monitor while I bike but I’d like to wear one while I sleep so I could add this to the data about the quality of my sleep.

24/7 body temperature would be nice.

These signals can be used to test for overtraining, dietary issues, etc.

I think the next thing I’m going to buy is the Withings scale, which is a wifi scale that uploads your weight for you.

I was recently at the Silicon Valley Cloud Computing meetup hosted by Facebook, where Netflix was talking about hosting nearly their entire cluster on Amazon Web Services, specifically SimpleDB.

Here’s what would really bother me if I were at Netflix… Amazon decides to compete with you head-on in your primary market.

Well that is what’s happening. Amazon is getting right into the video streaming space.

Why shouldn’t they? They want to do with DVDs what they did for books. Make them more accessible and easier to access. It’s their modus operandi.

So now your data provider is a direct competitor. Now what?

Should you just abandon your entire stack? Years of investment? Maybe. Probably. But now what? Where do you go?

Maybe they can go with Cassandra on RackSpace. It’s not going to be easy.

Many would say that this isn’t going to be a problem. They’re going to say that Amazon will be politically correct and not cut you off at the knees. After all, if they cut off one customer, Amazon is going to look bad and it’s going to hurt their hosting business (which is growing rather large).

But I’ll tell you what’s NOT going to happen.

Netflix won’t be able to collaborate with Amazon on large purchases regarding a launch. Doing so would hand nice competitive intel to one of their main competitors.

Amazon now has tons of intel about their bandwidth, hardware stack, and database configuration. Amazon could in theory just flat out look at their entire database. I’m not saying that this would happen but it’s possible.

If Netflix were hosted at Rackspace it would put up a pretty significant wall that would prevent Amazon from spying.

Power corrupts.

Further, what happens if Netflix has a major outage and needs Amazon to step in and help? From time to time something major will happen and you need your hosting provider to step up and help.

We’ve done it with Softlayer and Serverbeach in the past at Spinn3r. We have some sort of very difficult problem and we work directly with our hosting provider to help us jump through hoops to fix it.

You think Amazon is going to be motivated to help one of their main competitors launch a new product? You think they’re going to push a SimpleDB fix to patch a production issue that Netflix sees? Maybe, but their interests aren’t 100% aligned and this is frightening.

Here’s more about Amazon launching Prime Instant Video:

 

We heard it was coming and now here it is. Amazon has flipped the switch on its “free” video streaming for Prime members, the service we’ve been hearing about for the past month or so. If you’ve already been taking advantage of subscription-based two-day shipping so that your impulse buys get to your door a little quicker you can now enjoy streaming of 5,000 pieces of “prime eligible” content, including some recent movies and a lot of TV shows, much of which will look awfully familiar if you’re also a Netflix subscriber.

[From Amazon launches Prime Instant Video, unlimited streaming for Prime subscribers — Engadget]

 

This was a colossal fail on the part of Twitter.

Even if you don’t think they did this on purpose, it was a huge PR error on their part.

Or a PR win on the part of Uber.

Either way Twitter lost and appears to come across as a bully.

The world doesn’t like bullies.

 

Here’s where Twitter is being snakey: “These violations include, but aren’t limited to”. Uh… you can’t start off with “we have simple rules” and then not be able to precisely and completely explain fully how a 3rd party violated them. That doesn’t scream forthrightness. That screams attorney-speak.

[From Why did Twitter suspend UberTwitter? – Quora]

 

Hadoop has hit a 4000 node scalability wall.

Most of us aren’t running into this wall but it’s interesting to see why it’s happening.

The elasticity of clouds makes this more of a challenge. Most people don’t normally need 4000 nodes but say you want to do something REALLY fast. I could see wanting to spool up 8000-10000 nodes to compute PageRank or a clustering algorithm and wanting it to finish FAST.

Having a limit of 4k nodes is a challenge.

 

Given observed trends in cluster sizes and workloads, the MapReduce JobTracker needs a drastic overhaul to address several deficiencies in its scalability, memory consumption, threading-model, reliability and performance. Over the last 5 years, we’ve done spot fixes, however lately these have come at an ever-growing cost as evinced by the increasing difficulty of making changes to the framework. The architectural deficiencies, and corrective measures, are both old and well understood – even as far back as late 200

[From The Next Generation of Apache Hadoop MapReduce · Yahoo! Hadoop Blog]

 

There’s a middle path here. You can go with someone like Softlayer or Rackspace and have your cake and eat it too.

Softlayer is a bit closer to being the cloud. We love them. Great company. Major partner for us… we’re going to be doubling down on servers this year and they’re going to get another big order from us.

This was the best decision I’ve made regarding Spinn3r I think. We gave this decision a lot of thought and were going to go with colo, but at the last minute I felt that colo was just a bad call and we went with Softlayer instead.

Win!

 

Facebook CTO Bret Taylor says buying servers was a mistake. A very big mistake. At the time, he was chief executive at FriendFeed, which eventually was sold to Facebook for the tidy sum of a reported $50 million. But these were the early days. He and his team needed to decide between buying servers or using Amazon Web Services. They bought the servers.

[From Facebook CTO Bret Taylor’s Biggest Mistake? Buying Servers – ReadWriteCloud]

 

This looks really cool… I wish there was a way to easily keep track of all the OSS projects that larger companies throw over the fence.

 

This Java library provides some useful building blocks to build
high-performance multi-threaded asynchronous applications in Java.
Its implementation was inspired by Twisted’s asynchronous library
(twisted.internet.defer).

Deferred allows you to easily build asynchronous processing chains
that must trigger when an asynchronous event (I/O, RPC and whatnot)
completes. It can be used extensively to build an asynchronous API
in a multi-threaded server or client library.

[From stumbleupon/async – GitHub]
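To get a feel for the idea, here is a toy version of a deferred callback chain. It is a sketch of the general pattern only and not the API of stumbleupon/async or Twisted: the class, its `add_callback` and `callback` methods and the sample data are all invented for illustration.

```python
class Deferred:
    """Toy deferred: callbacks run in order once a result arrives."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        """Append fn to the chain; if the result is already in, run fn right away."""
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self  # allows chaining

    def callback(self, result):
        """Deliver the result and push it through every registered callback."""
        if self._fired:
            raise RuntimeError("already fired")
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            # Each callback receives the previous callback's return value.
            self._result = fn(self._result)
        self._callbacks.clear()
        return self._result


# Build the chain first; nothing runs until the (pretend) I/O completes.
seen = []
d = Deferred()
d.add_callback(bytes.decode).add_callback(str.upper).add_callback(seen.append)

# Later, when the asynchronous event fires:
d.callback(b"row-1")
print(seen)
```

The point is that the caller describes what should happen to the result before the result exists, so no thread has to sit blocked waiting for it.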

 
