
About John Battelle


August 4, 2007 9:41 PM

Hadoop

I must have been under a rock, because I missed the news that Doug Cutting (of Lucene and Nutch fame) is now at Yahoo, and working on supporting Hadoop, which is "a software platform that lets one easily write and run applications that process vast amounts of data."

Tim covers this well, writing:

...why is Yahoo!'s involvement so important? First, it indicates a kind of competitive tipping point in Web 2.0, where a large company that is a strong #2 in a space (search) realizes that open source is a great competitive weapon against their dominant competitor. It's very much the same reason why IBM got behind Eclipse, as a way of getting competitive advantage against Sun in the Java market. (If you thought they were doing it out of the goodness of their hearts rather than clear-sighted business logic, think again.) If Yahoo! is realizing that open source is an important part of their competitive strategy, you can be sure that other big Web 2.0 companies will follow.


Comments

Maybe you've been under a rock for quite a long time :-)
http://jeremy.zawodny.com/blog/archives/006471.html
Jean-Marie

You may also be interested in projects that are building on top of Hadoop. See my blog post "Hadoop gaining momentum" to see references on machine learning projects using MapReduce, SQL-like relational semantics on top of Hadoop, etc.

This isn't an open source story; it's an infrastructure story. Yahoo's IT infrastructure is made up of lots of different smaller systems. Google, by contrast, is increasingly moving everything onto a single GFS file system, with MapReduce running jobs against it and BigTable on top, which can store pretty much any kind of data you can think of.

Robin Harris at Storage Mojo believes that if Yahoo moves over to a Google-like infrastructure, they could save 30-40% of their IT costs, and cut as many as 4,000 jobs.

http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/

As you know, Hadoop's distributed file system (HDFS), HBase and Hadoop MapReduce are the Apache counterparts of GFS, BigTable and MapReduce respectively.
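For readers who haven't seen the MapReduce model mentioned above, here is a minimal, single-machine sketch of the idea in Python. It is not Hadoop's API or Google's implementation; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would spread each phase across many machines and a distributed file system.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over in-memory documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = [
        "Hadoop runs MapReduce jobs",
        "MapReduce jobs run on commodity servers",
    ]
    print(word_count(docs))
```

The point of the split is that the map and reduce steps touch only their own input, so a framework can run many copies of each in parallel on cheap machines and only has to coordinate the shuffle in between.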

I've heard that Yahoo is already running a 1,000-node Hadoop cluster. So it makes sense that they believe the Apache suite might do for them what Google's infrastructure has done for Google: give them a single heterogeneous storage system that runs on hundreds of thousands of cheap commodity servers, on top of which they can quickly and easily build products and services across the entire company.

Now, if they were really smart, they would take this to the next level and create a next-generation infrastructure that lets them leapfrog Google instead of simply catching up.

That's what my company is working on, and Yahoo would be stupid not to be doing it as well.

Btw, speaking of Nutch, the first two massive WebHarvest.gov collections (terabytes of permanently archived government data) can be keyword searched using Nutch. WebHarvest is a project from the Internet Archive and the National Archives.
http://www.webharvest.gov

We also learned last week that Lucene will be used at Wikia.
