August 4, 2007 9:41 PM
Hadoop
I must have been under a rock, because I missed the news that Doug Cutting (of Lucene and Nutch fame) is now at Yahoo and working on supporting Hadoop, which is "a software platform [that] lets one easily write and run applications that process vast amounts of data."
Tim covers this well, writing:
...why is Yahoo!'s involvement so important? First, it indicates a kind of competitive tipping point in Web 2.0, where a large company that is a strong #2 in a space (search) realizes that open source is a great competitive weapon against their dominant competitor. It's very much the same reason why IBM got behind Eclipse, as a way of getting competitive advantage against Sun in the Java market. (If you thought they were doing it out of the goodness of their hearts rather than clear-sighted business logic, think again.) If Yahoo! is realizing that open source is an important part of their competitive strategy, you can be sure that other big Web 2.0 companies will follow.
- Posted by John Battelle on August 4, 2007 9:41 PM
Comments
Maybe you've been under a rock for quite a long time :-)
http://jeremy.zawodny.com/blog/archives/006471.html
Jean-Marie
You may also be interested in projects that are building on top of Hadoop. See my blog post "Hadoop gaining momentum" to see references on machine learning projects using MapReduce, SQL-like relational semantics on top of Hadoop, etc.
This isn't an open source story, it's an infrastructure story. Yahoo's IT infrastructure is made up of lots of different smaller systems. Google, by contrast, is increasingly moving everything onto a single file system, GFS, with MapReduce running jobs against it and BigTable on top storing pretty much any kind of data you can think of.
Robin Harris at Storage Mojo believes that if Yahoo moves over to a Google-like infrastructure, they could save 30-40% of their IT costs, and cut as many as 4,000 jobs.
http://storagemojo.com/2007/07/05/how-yahoo-can-beat-google/
As you know, Hadoop (with its HDFS file system), HBase and Hadoop's MapReduce are the Apache clones of GFS, BigTable and MapReduce respectively.
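For anyone who hasn't seen the programming model being cloned here, a minimal sketch of MapReduce-style word counting in plain Python may help. This is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. It runs in a single process, whereas a real framework would spread the map and reduce calls across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step a framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input records, purely for demonstration.
    docs = ["Yahoo backs Hadoop", "Hadoop runs MapReduce jobs"]
    print(word_count(docs))
```

The point of the split is that `map_phase` and `reduce_phase` each see only one record or one key at a time, which is what lets a framework run them in parallel on cheap machines.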
I've heard that Yahoo is already running a 1,000-node Hadoop cluster. So it makes sense that they believe the Apache suite might do for them what Google's infrastructure has done for Google: give them a single heterogeneous storage system that runs on hundreds of thousands of cheap commodity servers, on top of which they can quickly and easily build products and services across the entire company.
Now, if they were really smart, they would take this to the next level and create a next-generation infrastructure that would let them leapfrog Google instead of simply catching up.
That's what my company is working on, and Yahoo would be stupid not to be doing it as well.
Btw, speaking of Nutch, the first two massive WebHarvest.gov (terabytes of permanently archived government data) can be keyword searched using Nutch. WebHarvest is a project from the Internet Archive and the National Archives.
http://www.webharvest.gov
We also learned last week that Lucene will be used at Wikia.