John Battelle's Searchblog: Google's Web 2 Demo and the UI Plunge

October 12, 2004 10:31 AM

Google's Web 2 Demo and the UI Plunge

As many have already noted, last week at Web 2.0 Peter Norvig, Google director of search quality, demonstrated word clustering, "named entities," and machine translation technology to the audience. The translation software was impressive, but somehow lacked zing - "good enough" translation doesn't seem like much of a revelation anymore. That in itself is an extraordinary achievement - Norvig showed translations from Arabic and Chinese, both languages that differ significantly from English. Google already has translation features built into its engine (from a third party), but this hand-rolled stuff was far more powerful, it seemed to me.

In any case, the demos that really got the audience going (and me, to be honest) were the named entities and the clustering technology. Seeing anything behind the veil of Google's real research and development is of course a revelation, but seeing something that was so clearly ready for prime time felt rather close to a declaration of where Google is heading, in particular given the recent moves in the personalization and clustering space from Amazon, Ask, Vivisimo, and Yahoo.

"Named entity extraction" is a relatively new project, which Norvig said Google had been working on for about six months. As Norvig explained the concept - essentially identifying semantically important concepts and the meaning wrapped around them - I couldn't help but think of WebFountain and my wish (near the end of the post) that Google would add a bit of IBM's semantic peanut butter into its PageRank chocolate.

Norvig also showed an entertaining (and live) demo of clustering, which he claimed was the "largest Bayesian database of clusters" extant. Hmmm.

From the eWeek story covering the news:

For example, Norvig said, researchers are looking for ways to break down sentences by looking for a phrase like "such as" and grabbing the names that follow it. The goal is to not only pull out the name but also its clusters, so that a name such as "Java" can be associated both with the computer language and with language in general, Norvig said.
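
To make the "such as" trick concrete, here is a rough sketch in Python. This is not Google's code, and nothing in the demo or the eWeek story describes the actual implementation. The regular expressions, the function name extract_entities, and the sample sentences are all invented for illustration, and real named entity extraction surely needs far more than pattern matching.

```python
import re
from collections import defaultdict

# Hypothetical pattern (an assumption, not Google's): one category word,
# then "such as", then a list of capitalized names joined by commas,
# "and", or "or".
SEPARATOR = r"(?:\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+)"
SUCH_AS = re.compile(
    r"\b(?P<category>\w+),?\s+such as\s+"
    r"(?P<names>[A-Z]\w*(?:" + SEPARATOR + r"[A-Z]\w*)*)"
)
SPLIT_NAMES = re.compile(SEPARATOR)


def extract_entities(text):
    """Map each name found after "such as" to the category words it followed."""
    categories_by_name = defaultdict(set)
    for match in SUCH_AS.finditer(text):
        category = match.group("category").lower()
        for name in SPLIT_NAMES.split(match.group("names")):
            categories_by_name[name].add(category)
    return dict(categories_by_name)


if __name__ == "__main__":
    # Made-up sample sentences, chosen so that one name lands in two groups.
    sample = (
        "He writes in programming languages such as Java, Python and Perl. "
        "Indonesian islands such as Java, Sumatra and Bali draw tourists."
    )
    for name, categories in sorted(extract_entities(sample).items()):
        print(name, sorted(categories))
```

Run on the sample text, "Java" comes back tied to both "languages" and "islands", which is the kind of multiple association the excerpt describes.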

"We want to be able to search and find these [entities] and the relationships between them, rather than you typing in the words specifically," Norvig said.

This has potentially interesting implications in next-generation ranking methodologies, for one, but combined with clustering, it signals that Google is serious about taking what one might call the UI plunge.

What do I mean by that? Well, of all the major engines, only Google has strictly maintained what might be called the C prompt interface to search: put in yer command, get out yer list of results (Google Local is a departure, but it's still in beta). Yahoo, Ask, A9 and others have begun to twiddle in pretty significant ways with evolved interfaces which - by employing your search history, your personal data, clustering, and other tricks - deliver more filtered and intentional results (though it is still arguable whether they are more relevant). I sense it's only a matter of time before Google takes this approach as well, and Norvig's demo certainly points that way. After all, it's not that often Google decides to give us a glimpse behind the curtain, and coupled with Google board member John Doerr's semi-announcement the day before (he told the audience that Google would become "the Google that knows you") I think the UI plunge might come sooner than we all expect.

If you want to know more about how Google is thinking about clustering, here's a paper written by a Google team, courtesy of a link from Don Park.

Update: Lazy linking on my part, the clustering paper is about hardware (though it is really interesting...)

Posted by John Battelle on October 12, 2004 10:31 AM

TrackBack

Listed below are links to weblogs that reference Google's Web 2 Demo and the UI Plunge:

» Named entity extraction from alex wright
John Battelle describes an intriguing glimpse of Google's forthcoming search clustering technology, from last week's Web 2.0 conference: "Named entity extraction" is a relatively new project called which Norvig said Google had been working on for about... [Read More]

Tracked on October 12, 2004 12:37 PM

» Named entity extraction from Stone
Google continues to do really cool stuff.... [Read More]

Tracked on October 13, 2004 12:39 PM

» Web Search Clustering from Microsoft (and other Clustering Tools) from Search Engine Watch Blog
Yesterday, we blogged about a discovery that allows you to receive MSN Search Beta results via RSS. It will be interesting to see what Bill G. and company does with this feature in the future. Today, even more MS search news. An excellent post on the S... [Read More]

Tracked on June 6, 2005 6:10 AM

Comments

IMHO, I think that the 'The Google Cluster Architecture' PDF document you link has nothing to do with the technology Peter Norvig explained.

The document explains how the X,000 Google servers are clustered in order to run quickly and help with users' queries.

And what Peter explained on 'Web 2.0' was how Google plans to cluster search results by learning the meaning of the web pages.

Posted by: Dirson
October 12, 2004 11:00 AM

You're right! Sorry about that. I should not rushlink, as I did to that page. My bad.

Posted by: John Battelle
October 12, 2004 11:09 AM

Never mind.

Your mistake is a good sample of the use of clustering (one word, several meanings). Perhaps you used Google to search "Google+cluster", and the first result is the document you linked. But it wasn't the "cluster" you looked for.

Posted by: Dirson
October 12, 2004 11:29 AM

It's good to see some technical hints from Google, although not much to go on.

Machine Translation - destined to be damned with faint praise. Chinese isn't hard to translate; most of the relative difficulty of the language lies in learning the characters. Japanese and Korean have a much more unique grammatical structure. No idea on Arabic.

Named Entity Extraction (as opposed to unnamed entities? nouns?) - This sounds like Google Sets. IIRC they were scanning text for list-type noun phrases (eg North, South, East and West), and building up associations from them.

Clustering - sounds like they've rediscovered data mining. It's not clear what algorithm they're using, but I can think of a few that might work with a large inverted index.

Posted by: Kendall Willets
October 12, 2004 3:28 PM

There's a new paper from Google in OSDI 2004

MapReduce: Simplified Data Processing on Large Clusters
http://www.usenix.org/events/osdi04/tech/dean.html
http://people.cs.vt.edu/~gback/MapReduce.pdf

Posted by: Rasta
October 13, 2004 8:01 PM

I think that Google's new context translation is a great thing.

Posted by: Helen
November 21, 2005 1:12 PM

License

This work is licensed under a Creative Commons Attribution- NonCommercial- NoDerivs 2.5 License.