Geeking with Greg

Friday, September 21, 2007

Hotmap and map attention data

Hotmap is a fun demo that shows a heat map of what areas people are viewing frequently on Microsoft Live Search Maps.

From the About page:

Hotmap shows where people have looked at when using Virtual Earth, the engine that powers Live Search Maps: the darker a point, the more times it has been downloaded.

It is a pretty cool idea. The heat maps clearly focus on high population areas, roads, coastlines, rivers, country borders, and other items of interest.

Seems like there could be a bunch of unusual and useful applications here, especially if you take into account time series (that people look at some map tiles after looking at other map tiles).

Danyel Fisher at Microsoft Research has two articles on the project: "Hotmap: Looking at Geographic Attention" and "How We Watch the City: Popularity and Online Maps". The articles are light reads, plenty of pretty pictures of heat maps.

As the second of the two papers mentions, there are some amusing examples of where people direct their attention. Here in Seattle, a "small, very bright point on the shore of Lake Washington points out Bill Gates' house."

For related work that uses GPS log data rather than map search log data, also make sure to check out "The Microsoft Multiperson Location Survey" and "Predestination: Inferring Destinations from Partial Trajectories".

Update: More on Hotmap from Matthew Hurst, Todd Bishop, and directly from Danyel Fisher.

E-mail versus social networks

Insightful thoughts from Om Malik:

E-mail has most ... of our attention .... [and] has all the elements needed for a social ecosystem, namely the address book.

Yahoo might have taken the wrong approach to ... social networking ... It should have started from within Yahoo's email service, which has some 250 million subscribers.

[E-mail should be] something better, something that doesn't make us all groan every time we open our inbox.

It probably is not fair to pick on Yahoo here. Microsoft and Google also seem to have had only limited success with their social network apps while letting their e-mail apps languish.

But, I think Om has an excellent point. Rather than try to replace e-mail apps with social apps, most people might be better served by bringing more social features to our e-mail apps.

On this topic, you might be interested in checking out "Inner circle: people centered email client", a fun CHI 2005 paper out of Microsoft Research.

Tuesday, September 18, 2007

Google's PowerPoint launches

Google launches Google Presentations, lightweight PowerPoint-like functionality integrated into Google Docs.

Philipp Lenssen posts a lengthy review. More details are available from the Google Help pages.

As you might expect in an online app, the focus appears to be on collaboration, sharing, and virtual conferencing (using chat and synchronized online viewing of the presentation).

Stepping back and looking at the bigger picture here, I find myself getting to the point where my entire day is spent in the browser. Even on machines where I have Microsoft Office installed, I often find it faster to quickly view documents using the Gmail integration with Google Docs than to open other applications.

I was skeptical that Google would get us to that point, but they have. Google appears to be making remarkable progress chipping away at the utility of a desktop PC environment.

Monday, September 17, 2007

The Google phone company

Bob Cringely's latest column proposes that Google spend several billion to buy the 700 MHz band, sell "Google Cubes" that act as small fileservers, WiFi points, and the mesh of a 700 MHz network, and then "overnight" become the "biggest and lowest-cost ISP" and the "biggest and lowest-cost mobile phone company" while "dominating local- and location-based search".

Bob is thinking big today, it appears.

Friday, September 14, 2007

Tech talk: Searching for Evil

Security research guru Ross Anderson has a talk up on Google Video, "Searching for Evil", that, among other things, surveys some of the more unusual Web-based financial schemes.

If you only have a few minutes, jump to 20:23 to check out Ross' frightening examples of some phishing-like schemes that are popping up on the web. The first example shows how people recruit mules on the Web to sit in the middle of a fraudulent financial transaction, with the person who accepted a too-good-to-be-true job offer getting badly screwed in the end.

If you have more time to dive in deeper and watch the whole thing, I enjoyed Ross' discussion at the beginning of the talk about using evolutionary game theory in simulations of network attacks. He refers to a WEIS 2006 paper, "The topology of covert conflict" (PDF), for more details. That paper starts to "build a bridge between network science and evolutionary game theory" and to "explore ... sophisticated [network] defensive strategies" including "cliques ... the cell structure often used in revolutionary warfare" which turn out to be "remarkably effective" for defending a network against adaptive attackers.

Similarly, though not mentioned in his talk, Ross has an ESAS 2007 paper, "New Strategies for Revocation in Ad-Hoc Networks" (PDF), which looks at how to "remove nodes that are observed to be behaving badly" from ad-hoc networks. They come up with a remarkable conclusion that "the most effective way of doing revocation in general ad-hoc networks is the suicide attack ... [where] a node observing another node behaving badly simply broadcasts a signed message declaring both of them to be dead."

Wednesday, September 12, 2007

Netflix prize and the value of experimenting

The Netflix Prize leaderboard continues to be a fascinating proof of the value of experimentation when working with big data.

The top entries include teams of graduate students from around the world, with eastern Europe particularly well represented. The second best entry at the moment is from undergraduates from Princeton (kudos, Dinosaur Planet).

Some of the teams disclose information about their solutions, enough to make it clear that the teams are playing with a wide variety of techniques.

I love the "King of the Hill" approach to these kinds of problems. There should be no sacred cows, no egos preventing people from trying and testing new techniques. From the seasoned researcher to the summer intern, anyone should be able to try their hand at the problem and build on what works.

Please also see my July 2007 post, "Netflix Prize enabling recommender research", and my June 2007 post, "Latest on the Netflix Prize".

See also my April 2006 post, "Early Amazon: Shopping cart recommendations", for an example from the early days of Amazon of the value of A/B testing and experimentation.

Tuesday, September 11, 2007

Leaked information on Google Reader

According to Philipp Lenssen, an internal Google talk with confidential information on Google Reader was briefly available on Google Video.

Philipp posts a summary from someone referred to as "Fanboy". Ionut Alex Chitu also posts two summaries ([1] [2]) of the content of the talk. Worth a look.

There is mention of planned social sharing features, details on the internal operations of Google Reader, and various statistics on feeds and feed reading. It also sounds like they plan on launching feed recommendations soon.

Impressive that the team working on Google Reader is so small, just seven people.

Actively learning to rank

Filip Radlinski and Thorsten Joachims had a paper at KDD 2007, "Active Exploration for Learning Rankings from Clickthrough Data" (PDF), with a good discussion of strategies for experimenting with changes to the search results to maximize a search engine's ability to learn from clickstream data.

Some excerpts:

[When] learning rankings of documents from search engine logs .... all previous work has only used logs collected passively, simply using the recorded interactions that take place anyway. We instead propose techniques to guide users so as to provide more useful training data for a learning search engine.

[With] passively collected data ... users very rarely evaluate results beyond the first page, so the data obtained is strongly biased toward documents already ranked highly. Highly relevant results that are not initially ranked highly may never be observed and evaluated.

One possibility would be to intentionally present unevaluated results in the top few positions, aiming to collect more feedback on them. However, such an ad-hoc approach is unlikely to be useful in the long run and would hurt user satisfaction.

We instead introduce ... changes ... [designed to] not substantially reduce the quality of the ranking shown to users, produce much more informative training data and quickly lead to higher quality rankings being shown to users.

The strategy they propose is to come up with some rough estimate of the cost of ranking incorrectly, then twiddle with the search results in such a way that the data produced will help us minimize that cost.

There are a bunch of questions raised by the paper that could use further discussion: Is the loss function proposed a good one (in particular, with how it deals with lack of data)? How do other loss functions perform on real data? How much computation does the proposed method require to determine which experiment to run? Are there simpler strategies that require less online computation (while the searcher is waiting) that perform nearly as well on real data?

But, such quibbles are beside the point. The interesting thing about this paper is the suggestion of learning from clickstream data, not just passively from what people do, but also actively by changing what people see depending on what we need to learn. The system should explore the data, constantly looking for whether what it believes to be true actually is true, constantly looking for improvements.

On a broader point, this paper appears to be part of an ongoing trend in search relevance ranking, away from link and text analysis and toward analysis of searcher behavior. Rather than trying to get computers to understand the content and whether it is useful, we watch people who read the content and look at whether they found it useful.

People are great at reading web pages and figuring out which ones are useful to them. Computers are bad at that. But, people do not have time to compile all the pages they found useful and share that information with billions of others. Computers are great at that. Let computers be computers and people be people. Crowds find the wisdom on the web. Computers surface that wisdom.

See also my June 2007 post, "The perils of tweaking Google by hand", where I discussed treating every search query as an experiment where results are frequently twiddled, predictions made on the impact of those changes, and unexpected outcomes result in new optimizations.

Improving Amazon?

I want to surface a new comment thread on improving online shopping from one of my old posts. I would enjoy hearing other thoughts on it, so please comment if you have anything to add.

Chris Zaharias asked:

Where you think Amazon still has opportunities to improve their game?

I said:

Looking at the bigger picture, I think it is hard to say that Amazon is anywhere close to done. The experience of shopping at Amazon is hardly effortless, full of discovery, or even all that pleasant.

Going to Amazon should be like walking into your favorite store, the nearest shelves piled high with things you like, everything you don't need fading into the background.

When you walk up to an item, everything you need to quickly evaluate it and decide whether to buy it should float to your attention.

Buying should be effortless, a couple clicks at most, with no unpleasant surprises (such as hidden shipping charges, delays, or belated out of stock e-mails).

Amazon has taken some steps toward that vision, but is a long way from there.

What do you think? How do you think online shopping could be improved? Are there things Amazon should be doing that they are not?

Monday, September 10, 2007

Learning forgiving hashes

Googlers Shumeet Baluja and Michele Covell had a paper at IJCAI 2007, "Learning 'Forgiving' Hash Functions: Algorithms and Large Scale Tests" (PDF).

The paper describes finding similar songs using very short audio snippets from the songs, a potential step toward building a music recommendation system. The similarity algorithm used is described as a variant of locality-sensitive hashing.

First, an excerpt from the paper describing LSH:

The general idea is to partition the feature vectors into subvectors and to hash each point into separate hash tables... Neighbors can be [found by] ... each hash casting votes for the entries of its indexed bin, and retaining the candidates that receive some minimum number of votes.
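To make the voting scheme in that excerpt concrete, here is a minimal sketch. It is not the authors' code: the `VotingIndex` class, the `quantize` bin key (simple rounding down), and the toy vectors are all assumptions made for illustration. It only shows the shape of the idea: split each vector into subvectors, hash each subvector into its own table, and keep candidates that collect enough votes.

```python
from collections import Counter, defaultdict


def split(vector, num_tables):
    """Partition a feature vector into num_tables contiguous subvectors."""
    size = len(vector) // num_tables
    return [tuple(vector[i * size:(i + 1) * size]) for i in range(num_tables)]


def quantize(subvector, step=1.0):
    """Hypothetical bin key: coarse quantization so nearby values collide."""
    return tuple(int(x // step) for x in subvector)


class VotingIndex:
    """One hash table per subvector; queries collect votes across tables."""

    def __init__(self, num_tables, step=1.0):
        self.num_tables = num_tables
        self.step = step
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def add(self, item_id, vector):
        # Store the item in one bin of every table.
        for table, sub in zip(self.tables, split(vector, self.num_tables)):
            table[quantize(sub, self.step)].append(item_id)

    def query(self, vector, min_votes):
        # Each table votes for the items sharing the query's bin.
        votes = Counter()
        for table, sub in zip(self.tables, split(vector, self.num_tables)):
            for item_id in table.get(quantize(sub, self.step), []):
                votes[item_id] += 1
        return {item_id for item_id, n in votes.items() if n >= min_votes}


# Toy data (made up): six-dimensional vectors split across three tables.
index = VotingIndex(num_tables=3)
index.add("song_a", [0.1, 0.2, 5.1, 5.3, 9.0, 9.4])
index.add("song_b", [0.3, 0.4, 5.2, 5.9, 2.0, 2.5])
index.add("song_c", [7.0, 7.1, 1.0, 1.2, 4.0, 4.4])

query = [0.2, 0.3, 5.4, 5.5, 9.1, 9.2]
candidates = index.query(query, min_votes=2)
strict_candidates = index.query(query, min_votes=3)
```

With these toy vectors, song_a shares a bin with the query in all three tables and song_b in two, so raising `min_votes` from 2 to 3 narrows the candidates from both songs to song_a alone.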

What is interesting (and a bit odd) about this work is that they used neural networks to learn the hash functions for LSH:

Our goal is to create a hash function that also groups "similar" points in the same bin, where similar is defined by the task. We call this a forgiving hash function in that it forgives differences that are small.

We train a neural network to take as input an audio spectrogram and to output a bin location where similar audio spectrograms will be hashed.

A curious detail is that initializing the training data by picking output bins randomly worked poorly, so instead they gradually change the output bins over time, allowing them to drift together.

The primary difficulty in training arises in finding suitable target outputs for each network.

Every snippet of a song is labeled with the same target output. The target output for each song is assigned randomly.

The drawback of this randomized target assignment is that different songs that sound similar may have entirely different output representations (large Hamming distance between their target outputs). If we force the network to learn these artificial distinctions, we may hinder or entirely prevent the network from being able to correctly perform the mapping.

Instead of statically assigning the outputs, the target outputs shift throughout training ... We ... dynamically reassign the target outputs for each song to the [bin] ... that is closest to the network's response, aggregated over that song's snippets.

By letting the network adapt its outputs in this manner, the outputs across training examples can be effectively reordered to avoid forcing artificial distinctions .... Without reordering ... performance was barely above random.
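The reassignment step in those excerpts can be sketched without any neural network at all. What follows is only my reading of it, not the paper's procedure: the function names, the absolute-difference distance, and the greedy rule that each song takes the closest code not already taken are all assumptions. The network's per-snippet outputs are made-up numbers.

```python
from itertools import product


def mean_response(snippet_outputs):
    """Average the network's output vector over all snippets of one song."""
    n = len(snippet_outputs)
    return [sum(column) / n for column in zip(*snippet_outputs)]


def distance(response, code):
    """Sum of absolute differences between a real-valued response and a bit code."""
    return sum(abs(r - c) for r, c in zip(response, code))


def reassign_targets(responses, codes):
    """Greedy sketch: each song takes the closest code that is still free.

    responses: dict of song id -> list of per-snippet output vectors
    codes: list of candidate target codes (tuples of 0/1)
    """
    free = list(codes)
    targets = {}
    for song, outputs in responses.items():
        avg = mean_response(outputs)
        best = min(free, key=lambda code: distance(avg, code))
        targets[song] = best
        free.remove(best)
    return targets


# Hypothetical two-bit output space and made-up network responses.
codes = list(product((0, 1), repeat=2))
responses = {
    "song_a": [[0.9, 0.1], [0.8, 0.3]],
    "song_b": [[0.2, 0.7], [0.1, 0.9]],
}
targets = reassign_targets(responses, codes)
```

Here song_a ends up with target (1, 0) and song_b with (0, 1), regardless of what random codes they started with. In a real training loop this step would alternate with ordinary training toward the current targets.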

I have not quite been able to make up my mind about whether this is clever or a hack. On the one hand, reinitializing the target outputs to eliminate biases introduced by random initialization seems like a clever idea, perhaps even one that might have broad applicability. On the other hand, it seems like their learning model has a problem in that it does not automatically learn to shift the outputs from their initial settings, and the reordering step seems like a hack to force it to do so.

In the end, I leave this paper confused. Is this a good approach? Or are there ways to solve this problem more directly?

Perhaps part of my confusion is my lack of understanding of what the authors are using for their similarity metric. It never appears to be explicitly stated. Is it that songs should be considered similar if the difference between their snippets is small? If so, is it clear that is what the NNet is learning?

Moreover, as much as I want to like the authors' idea, the evaluation only compares their approach to LSH, not to other classifiers. If the goal is to minimize the differences between snippets in the same bin while also minimizing the number of bins used for snippets of any given song, are there better tools to use for that task?

Universal action and the future of the desktop

In a Google engEdu tech talk, "Quicksilver: Universal Access and Action", Nicholas Jitkoff provides some thought-provoking ideas on the future of the desktop.

Starting at 03:52 in the talk, Nicholas begins describing how he thinks the desktop should work, listing four categories of user goals: Search, Summon, Browse, and Act.

Search is a fast, comprehensive, and easy-to-use desktop search tool, such as Google Desktop. This may not sound new but, amazingly, this has only recently begun to be a common part of the desktop experience.

Summon is using desktop search for navigation. You know something exists, you just want to get back to it immediately.

Browse may sound close to what the desktop does now, but Nicholas seems to mean browse not as navigating a folder hierarchy but as finding objects related to or near other objects. For example, you might not remember the name of the specific song you want, but you might be able to remember the artist who wrote it; getting to the artist allows you to find the song.

Act is when you want to immediately do a task (e.g. play a music track) without any intervening steps such as opening an application.

Note the deemphasis of the traditional file hierarchy, the focus on objects, and the shift away from applications and toward actions on objects.

The desktop should seek to satisfy our goals immediately. To adjust the lighting in a photo album, we should not have to navigate a hierarchical menu, locate an application that can edit photos, wait for the application to load, and then open the files using the open menu in that application. We should just ask to adjust the lighting in a photo album.

The next few minutes of the talk further break down some of the constraints on the desktop metaphor. Nicholas advocates fast, universal access that ignores the boundaries of the machine, reaching out to the network to whatever data and code is needed to act. The focus should be on the task -- getting work done -- using whatever resources are necessary, requiring as little effort as possible.

The vision is fantastic and inspiring. However, while Quicksilver is an interesting example, from what I saw, it appears to be only a baby step toward these lofty goals. The learning and automation appear primitive, and the effort required to customize severe, which may make Quicksilver closer to a geek tool than a realization of the broader ambition.

Even so, Nicholas is offering intriguing thoughts on where the desktop should go. It is well worth a listen.

Friday, September 07, 2007

More interviews on search in 2010

Gord Hotchkiss at Search Engine Land continues his interviews on the future of search with his post, "Search In The Year 2010: Part Two".

Again, I would recommend reading the whole thing, but here I will focus on the parts on personalization.

In particular, there are a few tidbits on personalized advertising in this round of interviews. Some excerpts:

Chris Sherman: ... As they get to know you and your preferences, you know... "I never click on that video ad," they’ll gradually stop showing you [those] ads ... and maybe increase the ads ... that you do click on.

Larry Cornett: ... The more they understand about what a specific user is looking for in their context, the more intelligent they can be about what they're actually offering ... By being more targeted it will add more value for the users and hopefully, be a better experience for them as well .... Do [users] really want to spend time in the context where they're seeing a lot of stuff that’s not targeted and not appropriate and might even be annoying or would they rather ... [see ads that] could be beneficial for them.

[Gord Hotchkiss:] Personalization of advertising will happen incrementally and the ability to target accurately will improve over time. For many users, it will be a mixed environment, with some very well targeted, relevant ads in some locations that don’t even look like advertising and the more typical forms of untargeted advertising we're more familiar with.

The impression I got from this is that personalized advertising is now seen as inevitable. Privacy concerns may make it appear incrementally, but most seem to agree that it will happen.

On a different topic, usability guru Jakob Nielsen used his time to promote NLP over personalization and pick on Amazon.com's recommendations yet again. Gord asked me to respond.

On the one hand, I agree with Jakob about the long-term promise of natural language techniques (though I think he may be underestimating the challenges and overestimating the likelihood of rapid progress there) and his criticism of inaccuracies in personalization and recommendations (and they are inaccurate, no doubt).

On the other hand, I think Jakob is using an absolute measure of the effectiveness of personalization where a relative measure is more appropriate. Specifically, the metric should not be how often does personalized content accurately reflect your interests; it should be how much better does personalized content predict your interests than whatever unpersonalized content you otherwise would have to put in the space.

That is a much lower bar. Bestsellers and other unpersonalized content tend to be very poor predictors of individual interest. By knowing even a little bit about you, it is easy to do better.

Is personalization ever going to be perfect? No, but it does not have to be. It just has to be more useful than the alternative. Personalized content only has to be marginally more interesting than unpersonalized content to be helpful.
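A tiny worked example may make the absolute-versus-relative distinction concrete. Everything here is made up for illustration: the item names, the ten-slot page, and the `hit_rate` helper are assumptions, not data from any real system.

```python
def hit_rate(shown, interested):
    """Fraction of the shown items that the user actually cares about."""
    return sum(1 for item in shown if item in interested) / len(shown)


# Hypothetical data: ten slots on a page, filled two different ways.
interested = {"sci-fi novel", "chess set", "hiking boots"}
bestsellers = ["diet book", "celebrity memoir", "thriller", "cookbook", "chess set",
               "phone case", "pop album", "self-help book", "calendar", "board game"]
personalized = ["sci-fi novel", "chess set", "fantasy novel", "go board", "hiking boots",
                "tent", "space opera", "trail map", "puzzle book", "card game"]

baseline = hit_rate(bestsellers, interested)   # 1 of 10 slots is a hit
absolute = hit_rate(personalized, interested)  # 3 of 10 slots are hits
relative = absolute / baseline

print(f"absolute: {absolute:.0%}, baseline: {baseline:.0%}, relative: {relative:.1f}x")
```

Judged on the absolute measure, the personalized list misses on seven of its ten slots. Judged against what would otherwise fill the space, it does about three times better than the bestseller list.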

See also Gord's first post in this series, "Search In The Year 2010", which has some more on personalized search, and my comments on that post.

For more on personalized advertising, please see also my posts "What to advertise when there is no commercial intent?" and "Is personalized advertising evil?"

Wednesday, September 05, 2007

The power of branding in web search

Some research out of Penn State, "The Effect of Brand Awareness on the Evaluation of Search Engine Results" (PDF), puts some hard numbers on the hurdles for Google's web search competitors.

The study showed participants Google search results on all queries, but switched around branding elements at the top and bottom of the page to label the results as from Yahoo, Microsoft Live Search, a startup called AI²RS, and Google.

From the paper:

Based on average relevance ratings, there was a 25% difference between the most highly rated search engine and the lowest, even though search engine results were identical in both content and presentation.

The 25% difference was between the results branded with the AI²RS startup and the results branded as Yahoo.

Curiously, Yahoo was rated substantially higher than Google, despite the fact that these were Google's search results. Yahoo has failed to gain web search market share, but, if you believe this study, brand weakness is not the reason why.

It is true that this study is small, just 32 participants across 4 different queries. It would be nice to see a broader study that confirms these results.

Even so, it probably is safe to say that the strength of the Google and Yahoo brands (and Microsoft's ownership of the defaults in Internet Explorer) makes it very difficult for any web search startup. As Rich Skrenta once said, "A conventional attack against Google's search product will fail ... A copy of their product with your brand has no pull."

See also a lighter Penn State Live article on the study, "Branding matters -- even when searching".

[Found via Barry Schwartz]

Missing an opportunity in shopping metasearch

There is an interesting tidbit at the beginning of a Search Engine Land post, "Kicking The Tires On Shopping Search, Part Two: The Independents".

Compiling data from Hitwise, they show that Google, Microsoft, and Yahoo now have a meager 15% market share in shopping search combined, down from nearly 50% combined three years ago.

Considering that a substantial percentage of web search queries are shopping-related (see "A taxonomy of web search" (PDF)) and the ease of extracting advertising revenue and revenue sharing where there is such strong purchase intent, I would think that Google, Yahoo, and Microsoft would be pursuing shopping metasearch more aggressively.

See also part one of the Search Engine Land series, "Analyzing the Major Shopping Search Services", which focuses mostly on design and usability of the shopping sites offered by Google, Yahoo, and Microsoft.

See also my earlier posts, "R.I.P. Froogle?" and "What should Google do next?".

Saturday, September 01, 2007

HITS, PageRank, and keeping it simple

A SIGIR 2007 paper out of Microsoft Research, "HITS on the Web: How does it Compare?" by Marc Najork, Hugo Zaragoza, and Michael Taylor is a large-scale study of several ranking algorithms using a substantial web crawl and data from the MSN query logs.

The authors appear to have expected the HITS algorithm to outperform the others in their tests, but found instead that a combination of BM25F and simple in-degree link analysis outperformed everything else. From the paper:

We were quite surprised to find that HITS, a query-dependent feature, is about as effective as web page in-degree, the most simpleminded query-independent link-based feature.

As expected, BM25F outperforms all link-based features by a large margin. The link-based features are divided into two groups, with a noticeable performance drop between the groups. The better-performing group consists of the features that are based on the number and/or quality of incoming links (in-degree, PageRank, and HITS authority scores); and the worse-performing group consists of the features that are based on the number and/or quality of outgoing links (outdegree and HITS hub scores).

The combination of BM25F with ... id in-degree consistently outperforms the combination of BM25F with PageRank or HITS authority scores, and can be computed much easier and faster.
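Part of the appeal of in-degree is how little there is to compute. Here is a minimal sketch, with loud caveats: the text scores are stand-in numbers rather than real BM25F output, and the `combined_score` formula (text score plus a weighted log of in-degree) is my own assumption, not the combination used in the paper.

```python
import math
from collections import Counter


def in_degree(edges):
    """Count incoming links per page from (source, target) pairs, ignoring self-links."""
    return Counter(target for source, target in edges if source != target)


def combined_score(text_score, indegree, weight=0.5):
    """Hypothetical combination: text score plus a damped in-degree term."""
    return text_score + weight * math.log1p(indegree)


# Toy link graph (made up). The self-link on "d" is not counted.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("b", "d"), ("d", "d")]
degrees = in_degree(edges)

# Stand-in text relevance scores for one query; a real system would use BM25F here.
text_scores = {"a": 2.1, "c": 2.0, "d": 1.8}

ranked = sorted(
    text_scores,
    key=lambda page: combined_score(text_scores[page], degrees[page]),
    reverse=True,
)
```

In this toy graph, page "a" has the best text score but no incoming links, so it drops to last behind "c" and "d". The whole link feature is one pass over the edge list, with no iteration to convergence as in PageRank and no per-query graph work as in HITS.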

PageRank performed poorly in their tests. However, their explanation of why struck me as unconvincing. From the paper:

The fact that in-degree features outperform PageRank under all measures is quite surprising. A possible explanation is that link-spammers have been targeting the published PageRank algorithm for many years, and that this has led to anomalies in the web graph that affect PageRank.

This raises the question of whether they picked the right PageRank algorithm. In particular, there are variants of PageRank that they could have used that appear less sensitive to spam and may have performed much better. Unfortunately, without results for those variants, it is hard to know whether the criticisms in this paper of naive PageRank are applicable to the algorithms evolved from PageRank used by search engines today.

Even so, the results of the study are interesting, both for the overview of several relevance ranking algorithms and the conclusions about their effectiveness. Particularly intriguing is the evidence that computationally expensive algorithms such as the query-dependent HITS algorithm seem to hold no advantage over much simpler techniques.

Update: Marc Najork, one of the authors of the paper, expands on the PageRank algorithm issue and the performance of HITS in the comments for this post.

Thursday, August 23, 2007

Marissa Mayer at SES 2007

Elinor Mills at CNet summarizes Googler Marissa Mayer's keynote speech at SES 2007.

Some excerpts from the article, focusing on personalization:

For general Web search, personalization is the future, Mayer said.

Ten to 15 years from now search sites will understand more about searchers, where they are located and what their personal preferences are, she predicted.

Mayer said one of the most important data points for improving search relevance based on personalization is the previous query, although Web history and address books could also be helpful "signals" to the search engine.

It is important that the ads are personalized too, she said ... "My philosophy is that the ads and the search results should match ... For me, search and ads are almost the same."

See also detailed notes on Marissa's keynote from Tamar Weinberg.

For more on personalized search, you might be interested in at least a couple of my posts on that topic, including "Personalized Search Primer" and "Effectiveness of personalized search".

For more on personalized advertising, please also see my posts "What to advertise when there is no commercial intent?" and "Is personalized advertising evil?"

Wednesday, August 22, 2007

Collective search versus personalization

Eric Auchard at Reuters reports on Ask CEO Jim Lanzone's keynote at the SES 2007 conference.

Jim argued that personalization doesn't work and then tried to contrast it with something he called "collective search". Some excerpts:

Ask.com ... aims to tap the collective search habits of its 50 million users to improve the relevancy of Web search.

[Jim Lanzone] said attempts at automated personalization often fail in practice to give users what they want.

Instead, Web search can be improved by understanding the aggregate behavior of different types of users.

This collective approach means users stand to benefit from what users with similar interests have gleaned from previous searches.

"Collective search is something that Ask really believes in," Lanzone said, adding that personalizing what different users see is only a small piece of further improving search.

If I could quote from my favorite movie, "You keep using that word. I do not think it means what you think it means."

I admit the term personalization may be poorly defined these days, but it is hard for me to see the distinction between personalized search and changing the search results based on "what users with similar interests have gleaned from previous searches."

In fact, I would think that is the very definition of personalization. Personalization changes what people see based on their past behavior and the past behavior of others.

Perhaps the distinction here is that collective search may change results even for people who have no history? For example, popular search results may get a higher ranking or search results that appear to be related after analyzing what people click on may be handled differently?

Yet even that often still is referred to as personalization. For example, one of Amazon.com's most successful and useful personalization features is similarities ("Customers who bought X also bought"). That feature is targeted to a specific page, not to a user's history, but is still personalization.
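For readers who have not seen how a "customers who bought X also bought" list can be built, here is a deliberately naive sketch. It is not Amazon's algorithm: it just counts raw co-purchases over made-up baskets, and the function names are my own. The point is that the list is keyed by the item on the page, not by the visitor's history.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(baskets):
    """For each item, count how often every other item appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for item, other in permutations(set(basket), 2):
            counts[item][other] += 1
    return counts


def recommend(counts, item, k=2):
    """Return up to k items most often bought together with the given item."""
    return [other for other, _ in counts[item].most_common(k)]


# Hypothetical purchase baskets.
baskets = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "mug"],
    ["book", "lamp"],
]
counts = also_bought(baskets)
top_for_book = recommend(counts, "book", k=1)
```

With these baskets, the top suggestion on the "book" page is "lamp", bought together three times. Raw counts like this favor popular items, so a real system would need some normalization, but the aggregate, per-page nature of the feature is the same.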

Am I missing something more here? Is Jim suggesting collective search includes something that personalization does not? For example, does collective search include explicit sharing of search results across a social network (like Yahoo's struggling MyWeb)? Or something else?

For more details on Jim's keynote, see also Tamar Weinberg's notes over at Search Engine Roundtable.

Monday, August 20, 2007

Personalization session at SES 2007

Barry Schwartz at Search Engine Roundtable posts notes from the SES 2007 session on "Personalization, User Data & Search".

Let me highlight this tidbit on the results of an eye tracking study:

The personalized [search results] ... doubled the performance (click throughs).

That reminds me of what Googler Marissa Mayer once said:

[Personalization is] one of the biggest relevance advances in the past few years.

Personalization doesn't affect all results, but when it does it makes results dramatically better.

Update: More details on the eyetracking study.

Update: Even more details on the eyetracking study.

Friday, August 17, 2007

Interviews on search in 2010

Gord Hotchkiss at Search Engine Land compiled an impressive group of interviews for his article, "Search In The Year 2010". He talked to usability guru Jakob Nielsen, Googler Marissa Mayer, Larry Cornett from Yahoo, Justin Osmer from Microsoft, Michael Ferguson from Ask, and search industry experts Chris Sherman, Greg Sterling, and Danny Sullivan.

From the introduction:

It was with a great deal of anticipation that I threw in front of them the same question: what will the search results page look like in 2010?

Here, aggregated and condensed, are their answers.

Go read the whole thing, but, as usual, I am going to focus here on the part on personalized search.

Chris Sherman, Jakob Nielsen, and Greg Sterling are skeptical about personalization. Personalization "is incredibly hard to do" because "language is so inherently ambiguous" and "you have to guess", not to mention "the so called creep factor."

Danny Sullivan, Justin Osmer, Michael Ferguson, and Larry Cornett are more optimistic. "We're getting close to a tipping point on personalization" where little or no effort ("a very low investment") yields "a lot of return" because searchers will "get a lot more out of [the] search experience" if the search engine knows more about them. Searchers' "clicks and their footsteps will walk to the experience that is most delightful and easy for them to use," though we should be careful not to "ask the users to do work." "Google is onto something with their personalized search results," and "people are misunderstanding how sophisticated it can be."

Oddly, Marissa Mayer did not say much on personalized search this time around. In the past, she has said ([1] [2] [3]) that "[personalization is] one of the biggest relevance advances in the past few years", "personalized search is something that holds a lot of promise", and personalization is key for building "the search engine of the future."

Similar to what Esther Dyson said in her interview with Charlie Rose, Chris Sherman agrees with Gord Hotchkiss that "Google is holding a significant portion ... of their personalization algorithm in reserve" because there is "caution" that they might "alienate the searcher." Chris goes on to say, "They probably have tons of stuff that they're not showing us."

On personalization being incredibly hard to do, please also see my March 2005 post, "Personalization is hard. So what?"

Esther Dyson on personalization and Google

Esther Dyson has an interview on the Charlie Rose show that covered several topics, including health, space travel, search, personalization, and social networking.

At 34:22, she talks about personalization, personalized search, and behavioral targeting. Some quotes:

The big issue for Google is going to be personalization.

You sort of see them dancing around this issue of, well, I could do much better search for you if I knew [more about you].

They are very, very concerned about the privacy issue. They're terrified that this is going to be a problem for them.

It's pretty clear Google wants someone else to go first with personalized search to get people to be more relaxed.

Esther goes on to say that she expects smaller companies to take the lead on personalized search and advertising. The implication appears to be that, once these startups nicely warm up the public on personalization, Google will launch all the personalization features they have been holding back, using their big data and big clusters to dominate the field.

The rest of her interview is interesting, especially if you have a strong interest in health, but also just for the tidbits on other technology companies. For example, at 33:38, after a discussion of social networking, she says, "Yahoo should have become Facebook."

See also my May 2007 post, "Esther Dyson on the future of search".

[Charlie Rose talk found via Adario Strange]
