The Search Papers Archive
December 16, 2004
Search Paper Fun: Most Cited
I sent a query to Lee Giles, the guru at Penn State behind CiteSeer (with Steve Lawrence, who is now at Google), asking him which search-related papers are the most cited. I was struck by the near parity between Page and Brin's original paper on Google and Jon Kleinberg's paper on Hubs and Authorities. Giles did a bit of fiddling with Google Scholar and responded:
For web related work these are well cited in the Google Scholar using the query “web”:
[PDF] The Semantic Web
T Berners-Lee, J Hendler, O Lassila - View as HTML - Cited by 1347
... May 17, 2001. The Semantic Web. A new form of Web content that is meaningful to
computers will unleash a revolution of new possibilities. ... Web: A Research Agenda. ...
Scientific American, 2001 - www-personal.si.umich.edu
[PDF] The anatomy of a large-scale hypertextual Web search engine
S Brin, L Page - View as HTML - Cited by 1087
Abstract In this paper, we present Google, a prototype of a large-scale search
engine which makes heavy use of the structure present in hypertext. Google ...
Computer Networks and ISDN Systems, 1998 - kulturinformatik.uni-lueneburg.de - firstrate.co.nz - net.cs.pku.edu.cn - scalab.uc3m.es - all 69 versions
However, this one can’t be ignored:
[PDF] Authoritative sources in a hyperlinked environment
J Kleinberg… - Cited by 1059
Abstract. The network structure of a hyperlinked environment can be a rich
source of information about the content of the environment, provided we ...
Journal of the ACM, 1999 - portal.acm.org - nan.dhs.org - cs.cmu.edu - mathe.tu-freiberg.de - all 73 versions
This book is the first to discuss the web in any detail:
[PS] Modern Information Retrieval
R Baeza-Yates, B Ribeiro-Neto, R Baeza-Yates - View as HTML - Cited by 1198
Page 1. Modern Information Retrieval. Ricardo Baeza-Yates. Berthier Ribeiro-Neto.
ACM Press New York. ... 1.1.2 Information Retrieval at the Center of the Stage . . ...
Addison Wesley, 1999 - dcc.ufmg.br - sunsite.dcc.uchile.cl - sims.berkeley.edu - portal.acm.org - all 7 versions
All worthy reads!
- Posted by John Battelle at 5:29 PM
November 18, 2004
Google Scholar Launches: A Hint of Things to Come?
Google has, for some time, had a few verticalized, niche search solutions hidden in their Advanced Search areas, notably their "topic specific" search around Linux, the Mac, govt sites, and the like. Today the company launched another, more ambitious vertical search tool called Google Scholar. According to folks I spoke to last night at Google, the service was done by one engineer in his "20% time." Anurag Acharya, the engineer behind the service, tuned Google's crawler for academic papers and worked with universities to make those papers available to others on the web.
The service has the tagline "Stand on the shoulders of giants." It includes a cross-referenced citation link for each paper, which is very cool, and as we all know, the basis of PageRank (and the WWW) in the first place. Here's a search for vertical or domain-specific search, for example.
This move marks a trend toward making usually invisible (and useful) information more accessible, one that I could imagine spreading to other domains, perhaps ones more commercial in nature. (Scholar does not have ads in it, at least for now.) The special ranking algorithm and policies for dealing with a structured document universe such as this clearly scale to other opportunities - i.e., travel, automotive, business information and the like.
Here's Resourceshelf's take on this, and SEW's.
- Posted by John Battelle at 5:24 AM
March 25, 2004
Upcoming WWW Conference: Loads O Search
Resourceshelf has culled the upcoming WWW conference for selected references to search. There's also a whole track on the Semantic Web.
The complete list is a Who's Who of search stars and a telling map of who's doing interesting research in the area. Included: Intel, University of Washington, IBM, Yahoo (Understanding User Goals in Search), National University of Singapore, MIT, Microsoft. A9's Udi Manber (who I did meet with, but can't go into our talk quite yet) is giving a keynote.
OK, I think I have to go to this.
- Posted by John Battelle at 9:00 AM
January 11, 2004
The Search Papers: Do Web Search Engines Suppress Controversy?
The First Monday peer-reviewed journal recently published "Do Web Search Engines Suppress Controversy?" by Susan Gerhart, a software engineering professor at Embry-Riddle Aeronautical University. Driving the paper is this sentiment:
"The dilemma of controversies is that the searcher beginning to explore a topic doesn’t know the search terms to investigate a controversy unless it is revealed with reasonable visibility, e.g. not item number 879 in search results, nor buried three links away from result number 30."
In other words, if you are just starting to research a topic, and have no idea if there are any controversies surrounding said topic, how will you ever know if the search engine has a bias toward not revealing those controversies?
This paper explores the hypothesis that, as Gerhart puts it: "A given, well–known specific controversy will not be revealed in the top search results." She then creates an experiment to test this hypothesis by outlining both a broad topic and a related controversial subtopic. An example is "Albert Einstein" as the broad topic, and "Did Einstein’s first wife, Mileva Maric, receive appropriate credit for scientific contributions to Einstein’s early work" as the subtopic. The question is, do search engines leave out the more controversial bits, the stuff that, taken as a whole, provides texture and context to any searcher's understanding of a topic?
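To make the shape of that test concrete, here is a minimal sketch in Python. It is not Gerhart's actual method (this post does not describe how the paper scored its results); it simply assumes you already have a ranked list of result snippets for the broad topic and a few terms that signal the controversy. The function names, the cutoff `k`, and the sample snippets are all made up for illustration.

```python
def first_rank_mentioning(results, terms):
    """Return the 1-based rank of the first result containing any term, or None.

    `results` is a ranked list of result texts (best first); matching is a
    plain case-insensitive substring check, which is a simplifying assumption.
    """
    lowered_terms = [t.lower() for t in terms]
    for rank, text in enumerate(results, start=1):
        body = text.lower()
        if any(term in body for term in lowered_terms):
            return rank
    return None


def revealed_in_top(results, terms, k=10):
    """True if the controversy shows up somewhere in the top k results."""
    rank = first_rank_mentioning(results, terms)
    return rank is not None and rank <= k


# Invented snippets for the broad topic "Albert Einstein" (illustration only).
sample_results = [
    "Albert Einstein biography and timeline",
    "Einstein's theory of relativity explained",
    "Mileva Maric and the debate over credit for Einstein's early work",
]
controversy_terms = ["Mileva Maric"]

print(first_rank_mentioning(sample_results, controversy_terms))
print(revealed_in_top(sample_results, controversy_terms, k=2))
```

With a cutoff of 2, the sample controversy counts as "not revealed" even though it sits at rank 3, which is the kind of near-miss the hypothesis is about.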
For the many examples she tested, Gerhart found evidence on both sides of the ledger, and the paper left me disappointed that she could not come to a more decisive conclusion. She did note that in fact most search engines were roughly equal in their performance in the experiments. And she has some interesting thoughts on how controversies are integrated (or not) into the web at large, and some suggestions as to how various actors on the web - site authors, researchers, search engines - might better organize themselves to portray a more relevant set of SERPs to any particular query.
All in all, I liked this paper, as it forced me to think about the politics and architecture of search engine results. She introduces the idea of "sunny" vs. "dark" search results, and concludes that "sunny" results - those that do not include controversies - tend to float toward the top. Her final conclusion:
"Web search engines do not conspire to suppress controversy, but their strategies do lead to organizationally dominated search results depriving searchers of a richer experience and, sometimes, of essential decision–making information. These experiments suggest that bias exists, in one form or another, on the Web and should, in turn, force thinking about content on the Web in a more controversial light."
The one thing Dr. Gerhart left out entirely is the effect of blogs. As most of us certainly know, when the blogosphere latches onto a controversy (or just a politically-driven meme), that aspect of a topic usually shoots to the top of the SERPs. As with most good papers, this one left me feeling like there is much work yet to be done.
- Posted by John Battelle at 10:42 AM
December 8, 2003
The Search Papers: Bray on Search
Tim Bray has a series called On Search over at his Ongoing blog, and I find it worthy of a read'n'muse. He starts with this backgrounder on himself and search issues as he sees them, and has a ton of entries on any number of subjects, too numerous to go into here. Highlights: he writes on interface issues (warning, not for the faint of geek), how best to search XML (answer: we don't know yet, recall he was a co-author of same), and on result rankings, with a quick refresher on why PageRank works, and good advice on paying attention to your own logs. Also worthy: his primer on how search works, and his discussion of the technical search terms precision and recall (with an interesting note on the absence of top companies in the research community - see my post on this here), and lastly (whew), his mini-rant on intelligent search, and why it's a long way off. An excerpt:
"If we want better search (and we do), we’d better not count on AI voodoo or linguistic juju or semantic mojo. We need to work with good sound statistical techniques, and be clever about generating and using metadata, and we need to get our APIs right. All of these things are hard, and there is good work being done in all of them."
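For anyone rusty on the two terms Bray discusses, precision is the share of returned results that are relevant, and recall is the share of all relevant documents that were returned. The sketch below uses the standard set-based definitions; the function name and the document IDs are hypothetical, and it is not taken from Bray's posts.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    An empty denominator yields 0.0 by convention here (an assumption).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document IDs: the engine returned four documents,
# and five documents in the collection are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.50 recall=0.40"
```

Two of the four returned documents are relevant (precision 0.5), but only two of the five relevant documents were found (recall 0.4), which shows why the two numbers have to be read together.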
- Posted by John Battelle at 12:59 PM