The killer search engine bug
December 1, 2008
You've heard of the killer app? The one that will make a fortune for everyone? I
have found what I think is the killer search engine bug. The one that is messing
everything up and will mess everything up worse for users, search engines,
website owners and optimization consultants. It is Web page authority, which is
also the basis of the ranking algorithms. It is a logical and inherent bug that
can't be cured. What makes search engines good is what makes them bad.
The bug happens because of an inevitable conjunction of circumstances and facts
of life. The first one is the passage of time, which is in itself innocuous. The
second one is the expansion of the Web, which must continue apace, and which is
supposed to be a blessing. The third is the set of assumptions that govern
search engine algorithms and how they choose which pages to show at the top of
their listings.
Search engines, at least at present, cannot do much analysis of content quality.
They can tell that you have the phrase Search Engine Optimization in 10% of your
text and in your title and so on, while your competition has it only five times.
They can look at where you placed the phrases and what else is there.
But they really don't know whether your page has smarter or better advice than
your competition's just by looking at the content on the page with the tools
that they have. They can't even tell for certain that your page is more
relevant. Your page may say "This page is not about search engine optimization
at all" 50 times in 50 different ways.
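The kind of on-page counting described above can be sketched in a few lines of
Python. This is only an illustration: the function name phrase_stats, the
density definition, and the two sample sentences are assumptions made for this
example, not any search engine's actual formula.

```python
import re

def phrase_stats(text, phrase):
    """Return (occurrences, density) of a phrase in text, ignoring case.

    Density is the share of the page's words that belong to an occurrence
    of the phrase. This definition is an assumption for illustration only.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    n = len(target)
    if not words or n == 0:
        return 0, 0.0
    # Slide a window of n words across the page and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits, hits * n / len(words)

# Hypothetical sample pages.
relevant = "Search engine optimization tips: how search engine optimization works."
denial = "This page is not about search engine optimization at all."

relevant_hits, relevant_density = phrase_stats(relevant, "search engine optimization")
denial_hits, denial_density = phrase_stats(denial, "search engine optimization")
```

The page that denies being about the topic still registers a hit and a
substantial density, because counting words says nothing about what the
sentence means.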
The only way a search engine can tell whether a page is good without having
actual people look at it is by counting the links to that page, judging the
authority of the pages issuing the links, and looking at parameters like the
size and age of your Web site. The rationale behind counting links is that the
number of inbound links reflects the sum total of how people view that page.
Therefore, it is a substitute for having a crowd of people review and rate the
Web page. Web links tend to accumulate over time, so authority algorithms like
Google PageRank favor older sites. In addition, Google apparently gives extra
credit to older pages and older sites.
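To illustrate how link counting turns into authority, here is a minimal sketch
of a simplified PageRank computed by power iteration. It is a textbook
simplification, not Google's production algorithm, and the page names, the
damping value of 0.85, and the tiny link graph are all assumptions made for the
example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical Web: the old article has had time to collect three links,
# the new article has collected only one.
web = {
    "old_article": [],
    "new_article": [],
    "blog_a": ["old_article"],
    "blog_b": ["old_article"],
    "blog_c": ["old_article"],
    "blog_d": ["new_article"],
}
ranks = pagerank(web)
```

In this toy graph old_article ends up with a higher score than new_article
purely because it has more inbound links. Nothing in the computation looks at
which article is better.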
You can see the result if you search for an item that may change over time or
may be mentioned many times in the news. The freshest information will not
usually be at the top of the results that are returned. The information at the
top will be the information that has accumulated the most links. Google News
search fixes part of that problem, but not all of it. Some items are not
newsworthy, but that doesn't mean that there can't be newer and better
information about the same subject.
For example, suppose I wrote the absolutely best article about [The Ottoman
Empire], but I put it at a fairly small Web site. Google claims it has two
million entries for that keyword phrase. How long will it take that page to
climb up over all the others, and beat the 2002 BBC article that is among the
first 10 pages shown for [The Ottoman Empire], not to mention the Wikipedia page
for The Ottoman Empire, which is inevitably number 1?
Don't hold your breath. Even if I am really lucky, the page is really good, and
I have a fairly large Web site, it might take six months or more to get into the
first 10 listings returned for [The Ottoman Empire]. By that time, someone else
will probably have made a better Web page, but it too will be buried somewhere,
slowly climbing up the list.
That is not the worst of it. Suppose that instead of only 2 million pages and an
old subject, there are 200 million pages for a keyword and it's hot: new
pages are being created every day or every minute. If they link to anyone, they
will link to the top-ranked pages that are related to them.
They aren't being dishonest. The system is working. Link authority is
supposed to work like a science citation index and it does. There are always a
few articles everyone links to, because they are classics in the field. But that
means that the top-ranked authoritative sites will get more and more links and
get farther and farther ahead of the newcomer pages.
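The rich-get-richer effect described above can be sketched with a small
simulation. The model is an assumption made for illustration: each new page
links to exactly one existing page, chosen with probability proportional to
that page's inbound links plus one. Real linking behavior is messier, and the
function name simulate_linking and the page count are hypothetical.

```python
import random

def simulate_linking(num_pages, seed=0):
    """Simulate pages arriving one at a time, each linking to one older page.

    Returns a list where inbound[i] is the number of links received by the
    i-th page created (index 0 is the oldest page).
    """
    rng = random.Random(seed)
    inbound = [0]   # the oldest page starts with no links
    tickets = [0]   # one ticket per page, plus one per inbound link it has
    for new_page in range(1, num_pages):
        # Picking a random ticket favors pages that already have many links.
        target = rng.choice(tickets)
        inbound[target] += 1
        tickets.append(target)
        # The newcomer enters with a single ticket and no inbound links.
        inbound.append(0)
        tickets.append(new_page)
    return inbound

inbound = simulate_linking(2000)
tenth = len(inbound) // 10
total_links = sum(inbound)
oldest_share = sum(inbound[:tenth]) / total_links    # oldest 10% of pages
newest_share = sum(inbound[-tenth:]) / total_links   # newest 10% of pages
```

Comparing oldest_share with newest_share shows how lopsided the result is under
this assumption: the earliest pages collect a large part of all links, while
the latest arrivals collect almost none, whatever their quality.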
And of course, since the information is hot, it's important to get the newest
pages to the top, because they make the existing information obsolete. Yet the
top-ranked information is almost never the most recent.
And it gets worse. As the number of Web pages and Web sites grows, the frequency
of search engine spidering of a particular page or site, or for a particular
topic, and the frequency with which results are re-ranked have to drop
unless more and more resources are invested in spidering. More resources are
invested over time to keep up with the growth of the Web, but some topics are
hotter than others. It's easy to get to the top of the heap when there are only
300 pages for a keyword. It is almost impossible for a new page to get to the
top when there are already 5 million pages and the number is growing every day.
A site that had an article about Osama Bin Laden on the day before 9-11-01 had
a huge advantage over the new pages that accumulated rapidly after 9-11. The
first sites that discussed iPods have a huge advantage over those that came
later.
Search engines have workarounds for this problem. In different areas, they can
play with the relative weights of recency and age in page placement, so that
medical and engineering breakthroughs get to the top of the pile. They could
(but don't) give you an option to sort Web pages by date, as Google does with
news. They add gizmos like Onebox to inject the latest news into Web page
results. Google, at least, also seems to periodically refine its criteria and
prune less relevant sites from its database.
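One way to picture the recency workaround is a score that blends link authority
with a freshness term that decays as a page ages. The sketch below is purely an
assumption for illustration: the function names, the 30-day half-life, the
weights, and the two sample pages are hypothetical, and nothing here is known
to match what any search engine actually does.

```python
def blended_score(authority, age_days, recency_weight, half_life_days=30.0):
    """Blend an authority score (0..1) with a freshness score (0..1).

    Freshness halves every half_life_days. recency_weight (0..1) sets how
    much freshness counts relative to authority.
    """
    freshness = 0.5 ** (age_days / half_life_days)
    return (1.0 - recency_weight) * authority + recency_weight * freshness

def rank(pages, recency_weight):
    """Return page names ordered from best to worst blended score."""
    return sorted(
        pages,
        key=lambda name: blended_score(
            pages[name]["authority"], pages[name]["age_days"], recency_weight
        ),
        reverse=True,
    )

# Hypothetical pages: a heavily linked old article and a fresh report.
pages = {
    "classic_2002_article": {"authority": 0.9, "age_days": 2400},
    "new_breakthrough_report": {"authority": 0.2, "age_days": 3},
}

general_topic = rank(pages, recency_weight=0.1)   # authority dominates
fast_moving_topic = rank(pages, recency_weight=0.7)  # freshness dominates
```

With a low recency weight the old, heavily linked article stays on top. With a
high weight, as might be chosen for a fast-moving field, the fresh report
overtakes it. The hard part in practice is deciding which topics deserve which
weight.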
But there is yet another problem. Bad commentary and poorly reported news tend
to drive out the good, especially on the Web. Just as most people are more avid
readers of supermarket tabloids than they are of the New York Times, they are
more apt to link to pages with "simple, direct messages," as marketers tell us,
and to pages with crazy conspiracy theories or wishful thinking, than they are
to link to good content. Another article that says, "Yeah, Google PageRank is
important" is not going to get as much attention as one that heralds "the death
of PageRank" (see Search Ranking: Reports of Death Greatly Exaggerated).
Older pages get an advantage over newer ones, but among newer pages, those with
the most novel "news" have the advantage: "Osama is dead" will get more
attention and links and authority than "Osama is still alive." The exploitation
tabloids couldn't stay in business with headlines like "No two-headed monsters
born this week."
Ami Isseroff
Notice: Copyright
All materials are copyright 2008 by Ami Isseroff. All rights reserved. These pages may not be reproduced in any
form in electronic or printed media without express written permission from the author.