Google PageRank™ -
PageRank is a numerical measure of the authority
of a Web page. It is used to determine the
importance of links from that page or site. It is a logarithmic scale, ordinarily expressed
as an integer from 0 to 10, but the internal representation maintained by Google is a real number.
PageRank is a registered trademark of Google. It is named after Larry Page, who invented it at Stanford University along with
Sergey Brin. The concept is patented (US Patent
6,285,999) and the patent is owned by Stanford University. Recently, patent experts
offered the opinion that this patent, and other software patents, may be on shaky legal ground.
PageRank was originally described by Larry Page, Sergey Brin and associates in a 1998 article:
The PageRank Citation Ranking: Bringing Order to the Web
and was implemented in a practical prototype of the Google search engine, as described in
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
The theoretical version of PageRank as
patented assigns values to pages based only on the number of links to them, the PageRank of the originating pages
and the number of links from each originating page. The authority conferred by a link from a page A to a page B is a
function of the PageRank of page A, divided by the number of links on page A. This value is supposedly related to the
probability that a random, "ideal" Web surfer will eventually get to any given page in a network. As the number of pages on the
Web increases, the PageRank of individual pages that have not gotten more links decreases. The theoretical model has
been modified by a damping factor, which takes into account that a random surfer will eventually get tired of clicking
links and may therefore never get to some pages.
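The model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's implementation: the function name pagerank, the toy link graph, the damping value of 0.85 (a commonly cited figure), the fixed number of iterations, and the decision to share the rank of pages with no outgoing links equally among all pages are all choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return a dict mapping each page to its PageRank (values sum to 1).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links. Assumption: it is
        # shared equally among all pages so that no rank is lost.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page starts with the "tired surfer" share plus its
        # part of the dangling rank.
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        for source in pages:
            targets = links.get(source, [])
            if not targets:
                continue
            # Each link passes on the source's rank divided by the
            # number of links on the source page.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site in which every page links to "home".
toy_web = {
    "home": ["about", "article"],
    "about": ["home"],
    "article": ["home"],
    "orphan": ["home"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this sketch a page that nobody links to, such as "orphan", still receives the (1 - damping) / n share. That share shrinks as the number of pages n grows, which is one way to see why the PageRank of a page that gains no new links falls as the Web expands.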
As Google implements PageRank in practice, the value for a page may be influenced by the number of links
to that page, the age of the page (older pages are better), the size of the Web site, and perhaps other factors such as the
number of bad links. During the first months of
existence of a Web page, the page or site may be in the Google Sandbox.
This may mean, according to different accounts, either that the page gets no PageRank (the Google toolbar shows a gray "no
information available" bar) or that new pages may get lower search engine positioning than they should get based on
the number of links to them and their PageRank.
In theory, according to the algorithm, any page in a Web site should
be able to get a higher PageRank than the main page if that page gets more links from more authoritative sources.
Therefore, for example, if an article in an obscure blog reveals the secret of synthesizing petroleum at the cost of $5
a barrel, many large sites and pages would link to it, and the article could get a high PageRank, though the main page
of the Web log does not. In practice this is rare. The main page of the site, which often gets links from all other
pages in the site, generally has the highest rank, especially as it is usually the target page in link exchanges. A few such cases do exist.
This is counterintuitive, since the high ranking internal pages all link back to the main page and should be conferring
their PageRank on it.
Google PageRank Shortcomings
PageRank is an engineering approximation, not a perfect solution. PageRank logic has a number of built-in
shortcomings, not all of which can be compensated for by additional filtering and weighting:
Completeness: The ranking could only be complete after the entire Web had been crawled by a
search spider to get a complete picture of the linking relations. Since the Web was, and is, expanding, new links,
called "backlinks," are created all the time. Thus, the ranking could never be complete at any time, and would have
to be continually updated.
Relevance: The number of links to a page does not necessarily
reflect its relevance to a particular topic or its actual "quality."
Changing Web: Circa 1996, when planning of the PageRank algorithm had begun,
a very large proportion of the Web audience were also creators of Web pages that link to other Web pages, and therefore
a large proportion of the audience was also engaged in "voting" for Web pages. The Web was originally top heavy with
computer-science related materials that were created by experts, who were also fairly good judges of relevant materials
and linked to them, generally without any commercial, political or other bias. All this has changed rapidly over time.
In fact, by 1998 it had changed. As the discussion in the PageRank paper and in the companion paper about the Google search
engine prototype (The Anatomy of a Large-Scale Hypertextual Web Search Engine)
shows, "junk" Web sites, pornographic and political sites were already well entrenched. Web pages were not
being created only by human "authorities." New technology soon made it possible for virtually anyone to create a Web page and
to link to whatever caught their fancy.
Age before beauty: A new page obviously has had less time to accumulate
links than an old one, and a new Web site is going to be smaller than an old one, though the new site might be the
writings of a professor, and the old one might have been created by a precocious teenager. Old and big is not
necessarily better than new and little.
Nature of the page: An authoritative page may link to no other pages. It
would be a "link sink" as the authors describe, and yet it might have very high authority. For example, a reprint
of the article announcing the discovery of DNA structure by Watson and Crick would be the authoritative article on that
subject, just as the translation of the original article by Albert Einstein describing the special theory of relativity
would be an authority on its subject, but they may have no links to any other pages except possibly the home page of the
Web site. A page at an educational Web site about Watson and Crick or Einstein might have numerous mutual links, but the
material might be suitable only for children.
Distribution of authority: The algorithm assumes that authority is a fixed
quantity to be distributed according to the number of links originating from a page. In the very long run, this might be
true. A useful page that has many outgoing links might also accumulate many incoming links, so the loss of authority for
each outgoing link would be compensated by the gain in authority from the additional incoming links. Or not. Consider two
dictionaries of computer science. One is a 10-entry dictionary in a kiddy encyclopedia of computing that has
monosyllabic explanations and big pictures of Computer, Disc, Monitor, Mouse. The other is a 50,000-entry dictionary
maintained by an obscure university in India. The kiddy dictionary is used in educational programs throughout the United
States and every school links to it as a wonderful educational resource for preschoolers and first graders. The
university dictionary is excellent, but it is only linked to by three or four other universities and experts interested
in the more abstruse aspects of memory technology, Document Object Model and similar topics. Judged by links alone, the
kiddy dictionary would receive far more authority than the university dictionary.
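The "fixed quantity" assumption can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name authority_per_link, the internal PageRank value of 0.01 and the damping value of 0.85 are assumptions chosen for the example, not figures published by Google.

```python
def authority_per_link(page_rank, outgoing_links, damping=0.85):
    """Authority passed along each link under the basic model.

    Assumption: the page's rank, scaled by the damping factor, is
    split evenly among its outgoing links.
    """
    if outgoing_links <= 0:
        return 0.0
    return damping * page_rank / outgoing_links


# Hypothetical page with an internal PageRank of 0.01.
for link_count in (10, 100, 1000):
    per_link = authority_per_link(0.01, link_count)
    total = per_link * link_count
    print(f"{link_count} links: {per_link:.7f} per link, {total:.5f} in total")
```

Under these assumptions the total handed out stays the same however many links the page carries, while the amount each linked page receives falls in proportion to the number of links.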
Ideal versus real surfer: PageRank tries to approximate the
behavior of a surfer who clicks links at random. The relation of PageRank to the actual behavior of Web surfers has not
really been tested empirically. The "ideal" random surfer probably does not exist.
Authority versus random surfing: PageRank is at one and the same time supposed to
approximate the behavior of a random surfer and also approximate a citation index for academic articles. This idea is
inherently contradictory in one sense. An expert searching a paper citation index would not follow the "links" to
articles in a random manner. A citation in Science or an IEEE journal is going to attract the attention of an expert
much more frequently than a citation in "Biology for Dummies" or "Komputers for Kids." The "Biology for
Dummies" journal, as a Web page, is far more likely to get links on the Web than an obscure scientific journal article.
That scientific article might really be the one that most Web surfers are looking for. However, the PageRank "popularity contest"
creates the danger that as more people get their information from simplistic "popular" publications, the bad information
will drive out the good, and there will be more and more surfers who are only capable of understanding "Biology for
Dummies." Moreover, a scientific article that lists many references usually has more authority because it lists
more references. This is not taken into account directly in the PageRank algorithm, which simply divides authority,
measured by PageRank, among the references.
Be all that as it may, Page and Brin were able to demonstrate that the algorithm, as they had tweaked
it, produced better results than any other search algorithm, and this judgment seems to have remained unchanged since
Google became a commercial product.
True Google PageRank versus Displayed PageRank
According to at least one source, Google maintains an internal PageRank on a
scale of 0 to 1, but reports PageRank as an integer number from 0 to 10, and supposedly, the reported number, the
"toolbar PageRank," is not an
updated reflection of the internal values used by Google.
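Google has not published how the internal value is converted to the displayed one, so the following Python sketch is purely hypothetical. It assumes a logarithmic mapping in which each displayed step covers a tenfold change in the internal score; the function name displayed_pagerank, the base of 10 and the floor of 1e-10 are all assumptions made for illustration.

```python
import math


def displayed_pagerank(internal, log_base=10.0, floor=1e-10):
    """Map an internal score between 0 and 1 to an integer from 0 to 10.

    Assumption: each displayed step covers a factor of log_base in the
    internal score, and anything at or below floor displays as 0.
    """
    if internal <= floor:
        return 0
    steps = math.log(internal / floor, log_base)
    # Round away tiny floating-point error before taking the floor.
    steps = math.floor(round(steps, 9))
    return max(0, min(10, steps))


# Hypothetical internal scores.
for score in (1.0, 0.5, 3e-5, 0.0):
    print(score, "->", displayed_pagerank(score))
```

On such a scale, two pages with the same displayed number could differ nearly tenfold in their internal scores, which is one reason the toolbar value would be only a rough guide.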
Does PageRank Matter?
A number of sources claim that following recent updates, PageRank doesn't matter as much for positioning ("rank") of
your site for a given keyword. The rationale for this is supposedly that Google is trying to focus on "on page"
optimization factors (keyword density and the like) because they are less prone to "Google Bombing" by proliferation of
links.
This is a dubious assertion, because positioning for a keyword
by relevance to that word or phrase is theoretically a separate factor from the influence of PageRank on the position of
a page in search engine results. Consider, for example, two identical pages that are optimized for the keyword "widget."
Page A is at a site that has a PageRank of 8, while Page B is at a Web site that has a PageRank of 3. Each page gets
1000 backlinks (links from other pages) from pages with a PageRank of 1. Which page is going to be number 1? Chances
are that the page with the higher PageRank will be number 1. Moreover, if Page A and B are at Web sites with precisely the same
PageRank, suppose that Page A gets 1000 backlinks with the keyword "widget" from pages with PageRank 1, but
Page B gets 1000 backlinks with the keyword "widget" from sites with PageRank 8. All other things being equal, Page B
will have the higher PageRank and Page B will also be retrieved in the higher position for the keyword "widget." See
also: Superstition:
Google Pagerank is no longer important (like "Calories
don't count," right?).
Ami Isseroff
Note - Definitions of Search Engine
Optimization terms are based on inferences from common usage and definitions given by other sources. Conclusions about
search engine behavior are based on understanding of the behavior of the most popular search engines. Both are subject
to error or may change. Search engine company management may define or use a term or set or change any policy in any way
they see fit, and may make these definitions and specifications public or not. These decisions and definitions are
beyond our control.
Notice: Copyright
All materials are copyright 2008, 2009 by Ami Isseroff. All rights reserved. These pages may not be reproduced in any
form in electronic or printed media without express written permission from the author.