Search Engine Optimization

Google PageRank



Google PageRank™ - PageRank is a numerical measure of the authority of a Web page. It is used to determine the importance of the page itself and of the links from that page or site. It is a logarithmic scale, ordinarily expressed as an integer from 0 to 10, but the internal representation maintained by Google is a real number.

PageRank is a registered trademark of Google. It is named after Larry Page, who invented it at Stanford University along with Sergey Brin. The concept is patented (US Patent 6,285,999) and the patent is owned by Stanford University. Recently, patent experts offered the opinion that this patent, and other software patents, may be on shaky legal ground (see here).

PageRank was originally described by Larry Page, Sergey Brin and associates in a 1998 article, "The PageRank Citation Ranking: Bringing Order to the Web," and was implemented in a practical prototype of the Google search engine, as described in "The Anatomy of a Large-Scale Hypertextual Web Search Engine."

The theoretical version of PageRank as patented assigns values to pages based only on the number of links to them, the PageRank of the originating pages and the number of links from each originating page. The PageRank authority conferred by a link from a page A to a page B is the PageRank of page A divided by the number of links on page A. This value is supposedly related to the probability that a random, "ideal" Web surfer will eventually get to any given page in a network. As the number of pages on the Web increases, the PageRank of individual pages that have not gotten more links decreases. The theoretical model has been modified by a damping factor, which takes into account that a random surfer will eventually get tired of clicking links and may therefore never get to some pages.
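The calculation described above can be sketched in a few lines of Python. This is only an illustration of one common textbook formulation (scores normalized to sum to 1), not Google's actual implementation. The tiny four-page "web," the function name pagerank, the damping value of 0.85 and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return {page: score} for a graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Score held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each link passes on the page's score divided by its link count.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C receives links from A, B and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)

# The same web after two unrelated pages (E and F) are added.
bigger_web = dict(web, E=["F"], F=["E"])
bigger_ranks = pagerank(bigger_web)

for page in sorted(ranks):
    print(page, round(ranks[page], 4), round(bigger_ranks[page], 4))
```

In this sketch, page C, which has the most incoming links, ends up with the highest score, and every page in the original group scores lower once E and F are added, even though none of them lost a link.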

As Google implements PageRank in practice, the value may be influenced by the number of links to the page, the age of the page (older pages are better), the size of the Web site, and perhaps other factors such as the number of bad links. During the first months of existence of a Web page, the page or site may be in the Google Sandbox. According to different accounts, this may mean either that the page gets no PageRank (the Google toolbar shows a gray "no information available" bar) or that new pages get lower search engine positioning than they should, based on the number of links to them and their PageRank.

In theory, according to the algorithm, any page in a Web site should be able to get a higher PageRank than the main page if that page gets more links from more authoritative sources. For example, if an article in an obscure blog revealed the secret of synthesizing petroleum at a cost of $5 a barrel, many large sites and pages would link to it, and the article could get a high PageRank even though the main page of the Web log does not. In practice this is rare. The main page of the site, which often gets links from all other pages in the site, generally has the highest rank, especially as it is usually the target page in link exchanges. A few such cases do exist. They are counterintuitive, since the high-ranking internal pages all link back to the main page and should be conferring their PageRank on it.
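A rough way to see why the link back to the main page matters is to compare two made-up sites under the theoretical model. The sketch below is an assumption-laden toy (a simplified damped calculation, a damping value of 0.85, five hypothetical external pages named ext0 to ext4), not a description of how Google actually scores sites. In both cases only the article receives external links; the only difference is whether the article links back to the home page.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Simplified PageRank; assumes every page is a key with at least one link."""
    n = len(links)
    rank = dict.fromkeys(links, 1.0 / n)
    for _ in range(iterations):
        new_rank = dict.fromkeys(links, (1.0 - damping) / n)
        for page, targets in links.items():
            for target in targets:
                # A link passes on the page's score divided by its link count.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


externals = [f"ext{i}" for i in range(5)]  # hypothetical outside pages

# Case 1: the popular article links back to the home page.
links_back = {"home": ["article", "about"], "article": ["home"], "about": ["home"]}
links_back.update({ext: ["article"] for ext in externals})

# Case 2: the article links only to the outside pages, not to home.
no_link_back = {"home": ["article", "about"], "article": list(externals), "about": ["home"]}
no_link_back.update({ext: ["article"] for ext in externals})

for name, graph in (("links back", links_back), ("no link back", no_link_back)):
    ranks = pagerank(graph)
    print(name, "home:", round(ranks["home"], 3), "article:", round(ranks["article"], 3))
```

In this toy model the home page outranks the article whenever the article passes its score back to it, and the article comes out ahead only when it does not link back.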

Google PageRank Shortcomings

PageRank is an engineering approximation, not a perfect solution. PageRank logic has a number of built-in shortcomings, not all of which can be compensated for by additional filtering and weighting:

Completeness: The ranking could only be complete after the entire Web had been crawled by a search spider to get a complete picture of the linking relations. Since the Web was, and is, expanding, new links, called "backlinks," are created all the time. Thus, the ranking could never be complete at any time, and would have to be continually updated.

Relevance: The number of links to a page does not necessarily reflect its relevance to a particular topic or its actual "quality."

Changing Web: Circa 1996, when planning of the PageRank algorithm had begun, a very large proportion of the Web audience were also creators of Web pages that linked to other Web pages, and therefore a large proportion of the audience was also engaged in "voting" for Web pages. The Web was originally top heavy with computer-science related materials that were created by experts, who were also fairly good judges of relevant materials and linked to them, generally without any commercial, political or other bias. All this has changed rapidly over time. In fact, by 1998 it had changed. As the discussion in this paper and in the companion paper about the Google search engine prototype (The Anatomy of a Large-Scale Hypertextual Web Search Engine) shows, "junk" Web sites, pornographic and political sites were already well entrenched. Web pages were not being created only by human "authorities." New technology soon made it possible for virtually anyone to create a Web page and to link to whatever caught their fancy.

Age before beauty: A new page obviously has had less time to accumulate links than an old one, and a new Web site is going to be smaller than an old one, though the new site might be the writings of a professor, and the old one might have been created by a precocious teenager. Old and big is not necessarily better than new and little.

Nature of the page: An authoritative page may link to no other pages. It would be a "link sink," as the authors describe, and yet it might have very high authority. For example, a reprint of the article announcing the discovery of DNA structure by Watson and Crick would be the authoritative article on that subject, just as the translation of the original article by Albert Einstein describing the special theory of relativity would be an authority on its subject, but they may have no links to any other pages except possibly the home page of the Web site. A page at an educational Web site about Watson and Crick or Einstein might have numerous mutual links, but the material might be suitable only for children.

Distribution of authority: The algorithm assumes that authority is a fixed quantity to be distributed according to the number of links originating from a page. In the very long run, this might be true. A useful page that has many outgoing links might also accumulate many incoming links, so the loss of authority for each outgoing link would be compensated by the gain in authority by increasing incoming links. Or not. Consider two dictionaries of computer science. One is a 10 entry dictionary in a kiddy encyclopedia of computing that has monosyllabic explanations and big pictures of Computer, Disc, Monitor, Mouse. The other is a 50,000 entry dictionary maintained by an obscure university in India. The kiddy dictionary is used in educational programs throughout the United States and every school links to it as a wonderful educational resource for preschoolers and first graders. The university dictionary is excellent, but it is only linked to by three or four other universities and experts interested in the more abstruse aspects of memory technology, Document Object Model and similar topics.

Ideal versus real surfer: PageRank tries to approximate the behavior of a random surfer supposedly clicking links at random. The relation of PageRank to actual behavior of Web surfers has not really been tested empirically. The "ideal" random surfer probably does not exist.

Authority versus random surfing: PageRank is at one and the same time supposed to approximate the behavior of a random surfer and also approximate a citation index for academic articles. This idea is inherently contradictory in one sense. An expert searching a paper citation index would not follow the "links" to articles in a random manner. A citation in Science or an IEEE journal is going to attract the attention of an expert much more frequently than a citation in "Biology for Dummies" or "Komputers for Kids." The "Biology for Dummies" journal, as a Web page, is far more likely to get links on the Web than an obscure scientific journal article. That article might really be the one that most Web surfers are looking for. However, the PageRank "popularity contest" creates the danger that as more people get their information from simplistic "popular" publications, the bad information will drive out the good, and there will be more and more surfers who are only capable of understanding "Biology for Dummies." Moreover, a scientific article that lists many references usually has more authority because it lists more references. This is not taken into account directly in the PageRank algorithm, which simply divides authority, measured by PageRank, among the references.

Be all that as it may, Page and Brin were able to demonstrate that the algorithm, as they had tweaked it, produced better results than any other search algorithm, and this judgment seems to have remained unchanged since Google became a commercial product.

True Google PageRank versus Displayed PageRank 

According to at least one source, Google maintains an internal PageRank on a scale of 0 to 1, but reports PageRank as an integer from 0 to 10. Supposedly, the reported number, the "toolbar PageRank," is not an up-to-date reflection of the internal values used by Google.
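To illustrate what a log scale would mean here, the sketch below maps an internal value between 0 and 1 to an integer from 0 to 10. Google has not published the actual mapping, so everything in it is hypothetical: the function name toolbar_pagerank, the choice of 1.0 as the top of the scale, and especially the base of the logarithm, which is set to 10 purely for the example.

```python
import math


def toolbar_pagerank(internal, base=10.0):
    """Map an internal score in (0, 1] to an integer 0..10 (hypothetical mapping)."""
    if internal <= 0:
        return 0
    # Each step down the displayed scale is assumed to mean a score
    # `base` times smaller; the real base, if any, is not public.
    steps_below_top = math.floor(math.log(internal) / math.log(base))
    return max(0, min(10, 10 + steps_below_top))


for score in (1.0, 0.5, 0.05, 0.000001):
    print(score, "->", toolbar_pagerank(score))
```

Under these assumptions, a wide range of internal values collapses into the same displayed number, which is one reason the displayed figure would say little about small differences between pages.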

Does PageRank Matter?

A number of sources claim that following recent updates, PageRank doesn't matter as much for positioning ("rank") of your site for a given keyword. The rationale for this is supposedly that Google is trying to focus on "on page" optimization factors (keyword density and the like) because they are less prone to "Google Bombing" by proliferation of links.

This is a dubious assertion, because positioning for a keyword by relevance to that word or phrase is theoretically a separate factor from the influence of PageRank on the position of a page in search engine results. Consider, for example, two identical pages that are optimized for the keyword widget. Page A is at a site that has a PageRank of 8, while Page B is at a Web site that has a PageRank of 3. Each page gets 1000 backlinks (links from other pages) from pages with a PageRank of 1. Which page is going to be number 1? Chances are that the page with the higher PageRank will be number 1. Moreover, if Page A and Page B are at Web sites with precisely the same PageRank, suppose that Page A gets 1000 backlinks with the keyword widget from pages with PageRank 1, but Page B gets 1000 backlinks with the keyword widget from sites with PageRank 8. All other things being equal, Page B will have the higher PageRank and Page B will also be retrieved in the higher position for the keyword widget. See also: Superstition: Google PageRank is no longer important (like "Calories don't count," right?).

Ami Isseroff


Note - Definitions of Search Engine Optimization terms are based on inferences from common usage and definitions given by other sources. Conclusions about search engine behavior are based on understanding of the behavior of the most popular search engines. Both are subject to error or may change. Search engine company management may define or use a term or set or change any policy in any way they see fit, and may make these definitions and specifications public or not. These decisions and definitions are beyond our control.  

Notice: Copyright

All materials are copyright 2008, 2009 by Ami Isseroff. All rights reserved. These pages may not be reproduced in any form in electronic or printed media without express written permission from the author.
