Google Quality Rater Secrets
Ami Isseroff
Sept 27, 2008
Not long ago Brian Ussery discovered a confidential document describing
how Google quality raters are supposed to rate search engine results. The document itself has since been removed from
the Web. As there was probably a really good reason for that, I am not going to put it back on the Web, but here is what
is apparently a full version of this document in PDF format:
http://www.mauriziopetrone.com/blog/wp-content/uploads/quality-rater-guidelines-2007.pdf
From comments on this document it seems that many people may be misled
into thinking that these criteria are actually used by the Google search engine to rank all Web sites. In particular, bloggers seem to have believed that they could figure out from the document precisely
how Google recognizes SPAM Web sites.
There are two problems with the idea. Firstly, the document consists of
instructions to humans as to how to rate the Google results. It is not an algorithm or set of algorithms used by the search engine. Google is evidently
giving "grades" to its search engine results for the most part, not to your Web sites. There is no
statement in the document as to how the rating results will be used.
From its inception as described in
The Anatomy of a Large-Scale Hypertextual Web Search Engine
and The PageRank Citation Ranking: Bringing Order to the Web,
the Google search engine has been a "work in progress" that had the goal of improving the quality of search engine
results for users. The instructions and the raters are a way to check the quality of the search engine results, but that
doesn't mean that the criteria are necessarily incorporated in this or any other version of the search engine.
The second reason that this document won't tell you how to evade SPAM filters is that
spam filters are automatic, whereas spam ratings are assigned by human raters using a set of criteria. It is likely that if a
rater happens on a page that is spam or doesn't load or is flagged as pornography, then that particular page will be
downgraded or penalized. However, it is not likely that raters will get to every single page on every Web site, or even
a large sample of them, or that Google applies the manual ratings of quality, which are separate from SPAM ratings and
flags, to the actual results.
There are separate categories of ratings for quality of result, which relate to what
result was produced for a specific search, for SPAM, for pages flagged for pornography or malware and for pages that
cannot be loaded. All of the latter relate to the page itself, rather than the quality of result.
Remember that the search quality ratings are for the quality of the result relative to
a specific query. A page about George Bush may be a perfectly good page, but if it is returned as the result of a query
for "art lovers" or "restaurants" then the result is inappropriate, though the page in itself is perfectly good.
However, in the best of all possible worlds, a page that is
webspam or spam or malicious would probably get a rating of Not Relevant. Nobody is searching to get a Trojan
horse installed in their computer and nobody is searching for spam. Of course, if you are searching for [Brittney Spears
Nude] then all the results returned should be pornographic. That is a good search result for that query.
It is nonetheless interesting to see how Google judges its own results, because it
tells us what they are trying to do, or what they think they are trying to do.
The raters look at actual user queries from all over the world and at how the search
engine responded to those queries. A very interesting aspect of Google's approach is the attempt to understand exactly
what the surfer really wanted to see, and to compare it to the results that the search generated. A problem that runs
through Google's handbook is that criteria are often somewhat circular. Raters are urged to research a topic by
searching for results about the topic on the Web, without consulting outside (non-Web) sources necessarily. A page may
be judged to be a good page if it links to "authoritative" pages, but authoritative pages are judged to be authoritative
based on search engine results and not based on any knowledge of the subject. The search engine results, in turn, are
based on how many pages link to a particular page, making it authoritative (See
Web Site Authority and
Google PageRank). This is a mechanism for
reinforcing and popularizing "conventional wisdom." There is no attempt to objectively measure the quality or
reliability of information on a page beyond volume. If there are more words and they are words relevant to the
query, then the page is good. Thus, a page that has 40,000 words about gravity based on Aristotelian theory, with links
to the works of Aristotle at highly ranked pages is "better" than a page that explains Newtonian and Einsteinian
gravitational theories in 500 words with a few equations and no references.
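The circularity is easier to see in a toy model. What follows is a minimal sketch of a PageRank-style calculation in the spirit of the two papers cited above. It is not Google's production algorithm, and the page names, the link graph and the damping factor of 0.85 are all invented for illustration. Notice that nothing in the calculation ever looks at what a page actually says.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: three blogs link to the Aristotelian page,
# and nobody links to the Newtonian page.
links = {
    "blog_a": ["aristotle_gravity"],
    "blog_b": ["aristotle_gravity"],
    "blog_c": ["aristotle_gravity"],
    "aristotle_gravity": [],
    "newton_gravity": [],
}
ranks = pagerank(links)
```

In this made-up graph the Aristotelian page ends up with the higher score simply because more pages point to it, which is the "conventional wisdom" effect described above.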
Ranking Criteria - The raters are presented with material from a database
that shows the query, the location from which the query originated and one URL that was retrieved by the search engine.
We are not told in the document if it was the top URL retrieved or if Google evaluates more than one URL for a given
query to see if its ranking system is valid.
The following are some of the basic criteria in the document (based in part on
The Google Quality Raters Handbook):
Types of Queries:
Google classifies queries as:
Informational - Searching for information, such as "Magna Carta"
Transactional - User wants to buy something (e.g., at eBay or Amazon).
Navigational - User is looking for a specific Web site URL.
It can't possibly be clear, in every case, whether a query is "transactional" or "informational." A
search for "War and Peace" could be looking for a summary of the novel or looking to buy the novel or to read it online.
But Google raters are supposed to understand what the surfer wanted.
Broad vs Specific - Google raters also classify searches as broad or specific, though no criteria
are given for these categories.
The handbook gives these examples:
digital camera - Looking to purchase a digital camera.
Canon SD 550 - Looking to purchase this specific camera
Of course, the surfer may not be looking to purchase anything, but rather searching for information
and specifications, sales figures or other data. But Google's assumptions are evidently based on the intentions of the
majority of users, or what they think characterizes the majority of users.
Search Quality Intangibles
In general, results should match the expectations of the surfer and the type of
query, as interpreted by the raters. A broad search should return a broad result and
a narrow search should return a narrow result.
Timeliness - Search results are time dependent. A query about
George Bush in 1991 should have returned information about George H.W. Bush (the father) while a query in 2008 should
have returned more information about the son.
Location - Results should be relevant to the location of the user. If a user
searches for "football" in the USA, they are looking for information about the game played with the oblong ball
and the quarterbacks etc. The same search in the United Kingdom should generate information about the game Americans
call "soccer."
Amount of Information Available - If a lot of information is available about
a topic, Google tells its raters, then a page that has just one link and little text should not rank highly. That tells
us that in principle at least, larger pages are intended to get higher rankings, and there is no such thing as "optimum
page size." (see
Optimum Page Size Superstition)
Google does not try to rate pages as to correctness or reliability of information. That is peculiar,
because it means that it never checks whether pages or sites that are supposed to have the highest
Authority according to its
Google PageRank algorithm are indeed providing correct
and authoritative information. The highest ranking page about the Canon camera could be a totally incorrect advertising
blob or a hatchet job done by competition. The top page retrieved for George Bush might be an encomium written by
election propagandists or a hatchet job done by the opposition. The top positioned page for keyword Jew might be
(and often is) a racist screed composed of paranoid anti-Semitic inventions.
Google Quality Rating Scale
Google uses a 5 level quality rating scale for ratable search results:
Vital: (1) A score that is reserved only for navigational queries where there is a clear dominant Web page.
It is misleading to say that "vital is the highest score," because it doesn't apply to queries that are not
navigational. A Vital rating is given if the user searched for the name of a firm or a person, and the query returned
the Web site of that firm or person. For example, the page returned is the official Web page of a firm or entity
that was the subject of the query. When searching for 'ibm', the vital result would be
www.ibm.com. But Google makes unwarranted assumptions about what the user wants.
Suppose the user is looking to buy storm windows, and types [windows] in Google? They will get the home page of
Microsoft Windows as their top result. For [apple] they will not get a page about fruit, but rather, the home page of
Apple computers. Google queries are not case sensitive.
Useful: (2) A useful rating should be assigned to results that "answer the query just right; they are neither
too broad nor too specific." For example: A search for meningitis that returns:
http://www.webmd.com/hw/infection/aa34586.asp
For informational queries, this is the highest possible rating.
Relevant: (3) The results are often "less comprehensive, come from a less authoritative source, or cover only
one important aspect of the query." For example, a review of laptop computers that only discusses five computers and not
all computers within its class. Note that "less authoritative source" can be judged in many ways. If the criterion is
Web page attributes such as number of links, Google is using its own algorithm to validate its algorithm.
Relevant pages, according to the handbook, include a page with a brief article on the topic of the query or a less important subpage
on the correct site. If a query “asks” for a list, according to the guidelines, then a single item is Relevant.
For example, if the query is [ fudge recipes ], a single fudge recipe is Relevant. So says Google, but that may be
a matter of opinion, depending on what the user intended and the quality of the page.
A rating of Relevant is also used for a homepage that would have been Vital if there had not been a
more dominant interpretation for the query.
Not Relevant: (4) Pages that are not helpful to the query but are still somewhat connected to the original
query. Classifications of a not relevant page would be "outdated, too narrowly regional, too specific, too broad" etc.
One of the examples given is a search for the 'BBC' that returns a specific article from BBC; it is too specific and is
not relevant to the query at hand.
A rating of Not Relevant is also assigned to a page if it has a link to good results on the same site or
another site, but is
not a good result itself. It may be an unimportant or useless sub-page on the correct site or it may be only a link page.
Off-Topic: (5) This is the lowest rating a page can receive for a query. If the returned page is completely not
relevant to the query, it would be given a rating of "off topic." An example given is a query on 'hot dogs' that returns
a page about doghouses.
According to the handbook, a rating of Off-Topic also applies when the result ignores an important modifier or element of the query. For example,
for the query [ universities in India ], an article about universities in Europe is Off-Topic. But this is
a frequent fault of results returned by Google for complex queries.
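The scale and its one special rule can be written down in a few lines. This is only a sketch of the scale as summarized here, not anything taken from Google's systems; the function names are made up, and treating transactional queries like informational ones is an assumption that follows from the statement that Vital is reserved for navigational queries.

```python
# Ordered from best to worst, as listed above.
RATING_SCALE = ["Vital", "Useful", "Relevant", "Not Relevant", "Off-Topic"]

QUERY_TYPES = {"informational", "transactional", "navigational"}


def highest_possible_rating(query_type):
    """Return the best rating a result can receive for this type of query."""
    if query_type not in QUERY_TYPES:
        raise ValueError("unknown query type: " + query_type)
    # Vital is reserved for navigational queries; for the rest, Useful is the top.
    return "Vital" if query_type == "navigational" else "Useful"


def is_better(rating_a, rating_b):
    """True if rating_a is higher on the scale than rating_b."""
    return RATING_SCALE.index(rating_a) < RATING_SCALE.index(rating_b)
```

For instance, highest_possible_rating("informational") gives "Useful", which is why it is misleading to call Vital "the highest score" without qualification.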
Worse than bad ratings
Pages that cannot be rated or are spam or undesirable content do not fall in the above scale at all. They have
separate scales.
Results That Can't Be Rated:
Didn’t Load: For pages that return a 404 error, page not found, product not found, server time out, 403
forbidden, login required, and so on.
Foreign Language: This is given to a page that is in a "foreign language" to the "target language" of the
query. English is never a foreign language.
Unratable: When the rater cannot rate it for any other reason.
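A rough sketch of how the first two of these labels might be decided mechanically is shown below. The choice of HTTP status codes is an assumption (the handbook describes the situations in words, not codes), the function name is hypothetical, and "product not found" pages cannot be caught this way at all, since they frequently come back with an ordinary 200 status.

```python
from http import HTTPStatus

# Assumed mapping from the situations the handbook lists to HTTP status codes.
DIDNT_LOAD_STATUSES = {
    HTTPStatus.NOT_FOUND,        # 404, page not found
    HTTPStatus.FORBIDDEN,        # 403, forbidden
    HTTPStatus.UNAUTHORIZED,     # 401, login required
    HTTPStatus.REQUEST_TIMEOUT,  # 408, time out
    HTTPStatus.GATEWAY_TIMEOUT,  # 504, server time out
}


def cannot_be_rated_label(status_code, page_language=None, target_language=None):
    """Return "Didn't Load", "Foreign Language", or None if the page can be rated."""
    if status_code in DIDNT_LOAD_STATUSES:
        return "Didn't Load"
    if page_language and target_language and page_language != target_language:
        # English is never a foreign language.
        if page_language != "en":
            return "Foreign Language"
    return None
```

Under these assumptions, a French page returned for a German-language query is labeled "Foreign Language," while an English page returned for the same query is not.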
Flagged pages:
Flags are for pages that require immediate attention. Google lists only two flags:
Pornographic content
Malicious code on pages
Again in theory, a page that is ranked "vital" might have pornographic content. There are no flags for
racist content or dangerous pages that contain instructions on how to commit the perfect crime, complete instructions on
how to build an atomic bomb and where to get the materials etc. It is much more probable that flags are applied and
result in general penalizing or removal of a page, because they relate to the entire page, rather than to results for a
specific query. Malicious code flags evidently result in a warning that "This page may be dangerous to your computer."
Actual Positioning and Search quality rating
As far as we can tell, raters are never told what the positioning of the query result was. They are shown the query
and one result. They are not told if the user in question actually clicked on that result or not. In fact, we have no
idea how Google's system decides to select a particular query result for rating, or how the results of the rating may be
used to change the search algorithm.
Influence of Ratings on Search Results
If Google indeed uses these quality ratings to influence their results, then they are violating their own declared
code. Google queries often list racist or obnoxious content, as for example, for keyword "Jew." Google puts a notice
with these results that explains that they are also disturbed by the results, but they cannot suppress them because that
would disturb their algorithm. Since Google can and does suppress pages flagged as spam or malware, their notice is
evidently counterfactual.
Spam
There is a large section on SPAM that appears in some of the versions of the handbook on the Web, while only an
abbreviated version seems to be shown in other versions. The labels for SPAM are:
Not Spam: The not spam rating is given to a page that "has not been designed using deceitful web design
techniques."
Maybe Spam: This label is given when you feel the page is "spammy," but you are not 100% convinced of that.
Spam: Given to pages you feel are violating Google's webmaster guidelines.
Again, the SPAM ratings are evidently orthogonal to the quality ratings. One would
think that any page that gets a high result quality rating had better not be spam, but evidently that is not the case. In theory, a
page ranked "vital" could be labeled Spam!
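One way to picture this independence is as a record with separate fields, none of which constrains the others. The record below is purely illustrative: the class and field names are invented, the URL is a placeholder, and nothing is known about how Google actually stores its ratings.

```python
from dataclasses import dataclass, field

QUALITY_RATINGS = ("Vital", "Useful", "Relevant", "Not Relevant", "Off-Topic")
SPAM_LABELS = ("Not Spam", "Maybe Spam", "Spam")
FLAGS = ("Pornography", "Malicious code")


@dataclass
class RaterJudgment:
    """One rater's verdict on one URL returned for one query (hypothetical)."""
    query: str
    url: str
    quality: str                     # rated relative to the query
    spam: str = "Not Spam"           # rated for the page itself
    flags: set = field(default_factory=set)

    def __post_init__(self):
        # Each field is checked on its own; no field restricts another.
        if self.quality not in QUALITY_RATINGS:
            raise ValueError("unknown quality rating: " + self.quality)
        if self.spam not in SPAM_LABELS:
            raise ValueError("unknown spam label: " + self.spam)
        if not self.flags <= set(FLAGS):
            raise ValueError("unknown flag in: " + repr(self.flags))


# Nothing prevents the combination discussed above.
judgment = RaterJudgment(
    query="example",
    url="http://www.example.com/",
    quality="Vital",
    spam="Spam",
)
```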
Google warns that it is better to err on the side of leniency and not to label a page as spam, and it also explains
that there is a mechanism for adjudicating differences of opinion between raters. This suggests strongly that SPAM
ratings and similar flags are really applied to results if they are found to apply to a page.
Google Criteria for SPAM
Google recognizes the following sorts of SPAM sites for manual rating purposes. Remember that these are not
necessarily detected by automatic filters. In general, these types of pages implement various techniques of
Black Hat SEO:
PPC - Pay per click - the page is set up so that it is all or mostly Pay per click advertisements, or consists
of "scraped content" (content taken, usually automatically, from other Web sites) plus PPC advertisements. Google warns
specifically, for example, about pages that simply copy Wikipedia and add advertisements. These are considered spam.
"The important thing to remember is that if the scraped (copied) content on the page is removed and all that remains is
ads, it is Spam."
Parked Domains - An expired domain is purchased by a spammer and filled with irrelevant junk
links. Since we can find many of these sites displayed prominently in search engine results, it is obvious that the
criterion is only applied if raters find the site.
Thin Affiliates - A thin affiliate is a site that is set up only as a front to market products of
another e-business. You cannot purchase the products there. Some bloggers have noted that the criteria used by Google
are too strict and may screen out real merchants, because the criteria include points like "a way to track FedEx
orders," "a “wish list” link, or a link to postpone purchase of an item until later." Someone got carried away here.
Obviously, a real merchant just needs to provide a place to buy the product. A site that offers price comparisons or
other useful information is not considered a "thin affiliate."
Hidden Text and Hidden Links - Text and links are the same color as background, allowing
addition of keywords.
JavaScript Redirects - a form of cloaking. The search engine spider sees one page that has
information, but the JavaScript redirects the user to a different page that is spam.
Keyword Stuffing - This can be done manually or automatically. The idea is to load the page
artificially with
keywords to attract the search engine spider.
Keywords may or may not be related to the content of the page. Some pages are generated "on the fly" in response to
queries, so that in future, they will be there when that query is entered. Keywords can be stuffed in any part of the
page including the URL.
100% Frame - A frame page takes up 100% of the browser view, so that users see only that page,
but the spider sees the frame page and another page that is linked to it and blocked from view. The second page contains
real information, but the user does not see that.
Sneaky redirect - It is not clear how this is really different from a JavaScript redirect in
principle. In a sneaky redirect, we are told that the page redirects to one or more other domains on a rotational basis.
Google also allows for legitimate redirection. According to Google's directions, a redirect is not "sneaky" if both
domains have the same ownership. This cannot be correct if interpreted literally, since it would allow you to have, for
example, a page that is supposed to be about political issues, and redirects to another domain that you own, which
is a porno site.
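Two of these techniques, hidden text and keyword stuffing, are simple enough to illustrate with crude checks. The sketch below is not how Google or its raters detect them; the function names are made up, only plain hex colors are handled, and any threshold you might apply to the density figure would be your own guess.

```python
import re


def normalize_color(color):
    """Lower-case a CSS hex color and expand the short form (#abc to #aabbcc)."""
    c = color.strip().lower()
    if re.fullmatch(r"#[0-9a-f]{3}", c):
        c = "#" + "".join(ch * 2 for ch in c[1:])
    return c


def is_hidden_text(text_color, background_color):
    """Text is invisible to the user when it has the same color as the background."""
    return normalize_color(text_color) == normalize_color(background_color)


def keyword_density(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

A rater does much the same thing by hand, for example by selecting all the text on a page to see whether anything invisible shows up, or by noticing that one word is repeated far more often than natural writing would allow.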
Webspam that Google Misses
It seems from the above that there are a number of kinds of SPAM and Black Hat SEO
that Google misses. For example, there are schemes that allow automatic redirection from a legitimate page to a page
that is the Web site of an affiliated merchant. That sort of thing is not covered explicitly, but it is obviously SPAM.
Who will rate the raters?
There is presumably no real mechanism for adjudicating differences of opinion about quality, and no
real mechanism for checking objectively if a page is spam. There is a resolution mechanism for spam issues, but it is
apparently based on mechanical criteria. For example, Google tells raters, they should check the page in different
browsers and check source code to find out why their rating differs from that of other raters.
A page about Islam, for example, can be rated as "not relevant" or SPAM by Islamophobic raters, and a page
about Zionism that originates in Israel might be labeled as "not relevant" by raters in Saudi Arabia, or they might
simply categorize it as "didn't load" or "foreign language" or "unratable." Into the trash bin it goes! It is not
clear what effect, if any, these ratings will have on actual search results, or who checks these issues.
What it means for Web site owners
Remember what it all means. Insofar as the quality ratings are concerned, this is how Google checks its own results using human raters. It is not how the Google
algorithm works. Regarding the various flags and SPAM ratings, it is probable that they are applied to pages that the
raters find, but it is not likely that the raters will get to more than a few percent of the possible pages. Of course,
they get to rate only a tiny percentage of possible queries.
Larger Issues
When a person searches for "apple" or "windows" they are liable to be directed to the Web site of
Apple or Microsoft computer company. Consider the effect that this has, and will have on the language and culture of the
world. Consider the power that we have signed away to large corporations to shape our culture. And, at the narrowest
level, consider the dilemma of the search engine raters. Suppose someone has a brand of computer called Sex. Should the
query for the popular keyword [Sex] show the home page of this company or should it show sites related to sex? Why is
that different from "apple"?
Ami Isseroff
Notice: Copyright
All materials are copyright 2008 by Ami Isseroff. All rights reserved. These pages may not be reproduced in any
form in electronic or printed media without express written permission from the author.