Search Engine Optimization

Quality of Google Search Engine Results



The Quality of Google Search Results

Ami Isseroff

Sept 28, 2008

Yesterday I wrote about Google Quality Rater Secrets, discussing how Google uses humans to rate its search engine results. Improving the quality of search engine results was the rationale behind the original search engine and ranking algorithm described by the Google founders in The Anatomy of a Large-Scale Hypertextual Web Search Engine and The PageRank Citation Ranking: Bringing Order to the Web.

Ten years ago, search engines were poor. Results were often random, and they never fit what users were really looking for, but rather what Web site owners and search engine companies wanted them to see. Google made a giant improvement in that situation. At the same time, it was helped by the growth of the Web, which, along with a lot of junk, has also generated a large amount of high quality information as well as commercial sites where you can buy just about anything.

I decided to test the Google search engine against its own criteria, which its quality raters use to judge results. While it is possible to test an infinite number and variety of searches, I decided to concentrate on some informational searches, including both commercial and non-commercial information. What I found is that Google results rarely measure up to its own standards, especially when the information may be missing or hard to come by. But even when the information is there, Google falls short. Sometimes pages that are better, more relevant answers to the query are pushed off the first page of Google results by blogs and junk. An increasing proportion of the pages retrieved by Google for informational queries are restricted-access articles. If you don't belong to JSTOR or Project MUSE through a library, or don't want to pay for the article, you won't get the information. These pages should not be the first ones retrieved by a search because most people cannot access them. They are a sort of Webspam because they frequently promise information that is accessible only to members.

For more complex queries, Google evidently simply did not have the answers at all, but would not "admit it," and poured out oodles of listings that were "off topic" - not relevant to the query at all.

I tried to match the query results to expectations based on Google's quality rating. None of the results could be "vital" because I was not looking for any firms. That left the categories: Useful (what you expected to get); Relevant (has relevant information, but may be too broad or too narrow, or a sub page of the correct site, or a brief article); Not relevant (too broad or too narrow to fit the query, or has a link to relevant information but is not relevant itself); and Off topic (ignored part of the query). For a query about [universities in india], universities in Europe are Off Topic. As I noted in the article about quality rating, this is a frequent fault of Google results. Here are the results. You can try the queries yourself and you should get similar results.

Google Search Quality Results

["infant mortality" Ecuador] query

My first query was: ["infant mortality" Ecuador]. For a query about Ecuador, I would expect to get a site sponsored by the government or an article that is all about Ecuador. A perfect match for this query about infant mortality in Ecuador would be an article that was wholly about infant mortality in Ecuador, discussed the reasons for high mortality and progress in lowering it, and gave statistics for infant mortality over a long period. At minimum, one would expect a page retrieved among the first ten results to give at least the figures for infant mortality in Ecuador for a single year. 

Query Results: Google claimed to retrieve 155,000 pages for this query. Of the first ten pages listed (first page of search engine results), all had something about the topic, but none met expectations. The top page was "not relevant" as it was too broad - it was a UN report about general health conditions in Ecuador that listed a single sentence about infant mortality.

One page was a graph of infant mortality and another statistic. Most of the articles were about health or social progress in general in Ecuador, but at least had figures for infant mortality. They were not about infant mortality as such: most were too broad or too narrow and did not have more than a sentence about it. In Google rating terms they were between "relevant" and "not relevant." One page was restricted access (JSTOR). At least one page, listed as number 10, must be rated "not relevant" because it is about abortion issues rather than infant mortality - Safe Abortion Hotline Launched in Anti-Choice Ecuador. www.rhrealitycheck.org/blog/2008/07/17/safe-abortion-hotline-lauched-antichoice-ecuador

It should really be rated "off topic." A better page than listing number 10 is certainly the Wikipedia page about Ecuadorian demography, which is retrieved as number 16:

Demographics of Ecuador - Wikipedia, the free encyclopedia

en.wikipedia.org/wiki/Demographics_of_Ecuador

That article, however, was not about infant mortality in Ecuador, though it at least had some relevant information.

While there are articles about infant mortality in Ecuador, none of them were available in full on the Web. Sometimes just titles were listed or abstracts were given, with or without the possibility of paid access to the entire article.

 

["infant mortality" Ecuador 1920] Query

My next query was: ["infant mortality" Ecuador 1920]. Google claimed to have retrieved over 9,000 results. I could not find a single result that had information about infant mortality in Ecuador in 1920. The results were less relevant than those for the broader query. The top result was about living longer in general:

Life Expectancy

www.healthpromoting.com/Articles/articles/expect.htm

It mentioned Ecuador somewhere, and infant mortality somewhere else.

The second result at least had the words "Ecuador" and "infant mortality" in it, but it was about coding a study of infant mortality statistics, and did not give any actual results:

Codebook for "A New Dataset on Infant Mortality Rates, 1816-2002 ...

anessakimball.com/docs/research/InfantMortalityRate_data/IMR_codebk.pdf

The others were similarly off topic for various reasons. None seem to have had any information about infant mortality in Ecuador in 1920. A book listing, however, did have information about life expectancy in one province of Ecuador in the period in question.

[Poland "Gross National Product" 1930] Query

For the query [Poland "Gross National Product" 1930] Google claimed to have found 1,550 pages. Not one of the first 10 listings included the gross national product of Poland for the year 1930. Some were not about Poland at all, and most were not about 1930. Listing #23 was the first listing that had an estimate at all, and that estimate was for the year 1929.

http://books.google.com/books?id=82ncGA4GuN4C&pg=PA22&lpg=PA22&dq=Poland+%22Gross+National+Product%22+1930&source=web&ots=wmM8kGJAL3&sig=5IVOWB0MH9F-aAPCuEEFjxTKLwI&hl=en&sa=X&oi=book_result&resnum=3&ct=result

[antispam software download] query

For the query [antispam software download] Google claimed to have found 1,450,000 pages. Google asked if I really meant "Anti-Spam" software. That query, however, retrieved far fewer results - about 500,000.

Useful results for this query should have been pages that offered a choice of products to download. Given the large number of listings, one would expect good results. Of the results retrieved, three of those listed on the first page were off topic - not related to SPAM in any interpretation. One is the free AVG anti-Virus and anti-Spyware product. A second is Windows Defender, which stops popups. Neither of these products protects against mail spam or Webspam, though they are good products. A third product is anti-Spyware, not related to anti-Spam. Most of the pages were "relevant" - in that they provided the opportunity to download a single product. One page was "useful" (top rating) - it gave the opportunity to download a choice of products.

 

[Free graphics software download] query

Google claimed to have returned 5,720,000 pages for this query. Of the first 10 listings, six were evidently "useful" (highest quality rating) because they were, as expected, listings of a choice of software. Three were "relevant," in that they provided a single product (too narrow for the query). One was off topic. The product, www.smartdraw.com/, was advertised, as frequently happens with such searches, as a "free download," but in fact the "free" part is only a demo. In effect, this is Webspam, but Google has no good way to defend against it at present and does not try.
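The first-page counts reported for this query can be tallied with a few lines of code. The sketch below is only an illustration of the bookkeeping: the Rating names follow the categories as summarized in this article, and the summarize function is a hypothetical helper, not anything Google publishes or uses.

```python
from collections import Counter
from enum import Enum


class Rating(Enum):
    """Rating categories as summarized in this article (not an official API)."""
    VITAL = "vital"
    USEFUL = "useful"
    RELEVANT = "relevant"
    NOT_RELEVANT = "not relevant"
    OFF_TOPIC = "off topic"


def summarize(ratings):
    """Return the share of results in each category, skipping empty categories."""
    total = len(ratings)
    if total == 0:
        return {}
    counts = Counter(ratings)
    return {r.value: counts[r] / total for r in Rating if counts[r]}


# First-page ratings reported in the text for [Free graphics software download]:
# six useful, three relevant, one off topic.
graphics_first_page = (
    [Rating.USEFUL] * 6 + [Rating.RELEVANT] * 3 + [Rating.OFF_TOPIC]
)

summary = summarize(graphics_first_page)
print(summary)  # {'useful': 0.6, 'relevant': 0.3, 'off topic': 0.1}
```

The same helper could be applied to the other queries once each of the ten listings has been assigned a category by hand.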

 

Google Search Quality: Summary and Conclusions

Five different queries in Google yielded fair to poor results. In no case were all of the first ten results "Useful" according to Google criteria. For narrow searches where the information may not exist, Google presented irrelevant results. The search engine acted like a student who does not know the answer to a question and produces an "answer" that is vaguely related to the subject. Google doesn't know how to say "I don't know" and doesn't know when it does not know. Google is not "aware" of important types of information provided in the query. For example, it can't recognize that 1920 is a date rather than just a set of characters, and doesn't process the information accordingly. It is not case sensitive for any query, so it cannot tell the difference between Apple (computer) and apple (fruit), or Windows (operating system) and windows (glass covered openings in houses), or Word (word processor) and word (language unit).
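To make the point about case and dates concrete, here is a minimal sketch of a naive query normalizer. It is not Google's code; it only assumes the common practice of lowercasing a query and splitting it into word tokens. The looks_like_year function is a hypothetical heuristic added for illustration of what "recognizing a date" might mean at its simplest.

```python
import re


def normalize(query):
    """Lowercase the query and split it into word tokens (a naive approach)."""
    return re.findall(r"[a-z0-9]+", query.lower())


def looks_like_year(token):
    """Hypothetical heuristic: a four-digit number from 1000 to 2099 is a year."""
    return token.isdigit() and len(token) == 4 and 1000 <= int(token) <= 2099


# After case folding, the company name and the fruit are the same token.
print(normalize("Apple") == normalize("apple"))  # True

# Without the extra check, "1920" is just another string of characters.
tokens = normalize('"infant mortality" Ecuador 1920')
print([(t, looks_like_year(t)) for t in tokens])
# [('infant', False), ('mortality', False), ('ecuador', False), ('1920', True)]
```

A real engine would need far more than this to interpret a date, but the sketch shows why a purely character-based, case-folded match cannot separate the meanings listed above.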

The above defects are somewhat hard to remedy. It is much easier to remove restricted results from the first links presented, because these are in effect SPAM. Likewise, it shouldn't be a problem to slap a big penalty on those who advertise free software but really allow only free demonstration copies. Google should be applying WEBSPAM criteria more fairly. Misleading Web sites should all be penalized equally. It is also hard to understand why Google lists products that are not anti-SPAM products when the query asks for anti-SPAM. Part of the problem seems to be that Google depends too much on its PageRank algorithm. The pages at the big Web sites, or those that have "Authority," even if they don't match the query, seem to push out other pages that have the right answer to the query.

"Authority" seems to be pretty arbitrary. An irrelevant blog article was pushed ahead of a somewhat relevant Wikipedia article in the query about infant mortality in Ecuador, because the article was about a much more popular topic - abortions, and therefore got a lot of links.
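A toy calculation shows how this can happen. The sketch below is purely illustrative: the blended score, the authority_weight knob and every number in it are made up for the example and say nothing about how Google actually combines its signals.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    relevance: float  # 0..1, how well the page matches the query (made-up value)
    authority: float  # 0..1, link-based popularity (made-up value)


def score(page, authority_weight):
    """Blend relevance and authority; authority_weight is a hypothetical knob."""
    return (1 - authority_weight) * page.relevance + authority_weight * page.authority


def rank(pages, authority_weight):
    """Sort pages from highest to lowest blended score."""
    return sorted(pages, key=lambda p: score(p, authority_weight), reverse=True)


pages = [
    Page("Popular blog post, off topic", relevance=0.1, authority=0.9),
    Page("Encyclopedia article, partly relevant", relevance=0.6, authority=0.5),
]

# When authority dominates, the off-topic but heavily linked page comes first.
print([p.title for p in rank(pages, authority_weight=0.8)])

# When relevance dominates, the partly relevant page comes first.
print([p.title for p in rank(pages, authority_weight=0.2)])
```

The only point of the sketch is that any ranking which weights popularity heavily enough will let a well-linked page outrank a better answer.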

Another frequent defect I have encountered when searching in Google is the presentation of text that is not on the linked page at all. This can be because of a deliberate SPAM attempt. Usually, though, it is because Google spidered the first page of a Web log or other updated main page, and then the Web log was updated, so the information remained only in the permanent article page. The back page was not listed among the top results, presumably because its PageRank was too low.

Search engine results will never be better than the information on the Web. If there really are no Web pages all about infant mortality in Ecuador, no search engine will find such pages. But Google and other search engines should be able to learn to screen out the "non-results" and to at least warn the user when no results match the query. They can also learn to screen out SPAM and irrelevant pages, no matter how "authoritative" they might seem.
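As a rough idea of what such a warning could look like, the sketch below keeps only pages that contain every term of the query and returns a warning when nothing qualifies. The function names and the two sample pages are invented for illustration, and strict term matching is only one simple way to define a match; it is not a description of any real search engine.

```python
import re


def terms(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query, pages):
    """Return (hits, warning). pages maps a title to the page text.

    A page is a hit only if it contains every query term; the warning is
    None when at least one page qualifies.
    """
    wanted = terms(query)
    hits = [title for title, text in pages.items() if wanted <= terms(text)]
    warning = None if hits else "No page matches every term in the query."
    return hits, warning


# Made-up sample pages, loosely modeled on the kinds of results described above.
sample_pages = {
    "Life expectancy": "People live longer. Ecuador is mentioned. "
                       "Infant mortality has fallen.",
    "Health in Ecuador": "General health conditions in Ecuador, with one "
                         "sentence on infant mortality.",
}

print(search("infant mortality Ecuador", sample_pages))
print(search("infant mortality Ecuador 1920", sample_pages))
```

With the sample data, the broader query returns both pages, while the query that adds 1920 returns an empty list together with the warning instead of a page of loosely related listings.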

 

Ami Isseroff

Notice: Copyright

All materials are copyright 2008 by Ami Isseroff. All rights reserved. These pages may not be reproduced in any form in electronic or printed media without express written permission from the author.


SEO - Web Site Search Engine Optimization Contact: Webmaster(at)Yu-hu.com