Claim: Using different variations of a URL in links causes duplicate listings in Google and other
search engines and may incur a penalty.
Status: Unknown
Google and possibly other search engines have a known and very visible software defect ("bug"). They produce
duplicate listings for the same page with and without the www prefix, and they retrieve different numbers of pages
depending on which form of the URL you search for. For example, the queries site:www.mysite.com and site:mysite.com
in Google very often show different numbers of pages, and there may be duplicate listings of a page, such as
www.seo.yu-hu.com/deep_links_hurt_seo.html and seo.yu-hu.com/deep_links_hurt_seo.html. There may be still more
inflated listings that show both the subdomain path and a separate listing for the plain directory path.
Some of these duplicates are unavoidable and reflect legitimate behavior of the search engine. For example, you could
have added a domain name that points to a subdirectory after the original pages were listed, so that
middle_east/history now also has a middle_east_history.org domain. Each page would then be listed twice or more. This
could be done innocently, or it could be done as part of a Black Hat SEO campaign.
It is not known whether these duplicates trigger search engine penalties or whether they affect page positioning.
What seems to be true is that if you search for a keyword, ordinarily one and only one of the duplicate pages is
retrieved. It is best to be consistent in the form of the URL you use in your own links. If search engines can't tell
that http://www.mideastweb.org is the same as http://mideastweb.org, then it is really a search engine bug. Nobody
should be penalized for these problems, because no webmaster can control how external sites are going to link to a
page or site. There is a difference, however, between saying that the search engines may not be able to resolve the
two URLs as the same domain, so that the top page gets poorer positioning, and claiming that search engines actively
demote pages or sites for such "duplicates." To the best of my knowledge, nobody knows if either is true except the
search engine people, and they would obviously be reluctant to discuss this defect.
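One way to stay consistent in your own pages is to pass every internal link through a small helper before publishing.
The following is only a minimal sketch. It assumes you have chosen the www form as your preferred one; the names
PREFERRED_HOST, BARE_HOST and use_preferred_host are hypothetical and not part of any search engine or server tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www form is the one you decided to use everywhere.
BARE_HOST = "mideastweb.org"
PREFERRED_HOST = "www." + BARE_HOST


def use_preferred_host(url: str) -> str:
    """Rewrite a link to the preferred host name if it points at either variant of our own domain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in (BARE_HOST, PREFERRED_HOST):
        # A link to some other site: leave it untouched.
        return url
    netloc = PREFERRED_HOST
    if parts.port:
        # Keep an explicit port if the link had one.
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, use_preferred_host("http://mideastweb.org") gives "http://www.mideastweb.org". This only controls the
links you write yourself; it does nothing about how external sites link to you.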
Search engines almost certainly cannot tell that the default main page, mysite.com/index.html for example, is the
same page as mysite.com. You can specify anything as the default page, after all. For that reason, the main page of a
site or directory should always be linked using the domain name or the directory path, consistently, and you should
ask people who link to your domain NOT to add the name of the main page file.
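The same idea can be applied to the default page name. The sketch below strips a trailing default file name from a
link so that it points at the domain or directory path instead. It assumes your server uses index.html or index.htm
as the default page; DEFAULT_PAGES and strip_default_page are hypothetical names, and you would adjust the set to
whatever your own server is configured to serve.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the file names your server treats as the default page.
DEFAULT_PAGES = {"index.html", "index.htm"}


def strip_default_page(url: str) -> str:
    """Drop a trailing default page name so the link points at the directory path."""
    parts = urlsplit(url)
    # Split the path into everything before the last "/" and the final file name.
    head, _, tail = parts.path.rpartition("/")
    if tail.lower() in DEFAULT_PAGES:
        path = head + "/"
    else:
        path = parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With these assumptions, http://mysite.com/index.html becomes http://mysite.com/ and a link to an ordinary page such
as http://mysite.com/page.html is left unchanged.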