Block Level Link Analysis
Block level link analysis is an attempt to segment Web pages into visual blocks
and treat each block as a separate node for the purpose of weighting the authority
given to the links from that area of the page. The same principle can be
used for determining the actual ranking of the page itself.
For example, an HTML page in a general Web site like Yahoo! may have some
advertising links in a left sidebar that are not highly relevant to the text, a
menu bar with organizational or general links inside the Web site, and a
right-hand sidebar and footer with other text. It might also feature several different
articles about different topics. It is difficult to determine what the page is
"about" from analysis of all the text and links and it is difficult to determine
the value to be attached to a link from that page for classifying the target
page.
A 2004 paper* by Microsoft researchers explained the approach. This is the sort
of page they analyzed:

Figure 1: Part of a sample web page (news.yahoo.com). Clearly, this page is made
up of different semantic blocks (with different color rectangle). Different
blocks have different importances in the page. The links in different blocks
point to the pages with different topics.
The authors used a VIsion-based
Page Segmentation (VIPS) algorithm to extract the visual structure of the page
from the document object model. They constructed a block level Web graph for
each page, meaning that the interconnectivity they studied was between segments
of pages rather than pages. They weighted segments in different positions
differently, claiming that central segments would "intuitively" have more
importance than those positioned on the margins of the page.
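The idea of a block level graph with position weights can be illustrated with a short sketch. This is not the authors' implementation: the Block class, the position labels and the numeric weights in POSITION_WEIGHT are all assumptions made up for illustration, and real segmentation (VIPS) is not attempted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical weights: central blocks count for more than marginal ones.
# The numbers are illustrative assumptions, not values from the paper.
POSITION_WEIGHT = {"center": 1.0, "sidebar": 0.3, "footer": 0.1}


@dataclass
class Block:
    page: str                      # page the block belongs to
    position: str                  # "center", "sidebar" or "footer"
    links: List[str] = field(default_factory=list)  # target pages


def block_edges(blocks: List[Block]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Map each (page, position) block to its weighted outgoing links.

    A block's position weight is split evenly among the links it contains.
    """
    edges: Dict[Tuple[str, str], Dict[str, float]] = {}
    for block in blocks:
        if not block.links:
            continue
        weight = POSITION_WEIGHT.get(block.position, 0.0)
        share = weight / len(block.links)
        targets = edges.setdefault((block.page, block.position), {})
        for target in block.links:
            targets[target] = targets.get(target, 0.0) + share
    return edges


example = [
    Block("news", "center", ["story-1", "story-2"]),
    Block("news", "footer", ["terms-of-use"]),
]
for source, targets in block_edges(example).items():
    print(source, targets)
```

In this sketch a link from the central block carries more weight than a link from the footer, which is the intuition the authors describe.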
They used their data to apply the conventional HITS search algorithm and
Google PageRank to the analysis of page segments rather than whole pages.
They showed better ranking of the pages they studied than is achieved by
conventional applications of the algorithms to pages.
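To show how link weights can change a PageRank-style score, here is a minimal sketch of a weighted power iteration. It is a generic weighted PageRank, not the exact formulation in the paper; the function name weighted_pagerank, the damping value and the sample weights (1.0 for a link from a central block, 0.3 for a link from a sidebar) are assumptions.

```python
from typing import Dict


def weighted_pagerank(
    edges: Dict[str, Dict[str, float]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power iteration over a weighted graph {source: {target: weight}}.

    Each node passes its rank to its targets in proportion to the link
    weights. Nodes without outgoing links spread their rank evenly.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            targets = edges.get(source, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling node: distribute its rank over all nodes.
                for node in nodes:
                    new_rank[node] += damping * rank[source] / n
            else:
                for target, weight in targets.items():
                    new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank


# "hub" links to "article" from a central block and to "ad" from a sidebar.
scores = weighted_pagerank({"hub": {"article": 1.0, "ad": 0.3}})
print(sorted(scores, key=scores.get, reverse=True))
```

With equal weights the two targets would score the same. With the assumed block weights, "article" ends up ranked above "ad".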
The Block Level algorithm was supposedly going to revolutionize search, but
it has not done so. One problem may be that it could be impractical to implement
this sort of computation-intensive algorithm on a large scale. Apart from the
initial work of visual segmentation, if there are 10 visual segments per page,
it means that there are 10 times as many items on the Web that require
classification.
A second problem is that while the particular example of Yahoo! or a
newspaper or Web log main page is a striking illustration of segmentation, most
Web pages have a far simpler structure. It is generally easier to treat the page
as a unit, and to spot areas that are exclusively links, where links might be of
less relevance than they are in the body of the text. It is possible and even
probable that search algorithms try to take this into account, without
necessarily getting into the complexity of Block level analysis. Links in
context are almost always "better." Even so, it is often misleading to base
conclusions on placement. The fact that the link to the article about Block
Level Link Analysis is in a footnote to this entry in no way degrades its
relevance to the entry. Layout formatting is a matter of preference, convention,
and idiosyncrasy. The top left part of a page may be less important than the
center, but pages are often designed to take into account that visitors tend to
center their gaze on the top left of the page. The links at the bottom of a page
may be unrelated, or they may link to the next article in the series or to highly
relevant footnotes.
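The simpler approach mentioned above, spotting areas that are exclusively links, can be sketched as a link density check. How search engines actually do this is not public, so the function names, the fragment format and the 0.8 threshold are assumptions chosen only for illustration.

```python
from typing import List, Tuple

# A fragment is a piece of text plus a flag saying whether it is link text.
Fragment = Tuple[str, bool]


def link_density(fragments: List[Fragment]) -> float:
    """Return the fraction of words in an area that sit inside links."""
    total = 0
    linked = 0
    for text, is_link in fragments:
        words = len(text.split())
        total += words
        if is_link:
            linked += words
    return linked / total if total else 0.0


def is_link_only(fragments: List[Fragment], threshold: float = 0.8) -> bool:
    """Flag an area as a menu or link list (hypothetical threshold)."""
    return link_density(fragments) >= threshold


menu = [("Home", True), ("News", True), ("Sports", True)]
body = [("The authors describe", False), ("VIPS", True), ("in detail", False)]

print(is_link_only(menu))  # True
print(is_link_only(body))  # False
```

A check like this says nothing about where the area sits on the page, which fits the caution above about drawing conclusions from placement.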
It must be admitted that "mixed" pages like forums or newspaper or blog pages
can be a source of irrelevant results, especially in searches involving multiple
terms. Consider a search like this ["Barack Obama" abortions]. The visitor wants
to know about Obama's stand on abortions. But search engines may happily
display, along with correct results, pages of newspapers or Web logs that might
have an article about Barack Obama's tax plan that has nothing to do with
abortions and a totally unrelated article about abortions that doesn't mention
Obama. It is not clear that the Block Level Analysis algorithm was intended to
solve this sort of problem, but it could help, because each article would be
evaluated as a separate block.
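The difference between matching a query against a whole page and matching it against individual blocks can be shown with a small sketch. The function names and the sample text are hypothetical, and the matching is plain case-insensitive substring search rather than anything a real search engine would use.

```python
from typing import List


def contains_all(text: str, terms: List[str]) -> bool:
    """True if every query term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def page_matches(blocks: List[str], terms: List[str]) -> bool:
    """Page level: terms may be scattered across unrelated blocks."""
    return contains_all(" ".join(blocks), terms)


def block_matches(blocks: List[str], terms: List[str]) -> bool:
    """Block level: all terms must occur together in a single block."""
    return any(contains_all(block, terms) for block in blocks)


front_page = [
    "Barack Obama unveiled his tax plan today.",
    "A new state law on abortions took effect this week.",
]
query = ["Barack Obama", "abortions"]

print(page_matches(front_page, query))   # True
print(block_matches(front_page, query))  # False
```

The page as a whole matches the query even though no single article is about both terms. Requiring the terms to appear in the same block rejects it.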
Ami Isseroff
December 2, 2008
* Block-level Link Analysis
Note - Definitions of Search Engine Optimization terms are based on inferences
from common usage and definitions given by other sources. Conclusions about
search engine behavior are based on understanding of the behavior of the most
popular search engines. Both are subject to error or may change. Search engine
company management may define or use a term or set or change any policy in any
way they see fit, and may make these definitions and specifications public or
not. These decisions and definitions are beyond our control.
Notice: Copyright
All materials are copyright 2008, 2009 by Ami Isseroff. All rights reserved.
These pages may not be reproduced in any form in electronic or printed media
without express written permission from the author.