Search Engine Optimization

Block Level Link Analysis



Block Level Link Analysis - Block level link analysis is an attempt to segment Web pages into visual blocks and to treat each block as a separate node for the purpose of weighting the authority given to links from that area of the page. The same principle can be used to determine the ranking of the page itself.

For example, an HTML page on a general Web site such as Yahoo! may have advertising links in a left sidebar that are not highly relevant to the text, a menu bar with organizational or general links within the Web site, and a right-hand sidebar and footer with other text. It might also feature several articles about different topics. It is difficult to determine what such a page is "about" from an analysis of all its text and links, and it is equally difficult to determine how much value a link from that page should carry in classifying the target page.

A 2004 paper* by Microsoft researchers explained the approach. This is the sort of page they analyzed:

Figure 1: Part of a sample web page (news.yahoo.com). Clearly, this page is made up of different semantic blocks (with different color rectangle). Different blocks have different importances in the page. The links in different blocks point to the pages with different topics.

The authors used a VIsion-based Page Segmentation (VIPS) algorithm to extract the visual structure of the page from the document object model. They constructed a block level Web graph, meaning that the interconnectivity they studied was between segments of pages rather than between whole pages. They weighted segments in different positions differently, claiming that central segments would "intuitively" have more importance than those positioned on the margins of the page.
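The idea of treating each block as its own node, with links weighted by the position of the block, can be sketched roughly as follows. This is only an illustration and not the VIPS algorithm or the authors' code: the page data, the position labels and the numbers in POSITION_WEIGHT are all invented assumptions, and the segmentation step is taken as already done.

```python
from collections import defaultdict

# Hypothetical position weights (assumed values, not from the paper):
# central blocks count for more than blocks on the margins.
POSITION_WEIGHT = {"center": 1.0, "sidebar": 0.3, "footer": 0.2}

# Hypothetical, already-segmented page:
# page name -> list of (block position, link targets found in that block).
pages = {
    "news": [
        ("center", ["story"]),
        ("sidebar", ["ad", "shop"]),
        ("footer", ["about"]),
    ],
}

def block_link_weights(pages):
    """Return {(page, block_index): {target: weight}}, one node per block."""
    graph = defaultdict(dict)
    for page, blocks in pages.items():
        for index, (position, targets) in enumerate(blocks):
            if not targets:
                continue
            # Split the block's position weight evenly among its links.
            share = POSITION_WEIGHT[position] / len(targets)
            for target in targets:
                graph[(page, index)][target] = share
    return dict(graph)

graph = block_link_weights(pages)
for node, links in graph.items():
    print(node, links)
```

In this sketch the single link in the central block keeps the full weight of that block, while the two sidebar links split a much smaller weight between them.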

They then applied the conventional HITS algorithm and Google's PageRank to the page segments rather than to whole pages. They reported better ranking of the pages they studied than is achieved by conventional application of these algorithms to pages.
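To show how block weights might change a link-based score, here is a minimal sketch of a weighted PageRank computed by simple iteration. It is not the formulation used in the paper: the function name weighted_pagerank, the damping value of 0.85, the tiny graphs and the link weights are all assumptions chosen for illustration.

```python
def weighted_pagerank(links, damping=0.85, iterations=50):
    """links: {source: {target: weight}}. Returns {node: score}; scores sum to 1."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            targets = links.get(source, {})
            total = sum(targets.values())
            if total == 0:
                # Node without outgoing links: spread its score over all nodes.
                for node in nodes:
                    new_rank[node] += damping * rank[source] / n
            else:
                # Pass score along each link in proportion to the link's weight.
                for target, weight in targets.items():
                    new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank

# Hypothetical page "hub" links to "article" from a central block (weight 1.0)
# and to "ad" from a sidebar block (weight 0.3).
block_weighted = {"hub": {"article": 1.0, "ad": 0.3}}
# The same links treated at page level, where every link counts equally.
page_level = {"hub": {"article": 1.0, "ad": 1.0}}

weighted_scores = weighted_pagerank(block_weighted)
flat_scores = weighted_pagerank(page_level)
print(weighted_scores)
print(flat_scores)
```

With page level links the two targets receive the same score. With block weights, the target linked from the central block scores higher than the one linked from the sidebar.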

The Block Level algorithm was supposedly going to revolutionize search, but it has not done so. One problem may be that it is impractical to implement this sort of computation-intensive algorithm on a large scale. Apart from the initial work of visual segmentation, if there are 10 visual segments per page, then there are 10 times as many items on the Web that require classification.

A second problem is that while the main page of Yahoo!, a newspaper or a Web log is a striking illustration of segmentation, most Web pages have a far simpler structure. It is generally easier to treat the page as a unit, and to spot areas that consist exclusively of links, where links might be of less relevance than they are in the body of the text. It is possible and even probable that search algorithms try to take this into account, without necessarily getting into the complexity of Block level analysis. Links in context are almost always "better."

Even so, it is often misleading to base conclusions on placement. The fact that the link to the article about Block Level Link Analysis is in a footnote to this entry in no way degrades its relevance to the entry. Layout formatting is a matter of preference, convention and idiosyncrasy. The top left part of a page may be less important than the center, but pages are often designed to take into account that visitors tend to center their gaze on the top left of the page. The links at the bottom of a page may be unrelated, or they may lead to the next article in a series or to highly relevant footnotes.

It must be admitted that "mixed" pages such as forum, newspaper or blog pages can be a source of irrelevant results, especially in searches involving multiple terms. Consider a search like this: ["Barack Obama" abortions]. The visitor wants to know about Obama's stand on abortions. But search engines may happily display, along with correct results, newspaper or Web log pages that carry an article about Barack Obama's tax plan that has nothing to do with abortions, and a totally unrelated article about abortions that doesn't mention Obama. It is not clear that the Block Level Analysis algorithm was intended to solve this sort of problem, but it could, since the two terms would then be found in different blocks.
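The difference between matching a multi-term query against a whole page and against its separate blocks can be sketched as follows. This is a toy illustration only: the helper matches_all, the two sample texts and the simple substring test are assumptions, and nothing here reflects how any real search engine matches queries.

```python
def matches_all(text, terms):
    """True if every query term occurs in the text (case-insensitive substring test)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)

# Hypothetical front page with two unrelated articles, one per block.
blocks = [
    "Barack Obama outlines his tax plan for middle-income families.",
    "State legislature debates new restrictions on abortions.",
]
query = ["Barack Obama", "abortions"]

# Page-level matching: all blocks are treated as one body of text.
page_match = matches_all(" ".join(blocks), query)

# Block-level matching: both terms must occur within the same block.
block_match = any(matches_all(block, query) for block in blocks)

print(page_match, block_match)
```

Treated as a unit, the page matches both terms even though no single article discusses both. Checked block by block, it does not match.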

Ami Isseroff

December 2, 2008

* Block-level Link Analysis 

Note - Definitions of Search Engine Optimization terms are based on inferences from common usage and definitions given by other sources. Conclusions about search engine behavior are based on understanding of the behavior of the most popular search engines. Both are subject to error or may change. Search engine company management may define or use a term or set or change any policy in any way they see fit, and may make these definitions and specifications public or not. These decisions and definitions are beyond our control.  

Notice: Copyright

All materials are copyright 2008, 2009 by Ami Isseroff. All rights reserved. These pages may not be reproduced in any form in electronic or printed media without express written permission from the author.

