Once it became clear that PageRank and the other techniques used by Google made term spam ineffective, spammers turned to methods designed to fool the PageRank algorithm into overvaluing certain pages.
The techniques spammers use to artificially increase the PageRank of their pages are called link spam.
We will examine how spammers create link spam and then discuss certain methods for decreasing the effectiveness of spamming techniques.
Architecture of a Spam Farm:
A collection of pages whose purpose is to increase the PageRank of a certain page or pages is called a spam farm. The diagram above depicts a simple view of a spam farm.
From the spammer's point of view, the Web is divided into three parts: inaccessible pages, accessible pages and own pages.
The pages that spammers cannot affect are inaccessible pages. Most of the Web is in this part.
The pages that are not controlled but can be affected by spammers are accessible pages.
The pages owned and controlled by spammers are own pages.
A spam farm is built by arranging the spammer's own pages in a special structure, as in the diagram, and getting some accessible pages to link to them. Without links from outside, the spam farm would be useless, because search engines would never crawl it.
It may seem surprising that one can affect a page without owning it. However, many sites, such as blogs and newspapers, invite others to post comments.
To raise the PageRank of their own pages from outside, spammers post many comments like “Nice. Please visit www.mysite.com”.
A spam farm has one target page, at which the spammer attempts to place as much PageRank as possible, and a large number of supporting pages. The target links to the supporting pages and each supporting page links back to the target, so the supporting pages accumulate the portion of PageRank that is distributed equally to all pages and pass it on to the target.
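To make the target/supporting-page structure concrete, here is a minimal Python sketch, not taken from any search engine. It assumes a hypothetical toy Web (ordinary pages a, b, c, an accessible page blog carrying the spammer's comment, a page named target, and supporting pages s0, s1, …), taxed PageRank computed by power iteration with β = 0.85, and no dead ends. The helper target_rank_estimate solves the balance equation for the target's PageRank y, where x is the PageRank flowing in from accessible pages, m is the number of supporting pages and n is the total number of pages.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], beta: float = 0.85,
             iters: int = 200) -> Dict[str, float]:
    """Taxed PageRank by power iteration.

    With probability beta the surfer follows a random out-link; otherwise it
    jumps to a page chosen uniformly at random. Assumes no dead ends.
    """
    n = len(links)
    rank = {p: 1.0 / n for p in links}
    for _ in range(iters):
        # (1 - beta) / n is the share of PageRank given equally to every page
        new = {p: (1 - beta) / n for p in links}
        for src, outs in links.items():
            share = beta * rank[src] / len(outs)
            for dst in outs:
                new[dst] += share
        rank = new
    return rank


def web_with_spam_farm(m: int) -> Dict[str, List[str]]:
    """Hypothetical toy Web: ordinary pages a, b, c; an accessible page 'blog'
    whose spam comment links to 'target'; and m supporting pages."""
    supporting = [f"s{i}" for i in range(m)]
    links = {
        "a": ["b", "blog"],
        "b": ["c"],
        "c": ["a"],
        "blog": ["a", "target"],  # the only link into the farm from outside
        "target": supporting,     # target links to every supporting page
    }
    for s in supporting:
        links[s] = ["target"]     # each supporting page links only to the target
    return links


def target_rank_estimate(x: float, m: int, n: int, beta: float = 0.85) -> float:
    """Solve y = x + (1-beta)/n + beta*m*(beta*y/m + (1-beta)/n) for y,
    where x is the PageRank flowing into the target from accessible pages."""
    return (x + (1 - beta) / n) / (1 - beta ** 2) + (beta / (1 + beta)) * m / n


if __name__ == "__main__":
    beta, m = 0.85, 10
    links = web_with_spam_farm(m)
    rank = pagerank(links, beta)
    x = beta * rank["blog"] / len(links["blog"])  # PageRank reaching the target from outside
    print(f"target:      {rank['target']:.4f}")
    print(f"ordinary a:  {rank['a']:.4f}")
    print(f"closed form: {target_rank_estimate(x, m, len(links), beta):.4f}")
```

In this toy graph the target ends up with roughly 0.40 of all PageRank against about 0.06 for page a, and the closed form matches the simulation. The term β/(1+β) · m/n shows why spammers want many supporting pages: each one funnels a little more of the equally distributed PageRank to the target.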
How to Remove Link Spam?
It is essential for search engines to detect and eliminate link spam. There are two approaches to doing so.
One is to look for structures like spam farms, where one page links to a large number of pages, each of which links back to it. Search engines search for such structures and remove those pages from their index.
In response, spammers develop different structures that have the same effect, and there is no end to the variations. This makes structure-hunting only partially effective against link spam.
The other approach does not rely on locating spam farms. Instead, a search engine can modify the definition of PageRank so that link-spam pages automatically get a lower rank. Two such techniques are TrustRank and Spam Mass.
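As a rough illustration of structure-based detection, the following Python sketch flags any page that links to many pages, nearly all of which link straight back to it. The function name and the thresholds min_support and min_ratio are hypothetical; search engines do not publish their actual detection rules.

```python
from typing import Dict, List, Set


def find_spam_farm_targets(links: Dict[str, List[str]], min_support: int = 5,
                           min_ratio: float = 0.9) -> Set[str]:
    """Flag pages that link to many pages, nearly all of which link back.

    min_support and min_ratio are hypothetical thresholds for illustration.
    """
    flagged = set()
    for page, outs in links.items():
        out_set = set(outs)
        if not out_set:
            continue
        # out-neighbours that link straight back to this page
        back = [q for q in out_set if page in links.get(q, [])]
        if len(back) >= min_support and len(back) / len(out_set) >= min_ratio:
            flagged.add(page)
    return flagged


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"],
           "target": [f"s{i}" for i in range(8)]}
    for i in range(8):
        web[f"s{i}"] = ["target"]  # classic spam farm: every supporting page links back
    print(find_spam_farm_targets(web))
```

Routing the supporting pages through a single intermediate page before reaching the target is already enough to slip past this particular check, which is the weakness of structure-hunting noted above.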
What is TrustRank?
TrustRank is a variation of topic-sensitive PageRank designed to lower the score of spam pages. In TrustRank, the “topic” is a set of pages believed to be trustworthy, i.e., not spam.
The idea is that while a spam page can easily be made to link to a trustworthy page, a trustworthy page is unlikely to link to a spam page.
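A minimal Python sketch of TrustRank as topic-sensitive PageRank: random jumps land only on the trusted pages, so trust can only flow outward along links that start at them. The page names, the trusted set {uni, news} and β = 0.85 are assumptions for illustration, and the code assumes no dead ends.

```python
from typing import Dict, Iterable, List


def trustrank(links: Dict[str, List[str]], trusted: Iterable[str],
              beta: float = 0.85, iters: int = 200) -> Dict[str, float]:
    """Topic-sensitive PageRank whose 'topic' is a set of trusted pages.

    Random jumps land only on trusted pages, so trust spreads only along links
    reachable from a trusted page. Assumes no dead ends.
    """
    trusted_set = set(trusted)
    jump = (1 - beta) / len(trusted_set)
    rank = {p: (1.0 / len(trusted_set) if p in trusted_set else 0.0) for p in links}
    for _ in range(iters):
        new = {p: (jump if p in trusted_set else 0.0) for p in links}
        for src, outs in links.items():
            share = beta * rank[src] / len(outs)
            for dst in outs:
                new[dst] += share
        rank = new
    return rank


if __name__ == "__main__":
    web = {
        "uni": ["news", "blog"],
        "news": ["uni"],
        "blog": ["news"],
        "spam": ["uni", "s0", "s1"],  # the spam page links to a trusted page...
        "s0": ["spam"],               # ...but no trusted page links to it
        "s1": ["spam"],
    }
    for page, score in trustrank(web, {"uni", "news"}).items():
        print(f"{page:5s} {score:.4f}")
```

Here spam links to the trusted page uni, yet its TrustRank and that of its supporting pages stay at exactly 0, because nothing reachable from the trusted set links to them.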
What is Spam Mass?
Spam Mass is a calculation that identifies the pages that are likely to be spam. Search engines can then eliminate such pages or assign them a lower rank.
The idea behind spam mass is to measure, for each page, the fraction of its PageRank that comes from spam. We do this by computing both ordinary PageRank and TrustRank, the latter based on a set of trustworthy pages.
If p is a page's PageRank and t is its TrustRank, its spam mass is (p - t)/p. A negative or small positive spam mass means the page is probably not spam, while a spam mass close to 1 means most of its PageRank does not come from trusted pages, so the page is probably spam.
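The spam-mass calculation can be sketched in Python as follows. A single function computes taxed PageRank with random jumps restricted to a given set of pages: using all pages gives ordinary PageRank p, using the trusted pages gives TrustRank t. The toy Web (the same hypothetical spam farm used earlier, with a page named target), the trusted set {a, c} and β = 0.85 are assumptions for illustration; a real trusted set would be chosen far more carefully.

```python
from typing import Dict, Iterable, List


def teleport_pagerank(links: Dict[str, List[str]], teleport: Iterable[str],
                      beta: float = 0.85, iters: int = 200) -> Dict[str, float]:
    """Taxed PageRank whose random jumps land only on pages in `teleport`.

    teleport = all pages gives ordinary PageRank; teleport = trusted pages
    gives TrustRank. Assumes every page has at least one out-link.
    """
    targets = set(teleport)
    jump = (1 - beta) / len(targets)
    rank = {p: (1.0 / len(targets) if p in targets else 0.0) for p in links}
    for _ in range(iters):
        new = {p: (jump if p in targets else 0.0) for p in links}
        for src, outs in links.items():
            share = beta * rank[src] / len(outs)
            for dst in outs:
                new[dst] += share
        rank = new
    return rank


def spam_mass(links: Dict[str, List[str]], trusted: Iterable[str],
              beta: float = 0.85) -> Dict[str, float]:
    """Spam mass (p - t) / p for every page."""
    p = teleport_pagerank(links, links.keys(), beta)  # ordinary PageRank (always > 0)
    t = teleport_pagerank(links, trusted, beta)       # TrustRank
    return {page: (p[page] - t[page]) / p[page] for page in links}


def toy_web(m: int = 10) -> Dict[str, List[str]]:
    """Hypothetical toy Web: ordinary pages a, b, c; an accessible page 'blog'
    whose spam comment links to 'target'; m supporting pages."""
    links = {"a": ["b", "blog"], "b": ["c"], "c": ["a"],
             "blog": ["a", "target"], "target": [f"s{i}" for i in range(m)]}
    for i in range(m):
        links[f"s{i}"] = ["target"]
    return links


if __name__ == "__main__":
    mass = spam_mass(toy_web(), trusted={"a", "c"})  # hypothetical trusted set
    for page, value in sorted(mass.items(), key=lambda kv: -kv[1]):
        print(f"{page:7s} {value:+.2f}")
```

On this toy graph the ordinary pages a, b, c and blog get a negative spam mass, while target and its supporting pages get a spam mass above 0.5. Their TrustRank is not zero only because the spammer's comment on blog leaks a little trust into the farm. Where to draw the line between a "small positive" and a "high" spam mass is a tuning decision the text leaves open.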
Pages with high spam mass can be eliminated.
Link spam is one of the major challenges for search engines. Detecting and managing spam pages is important because it preserves users' trust in search results.
Even after these algorithms are applied, some link spam remains, and search engines keep working to find and remove such pages.
Spammers keep finding new ways to exploit search engines, so combating them is an ongoing effort.