Scraper site

A scraper site is a spam website that copies all of its content from other websites using web scraping. Such sites are typically created to collect advertising revenue or to manipulate search engine rankings by linking to other sites in order to raise those sites' rankings.

Scraper sites have proliferated rapidly in recent years[when?] as a means of spamming search engines. Open content is a common source of material for scraper sites.

A search engine is not a scraper site itself; sites such as Yahoo and Google gather content from other websites and index it so that the index can be searched with keywords. Search engines then display snippets of the original site content in response to a user's search.
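The indexing and snippet behaviour described above can be illustrated with a minimal sketch. The page URLs, the sample text, and the function names (`build_index`, `snippet`, `search`) are hypothetical and chosen only for illustration; real search engines are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages; the URLs and text are made up for illustration.
PAGES = {
    "https://example.com/a": "Web scraping copies content from other websites.",
    "https://example.com/b": "Search engines index content so it can be searched with keywords.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def snippet(text, keyword, width=30):
    """Return a short excerpt around the first occurrence of the keyword."""
    pos = text.lower().find(keyword.lower())
    if pos < 0:
        return ""
    start = max(0, pos - width)
    end = min(len(text), pos + len(keyword) + width)
    return text[start:end]

def search(index, pages, keyword):
    """Return (url, snippet) pairs for pages that contain the keyword."""
    urls = sorted(index.get(keyword.lower(), set()))
    return [(url, snippet(pages[url], keyword)) for url in urls]
```

The point of the sketch is that the result is a pointer back to the original page plus a short excerpt, not a full copy of the page republished as the site's own content.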

1 Made for advertising
2 Legality
3 Techniques
- 3.1 Domain hijacking
4 References
5 See also
6 External links

Made for advertising

Some scraper sites are created to make money through advertising programs. In such cases, they are called Made for AdSense (MFA) sites.[citation needed] This derogatory term refers to websites that have no redeeming value[citation needed] except to lure visitors for the sole purpose of having them click on advertisements.

Made for AdSense sites are considered search engine spam: they dilute the search results with pages that offer searchers little of value. The scraped content is also redundant, because it duplicates material that the search engine would have listed anyway had no MFA site appeared in the results.
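One common way to quantify this kind of redundancy is to compare overlapping word sequences ("shingles") between two texts. The following is a minimal sketch of that idea, not a method described in this article; the function names and the shingle size of three words are assumptions made for illustration.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 0.0  # neither text is long enough to compare
    return len(set_a & set_b) / len(set_a | set_b)
```

A score close to 1.0 suggests that one page largely repeats the other, while a score close to 0.0 suggests that the two texts share little wording.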

Legality

Scraper sites may violate copyright law. Even taking content from an open content site can be a copyright violation if it is done in a way that does not respect the license. For instance, the GNU Free Documentation License (GFDL) and the Creative Commons Attribution-ShareAlike (CC BY-SA) license require that a republisher inform readers of the license conditions and give credit to the original author.

Techniques

Many scrapers pull snippets and text from websites that rank highly for the keywords they have targeted, hoping to rank highly themselves in the search engine results pages (SERPs). RSS feeds are vulnerable to scrapers because they publish content in a structured, machine-readable form.
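To see why feeds are so easy to reuse, consider how little code is needed to read one. The sketch below parses a small, made-up RSS 2.0 document with Python's standard library; the feed content and the function name `extract_items` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Full text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>Full text of the second post.</description>
    </item>
  </channel>
</rss>"""

def extract_items(rss_text):
    """Return the title, link, and description of every item in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items
```

Because each entry is already separated into labelled fields, no page layout has to be interpreted. Publishers who are concerned about this sometimes put only a summary, rather than the full text, in the description field.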

Some scraper sites consist of advertisements and paragraphs of words randomly selected from a dictionary. Often a visitor will click on a pay-per-click advertisement because it is the only comprehensible text on the page. Operators of these scraper sites gain financially from these clicks. Ad networks claim to be constantly working to remove these sites from their programs, although this is disputed, since the networks benefit directly from the clicks generated at these kinds of sites. From the advertiser's point of view, the networks do not seem to be making enough effort to stop the problem.

Scrapers tend to be associated with link farms, and the two are sometimes perceived as the same thing when multiple scrapers link to the same target site. A site that is frequently targeted in this way may be accused of participating in a link farm, because the many incoming links from scraper sites form an artificial-looking pattern.
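The incoming-link pattern described above can be made concrete with a small sketch. It assumes a list of observed links and a set of sites already known to be scrapers; both are made-up data, and the function name `scraper_link_share` is hypothetical. It is an illustration of the pattern, not a description of how any search engine evaluates links.

```python
from collections import defaultdict

# Hypothetical link observations: (linking site, target site).
LINKS = [
    ("scraper-one.example", "victim.example"),
    ("scraper-two.example", "victim.example"),
    ("scraper-three.example", "victim.example"),
    ("blog.example", "victim.example"),
    ("blog.example", "other.example"),
]

# Assumed to be known in advance for the purpose of this sketch.
KNOWN_SCRAPERS = {
    "scraper-one.example",
    "scraper-two.example",
    "scraper-three.example",
}

def scraper_link_share(links, scrapers):
    """For each target, return the fraction of distinct linking sites that are scrapers."""
    sources = defaultdict(set)
    for source, target in links:
        sources[target].add(source)
    return {
        target: len(linking & scrapers) / len(linking)
        for target, linking in sources.items()
    }
```

With the sample data, three of the four sites linking to `victim.example` are scrapers, which is the kind of skewed profile that can make an innocent target look like a link-farm participant.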

Domain hijacking

Main article: Domain hijacking

Some spammers who create scraper sites may hijack a recently expired domain name. Doing so allows them to exploit the search rankings and incoming links that the domain name has already established. Some spammers may even try to match the topic of the expired site in order to benefit from its search rankings for the same keywords.[citation needed] For example, an expired website for a photographer may be hijacked by a spammer who then generates a scraper site about photography tips.

References

External links

Black Hat SEO: more about black hat SEO techniques.
