Alex Pooley's Blog

Hello there, my name is Alex Pooley and I'm a freelance web developer residing in Perth, Western Australia. My passion is in the development of web sites that solve everyday problems. Here's a gallery of some of my notable work. If you need a web site designer or developer, contact me with further details. Lastly, you can read more about me.

Duplicate Content: Why It May Not Matter

June 14th, 2007

A concern among many SEO and SEM experts is that search engines may penalize duplicate content, such as web pages copied from PLR (private label rights) material or open publishing sites. After learning about the internals of search engine algorithms, I suspect duplicate content is not an issue for any search engine.

There are three generic information retrieval models: boolean, probabilistic, and vector space. Vector space models appear to be the current hot topic, with latent semantic indexing (LSI) leading the charge. With a vector space model you project each document to a set of co-ordinates and then use the distance between co-ordinates as a metric to compare documents against each other. Duplicate content is projected to the same co-ordinate. In general, similar content is projected close to similar content and away from dissimilar content.
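To make the projection idea concrete, here is a minimal sketch in Python. It assumes the simplest possible vector space, one axis per word with raw term counts, and compares documents by cosine similarity. This is far cruder than LSI or anything a real search engine runs, and the function names and sample sentences are made up for illustration.

```python
import math
from collections import Counter

def term_vector(text):
    """Project a document to term-count co-ordinates (one axis per word)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    # Counter returns 0 for terms missing from b, so absent axes add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents, for illustration only.
original = term_vector("duplicate content may not matter for search engines")
copy = term_vector("duplicate content may not matter for search engines")
similar = term_vector("duplicate content may matter for some search engines")
unrelated = term_vector("freelance web developer in Perth")

print(original == copy)                        # the copy lands on the same co-ordinate
print(cosine_similarity(original, similar))    # close to the original
print(cosine_similarity(original, unrelated))  # no shared terms
```

The exact copy produces an identical vector, the reworded version scores high, and the unrelated text shares no axes with the original at all.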

Given that duplicate content is located at the same co-ordinate, a search engine would simply continue to rank the copies by other criteria such as PageRank, domain authority, etc. So I guess if someone is ripping your content and they are considered more authoritative, you have a problem; but I suspect this would be a very unusual situation.
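A rough sketch of that tie-breaking, again in Python. It assumes pages that share a co-ordinate are grouped together and then ordered by a single authority score. The URLs, the scores, and the idea of one numeric "authority" value are all hypothetical stand-ins; real engines combine many signals in ways I can only guess at.

```python
from collections import Counter, defaultdict

def coordinate(text):
    """Hashable stand-in for a document's position in the vector space."""
    return frozenset(Counter(text.lower().split()).items())

def rank_duplicates(pages):
    """Group pages sharing a co-ordinate, then order each group by authority."""
    groups = defaultdict(list)
    for url, text, authority in pages:
        groups[coordinate(text)].append((authority, url))
    # Highest authority first within each group of duplicates.
    return [
        [url for authority, url in sorted(group, reverse=True)]
        for group in groups.values()
    ]

# Hypothetical pages: (url, text, authority score).
pages = [
    ("copycat.example/post", "duplicate content may not matter", 0.2),
    ("original.example/post", "duplicate content may not matter", 0.7),
    ("other.example/about", "freelance web developer", 0.5),
]

ranked = rank_duplicates(pages)
print(ranked)
```

Nothing here penalizes the copy; it simply ends up below the more authoritative page holding the same content. Swap the two scores and the copy wins, which is the unusual situation described above.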

[Also, I had read on Matt Cutts's blog that you should not be concerned about duplicate content, though I can't find the blog post.]

2 Responses to “Duplicate Content: Why It May Not Matter”

  1. Ken Ewell Says:

    Why do you say duplicates do not matter in semantic search? That is just silly. In fact, duplicates matter very much.

    The problem is content is copied from blog to blog to website to one or more aggregators…. ‘nuf said.

  2. Alex Says:

    My post was meant to illustrate why duplicate content may not matter for a web site using PLR and other copied material. There is a concern among some people that sites with duplicate content may be penalized. I tried to suggest in the post that search engine algorithms inherently identify duplicate content and can work around it without explicitly penalizing a site.
