Duplicate Content: Why It May Not Matter
A common concern among SEO and SEM experts is that search engines may penalize duplicate content (web pages copied from PLR or open publishing sites). After learning about the internals of search engine algorithms, I suspect duplicate content is not an issue for any search engine.
There are three generic information retrieval models: boolean, probabilistic, and vector space. Vector space models appear to be the current hot topic, with latent semantic indexing (LSI) leading the charge. With a vector space model you project each document onto a set of co-ordinates and then use the distance between co-ordinates as a metric to compare documents against one another. Duplicate content is projected to the same co-ordinate. In general, similar content is projected close together, and away from dissimilar content.
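To make the co-ordinate idea concrete, here is a minimal sketch in Python. It assumes the simplest possible vector space model, raw term counts compared with cosine similarity, which is far cruder than LSI or anything a real search engine runs. The function names and sample sentences are made up for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a document to term counts: its co-ordinates in term space."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative documents (assumptions, not real pages).
original = "search engines rank pages by relevance and authority"
duplicate = "search engines rank pages by relevance and authority"
similar = "search engines rank pages by authority"
unrelated = "my cat sleeps on the warm windowsill"

sim_duplicate = cosine_similarity(term_vector(original), term_vector(duplicate))
sim_similar = cosine_similarity(term_vector(original), term_vector(similar))
sim_unrelated = cosine_similarity(term_vector(original), term_vector(unrelated))

print(f"duplicate: {sim_duplicate:.3f}")
print(f"similar:   {sim_similar:.3f}")
print(f"unrelated: {sim_unrelated:.3f}")
```

The duplicate lands on the same co-ordinate as the original (similarity of essentially 1), the similar document lands nearby, and the unrelated one shares no terms at all, so its similarity is 0.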
Given that duplicate content will be located at the same co-ordinate, a search engine would simply continue to rank the copies by other criteria such as PageRank, domain authority, etc. So if someone is copying your content and they are considered more authoritative, you have a problem; but I suspect this would be a very unusual situation.
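Here is a rough sketch of that tie-breaking idea in Python. It is purely illustrative: the URLs and `authority` scores are invented, whitespace-normalized text stands in for "same co-ordinate", and I am not claiming any real search engine works this way.

```python
from collections import defaultdict

# Hypothetical pages; the authority scores are invented for illustration.
pages = [
    {"url": "https://example.com/original", "text": "Ten tips for faster Rails apps", "authority": 0.4},
    {"url": "https://example.org/copy", "text": "ten tips for  faster rails apps", "authority": 0.9},
    {"url": "https://example.net/other", "text": "A guide to baking bread", "authority": 0.6},
]


def rank_duplicates(pages):
    """Group pages with identical normalized text, then order each group
    by authority score, highest first."""
    groups = defaultdict(list)
    for page in pages:
        # Normalized text is a stand-in for "projected to the same co-ordinate".
        key = " ".join(page["text"].lower().split())
        groups[key].append(page)
    return [
        sorted(group, key=lambda p: p["authority"], reverse=True)
        for group in groups.values()
    ]


ranked = rank_duplicates(pages)
for group in ranked:
    print([page["url"] for page in group])
```

In this made-up data the copy outranks the original only because I gave it a higher authority score, which is the unusual situation described above.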
[Also, I read on Matt Cutts's blog that you should not be concerned about duplicate content, though I can't find the blog post.]