Alex Pooley's Blog

Hello there, my name is Alex Pooley and I'm a freelance web developer residing in Perth, Western Australia. My passion is in the development of web sites that solve everyday problems. Here's a gallery of some of my notable work. If you need a web site designer or developer, contact me with further details. Lastly, you can read more about me.

Duplicate Content: Why It May Not Matter

June 14th, 2007

A concern among many SEO and SEM experts is that search engines may penalize duplicate content (for example, web pages copied from PLR packages or open publishing sites). After learning about the internals of search engine algorithms, I suspect duplicate content is not an issue for any search engine.

There are three generic information retrieval techniques: boolean, probabilistic, and vector space models. Vector space models appear to be the current hot topic, with latent semantic indexing (LSI) leading the charge. With a vector space model you project each document into a set of co-ordinates and then use the distance between co-ordinates as a metric to compare documents against one another. Duplicate content is projected to the same co-ordinate. In general, similar content is projected close to similar content, and away from dissimilar content.
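To make the projection idea concrete, here is a minimal sketch in Python. It assumes the simplest possible vector space, one axis per word with raw term counts, and is not LSI or anything a real search engine is known to use. The function names and sample sentences are made up for illustration.

```python
import math
from collections import Counter


def to_vector(text):
    """Project a document onto term-count co-ordinates (one axis per word)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    # Counter returns 0 for missing terms, so unshared words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents, invented for this example.
original = to_vector("search engines rank web pages by relevance")
duplicate = to_vector("search engines rank web pages by relevance")
similar = to_vector("search engines rank pages by authority")
unrelated = to_vector("two giraffes dancing in the savanna")

print(original == duplicate)                   # same co-ordinate
print(cosine_similarity(original, duplicate))  # approximately 1
print(cosine_similarity(original, similar))    # between 0 and 1
print(cosine_similarity(original, unrelated))  # no shared terms
```

The duplicate lands on exactly the same co-ordinate as the original, the similar document lands nearby, and the unrelated one shares no axis with it at all.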

Given that duplicate content will be located at the same co-ordinate, a search engine would simply continue to rank the content by other criteria such as PageRank, domain authority, etc. So I guess if someone is ripping your content and they are considered more authoritative, you have a problem; but I suspect this would be a very unusual situation.
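As a rough sketch of that tie-breaking step, the Python below groups pages that share a co-ordinate and orders each group by an authority score. The URLs, the scores, and the idea of a single numeric authority value are all assumptions made up for this example; real engines combine many signals.

```python
from collections import Counter, defaultdict

# Hypothetical pages: (url, text, authority score). All values are invented.
pages = [
    ("original.example/post", "duplicate content may not matter", 0.4),
    ("scraper.example/copy", "duplicate content may not matter", 0.7),
    ("other.example/page", "dancing giraffes are fun", 0.9),
]


def coordinate(text):
    """Hashable stand-in for a document's term-count co-ordinate."""
    return frozenset(Counter(text.lower().split()).items())


def rank_duplicates(pages):
    """Group pages by co-ordinate, then order each group by authority."""
    groups = defaultdict(list)
    for url, text, authority in pages:
        groups[coordinate(text)].append((authority, url))
    return [
        [url for _, url in sorted(group, reverse=True)]
        for group in groups.values()
    ]


ranked = rank_duplicates(pages)
print(ranked)
```

In this made-up data the scraper outranks the original purely because it was given the higher authority score, which is the unusual problem case described above.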

[Also, I had read on Matt Cutts' blog that you should not be concerned about duplicate content, though I can't find the blog post.]

Advertising Demand Vs Query Demand

June 11th, 2007

Bidding on search terms grows exponentially with popularity

As you can see from the log-scaled plot to the left, advertising demand on search terms grows exponentially with popularity. You can see for yourself in “Figure 1” of this paper from Yahoo! Research Labs.

Using Wordze [aff] as a keyword data source, I plotted 800,000+ search keywords from 760 seed keywords and their corresponding demand to roughly determine if this advertiser demand is warranted. The log-scaled plot is below, with the horizontal axis as seed keywords ranked by descending popularity, and the vertical axis as the number of queries that include the seed keyword.

Distribution of search demand

[The reason for the large drop off in the plot is most likely because of my limited data.]

Searchers do appear to follow an exponential model where popular queries are exponentially more frequent than unpopular queries.
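One way to check an exponential model like this is to fit a straight line to the logarithm of the query counts against rank, since an exponential curve is a straight line on a log-scaled axis. The sketch below does that with an ordinary least-squares fit. The counts are synthetic, generated from an exact exponential purely to show the method; they are not the Wordze data.

```python
import math


def fit_exponential(counts):
    """Least-squares fit of log(count) = intercept + slope * rank."""
    n = len(counts)
    ranks = range(n)
    logs = [math.log(c) for c in counts]
    mean_x = sum(ranks) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic counts (an assumption for illustration): 1000 * e^(-0.5 * rank).
counts = [1000 * math.exp(-0.5 * rank) for rank in range(10)]

slope, intercept = fit_exponential(counts)
print(slope)                # decay rate per rank step
print(math.exp(intercept))  # estimated count at rank 0
```

With real keyword data the points will scatter around the line, and a sharp drop-off at the tail (like the one in the plot) would show up as large residuals at the highest ranks.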

So far this says nothing about the correlation between queries and advertiser demand. Specifically, I would be interested to determine if the market efficiently demands popular keywords in proportion to query frequency. As a guess, any deviation from a strong correlation would suggest either:

  • A relatively good value keyword.
  • A poor converting keyword where the market has already factored human behavior into the model.
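A test of this idea could look something like the Python sketch below. It computes the correlation between log query volume and log advertiser demand, then flags keywords whose advertiser demand per query is well below the typical ratio. The keyword names, the numbers, and the 50% threshold are all invented for illustration; the flagged keywords could be either of the two cases above, and the numbers alone cannot tell them apart.

```python
import math
from statistics import median

# Hypothetical data: keyword -> (query volume, advertiser demand).
keywords = {
    "alpha": (10000, 500),
    "beta": (5000, 250),
    "gamma": (2000, 20),  # little advertiser demand relative to queries
    "delta": (1000, 50),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


log_queries = [math.log(q) for q, _ in keywords.values()]
log_demand = [math.log(d) for _, d in keywords.values()]
r = pearson(log_queries, log_demand)

# Advertiser demand per query; keywords far below the median deviate most.
ratios = {name: d / q for name, (q, d) in keywords.items()}
typical = median(ratios.values())
underbid = [name for name, ratio in ratios.items() if ratio < 0.5 * typical]

print(r)
print(underbid)
```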

I suspect some markets will be more mature and so may already factor human behavior into the model. This maturity would probably be proportional to the size of the market.

Speculation is fun but almost useless. I have some data lying around that I can use to test some of these ideas in the future. Stay tuned :)

Google’s Hottest Searches

May 24th, 2007

I’m not sure how much has been said about this already, but if you haven’t heard, Google are now publishing a frequently updated list of popular search terms. This supersedes their weekly Zeitgeist. You can find Google’s announcement here, and the top 100 query resource here:

http://www.google.com/trends/hottrends

This news will probably excite the IM and SEO readers of this blog. If this is you, then you may also be interested in Nielsen BuzzMetrics’ BlogPulse resource.

Maybe this information is not exciting to you. How about a couple of dancing giraffes?