Alex Pooley's Blog

Hello there, my name is Alex Pooley and I'm a freelance web developer residing in Perth, Western Australia. My passion is in the development of web sites that solve everyday problems. Here's a gallery of some of my notable work. If you need a web site designer or developer, contact me with further details. Lastly, you can read more about me.

Scaling Hell

November 25th, 2006

I’ve been pretty quiet (again) lately because I’ve been busy trying to scale the msgpad platform sitting beneath ScribbleHere.

A couple of weeks ago I split the code so that a cluster of nodes, instead of a single computer, could process HTTP requests. In theory it was a nice idea, but I ended up spending a few days resolving a deadlock caused by closing result sets out of order.

Anyway, with that out of the way I thought I was home free. Of course, I wasn’t: performance turned out to be horrible, several times worse than when a single box was processing the requests. To cut a very long and frustrating story short, it turned out the link between the database and the application server was becoming saturated.

The problem was that the proxy was on the same machine as the database. The data flow was like this: user - proxy - application server - db - application server - proxy - user. I had effectively created a high-tech echo chamber, because having the proxy and database on the same box meant a single request was routed along the same pipe four times. Even with this architecture I thought the network would not be the bottleneck, but it was. Anyway, for testing purposes I threw in another node to split the bandwidth consumption between the db/proxy node and the application server node, and the difference was quite amazing.
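To make the "four times" figure concrete, here is a minimal sketch that counts how often one request crosses the link between two machines. The machine names (box_a, box_b, box_c) and the helper function are made up for illustration, and the second layout is a hypothetical alternative, not the setup actually used for testing.

```python
from collections import Counter

# Request path as described above.
PATH = ["user", "proxy", "app", "db", "app", "proxy", "user"]

def link_crossings(placement, path=PATH):
    """Count how many times one request crosses each machine-to-machine link."""
    crossings = Counter()
    for src, dst in zip(path, path[1:]):
        a, b = placement[src], placement[dst]
        if a != b:  # a hop between components on the same machine uses no network
            crossings[frozenset((a, b))] += 1
    return crossings

# Layout from the post: proxy and db share one box, the app server has its own.
shared = {"user": "internet", "proxy": "box_a", "db": "box_a", "app": "box_b"}

# Hypothetical alternative: the proxy gets its own machine.
separate = {"user": "internet", "proxy": "box_c", "db": "box_a", "app": "box_b"}

print(link_crossings(shared)[frozenset(("box_a", "box_b"))])    # 4
print(link_crossings(separate)[frozenset(("box_a", "box_b"))])  # 2
```

With the proxy and database on the same box, every hop to or from the application server lands on the same link, which is why it saturates so quickly.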

Here’s a couple of graphs to illustrate the difference. The black line is the response time, and the grey line is the error rate. The horizontal axis is the number of concurrent requests per second.

The benchmark with only a single application server:

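[graph: response time (black) and error rate (grey) against concurrent requests per second, single application server]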

Here’s the benchmark with two application servers:

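[graph: response time (black) and error rate (grey) against concurrent requests per second, two application servers]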

I will be modifying the architecture of the system after seeing these results. I am still a bit undecided, but I am leaning towards performing a basic HTTP temporary redirect instead of having the reverse proxy. The only downside is that the system is polling based and every poll will result in a redirect, which is a bit wasteful. Another option would be round-robin DNS, but I am using EC2 for the application nodes, and the ease of setting up and tearing down EC2 nodes clashes with the delays in DNS propagation, which leaves a sour taste.
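As a rough idea of what the redirect approach could look like, here is a minimal sketch using Python's standard http.server. It is not the msgpad implementation: the node addresses, the round-robin choice, and the function names are all assumptions for illustration.

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application node addresses; real EC2 hostnames would go here.
APP_NODES = ["http://app1.example.com", "http://app2.example.com"]
_nodes = itertools.cycle(APP_NODES)

def redirect_target(path):
    """Pick the next node in round-robin order and keep the requested path."""
    return next(_nodes) + path

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 302 is a temporary redirect: the client fetches the Location itself,
        # so the response body never passes back through this machine.
        self.send_response(302)
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

def serve(port=8080):
    """Run the redirector; blocks until interrupted."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling serve() starts the redirector. Because adding or removing a node is just a change to the list, this avoids waiting on DNS, at the cost of one extra round trip per poll.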

P.S. I mentioned I didn’t like EC2 and S3 for hosting purposes in a previous post, but EC2 as an application server cluster is beautiful. I can scale up/down quickly, and it was very easy to create custom images for EC2.

The Problem With S3, EC2, and Web Hosting

November 12th, 2006

AWS

Update: It looks like EC2 data persistence works a little differently from the way I understood it. Apparently you can reboot instances and not lose anything, but you will lose everything if you terminate the instance. Fair enough. But the concern then appears to be that if your box crashes, you can’t reboot it yourself and can only terminate it or possibly get an admin to manually reboot the box. There’s a nice thread here if you’re interested.


There is clearly a lot of interest in using S3 as a web hosting service, and particularly in combination with EC2. Unfortunately it doesn’t work yet, and here’s why…

S3 is Amazon’s Simple Storage Service, a geographically distributed storage grid. The nice thing about this service is that it’s inexpensive (pay per use), fault tolerant, and provides great performance. The huge problem with this service as a web host is that it really is very simple. Not simple to use, mind you (it’s complicated), but simple as in stupid. Why? Well, you don’t have access to things you take for granted with web hosting, like default index files, mod_rewrite, and automatic compression for gzip/deflate requests.

I was using S3 for serving my static assets. I decided to switch to a simple shared host because on S3 I had to serve a 220K JavaScript asset uncompressed, compared to the 56K compressed equivalent.
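For a feel of what gzip does to a text asset, here is a small sketch using Python's standard gzip module. The sample payload is made up and far more repetitive than real JavaScript, so its ratio will not match the 220K/56K figures above.

```python
import gzip

def gzip_sizes(payload: bytes):
    """Return (raw size, gzipped size) in bytes for a payload."""
    compressed = gzip.compress(payload)
    return len(payload), len(compressed)

# Made-up stand-in for a JavaScript asset; real files compress less dramatically.
sample = b"function add(a, b) { return a + b; }\n" * 5000

raw, packed = gzip_sizes(sample)
print(f"{raw} bytes raw, {packed} bytes gzipped")
```

A web server that honours the Accept-Encoding request header does this on the fly for every text response, which is exactly the step missing here.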

The pro S3 web hosters at this point chime in with, “just stick it behind a web server, EC2 *wink* *wink*, and perform redirects”.

OK, so take the hint and stick it behind EC2, now you’re paying a minimum of $72 a month and it’s no longer an inexpensive service. You also need to start compiling Linux images for EC2 - but all I want to do is push static assets like HTML, CSS, JS, and images! Ignore the hint to throw S3 behind EC2 and throw it behind a web server elsewhere, but there goes your fault tolerance advantage!

So why not just ignore S3 and simply use EC2 for your web hosting needs? Well, EC2 requires a Linux image - a bit complicated for my time-constrained mind at the moment, and worst of all, EC2 does not provide persistent storage because that’s what S3 is for! If your EC2 instance crashes, or you bring it down for some reason, then all the data stored on the box is gone (really, it’s in the FAQ).

S3 and EC2 are a couple of really interesting products. I really like the way Amazon is supplying Internet infrastructure. S3 and EC2 are just not built for web hosting right now. The uses I’m hearing for S3 and EC2 are mostly testing environments or external backup systems. I’m sure other services will build on what’s already there and a great hosting platform will be created, but for now things just don’t fit right, so no thanks.
