Cost analysis: S3 vs Wikimedia

Tim O’Reilly wanted to hear Amazon S3 success stories and numbers. In short, Amazon S3 allows organizations to outsource their storage, and they say one can save lots of money. I wondered how that applies to the Wikipedia environment and did some calculations.

The cost structure is very simple: $0.15 per gigabyte stored per month and $0.20 per gigabyte transferred. It is not that easy to convert those rates into actual costs in one’s head, so it took a bit of work before I could produce any summaries or conclusions.

Some numbers may differ slightly from the real Wikimedia operation. I’m not sure how much of our cost structure I can disclose, so I’ll just use some sane or widely known figures. Here’s what we get…

So, the idea was to offload media storage and serving to S3. Our cluster averages 2Gbps of traffic, with peaks at 3Gbps; say 50% of that is images. That means we have to handle 1.5Gbps of image traffic at peak if we maintain our own systems, or serve about 10TB of data daily (1Gbps on average) via Amazon S3. This results in a $60000 monthly bill for S3 traffic; add a few hundred for storage (we have just a few terabytes of data) and the total comes to about $60300 per month.
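To make the S3 arithmetic easy to recheck, here is a minimal sketch in Python. The traffic figures and prices come from the paragraph above; the 30-day month, decimal units (1 TB = 1000 GB) and the 2 TB stored are my assumptions, since the post only says "a few terabytes". The function name and parameters are made up for illustration.

```python
# Rough S3 bill estimate. Assumptions (not from the post):
# 30-day month, decimal units, 2 TB of stored media.
SECONDS_PER_DAY = 86_400


def s3_monthly_cost(avg_gbps, stored_gb, days=30,
                    transfer_price=0.20, storage_price=0.15):
    """Return (GB served per day, monthly transfer cost, monthly storage cost)."""
    gb_per_day = avg_gbps / 8 * SECONDS_PER_DAY  # Gbit/s -> GB/s -> GB/day
    transfer_cost = gb_per_day * days * transfer_price
    storage_cost = stored_gb * storage_price
    return gb_per_day, transfer_cost, storage_cost


# 2 Gbps average cluster traffic, 50% of it images.
avg_image_gbps = 2 * 0.5
gb_per_day, transfer_cost, storage_cost = s3_monthly_cost(avg_image_gbps, 2_000)

print(f"served per day: {gb_per_day / 1000:.1f} TB")
print(f"transfer: ${transfer_cost:,.0f}  storage: ${storage_cost:,.0f}")
```

Without rounding, 1Gbps works out to 10.8TB per day and roughly $64800 of transfer per month, so the $60000 above (based on a round 10TB per day) is, if anything, slightly kind to S3.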

Calculating Wikimedia costs is more complicated. We have to take storage servers into account ($30000), add a bunch of cache servers ($100000) and some mediocre routing gear ($30000), pay $5000 a month for racks and electricity, and of course get 1.5Gbps of bandwidth. At $15/Mbps (Cogent may offer $10 and a free iPod!) that adds another $22500 a month. With hardware costs spread over 12 months, the bill totals about $41000 per month.
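The same kind of sketch for the self-hosted side, using only the figures from the paragraph above. The names are illustrative, and treating the hardware as fully written off over 12 months is the post's own simplification, not a real depreciation schedule.

```python
# Rough monthly cost of serving images from our own hardware.
# All dollar figures are the ballpark numbers quoted in the post.
HARDWARE = {
    "storage servers": 30_000,
    "cache servers": 100_000,
    "routing gear": 30_000,
}


def own_monthly_cost(hardware, amortization_months=12,
                     racks_and_power=5_000,
                     peak_mbps=1_500, price_per_mbps=15):
    """Return (hardware per month, bandwidth per month, total per month)."""
    hardware_monthly = sum(hardware.values()) / amortization_months
    bandwidth_monthly = peak_mbps * price_per_mbps  # billed on peak, not average
    total = hardware_monthly + racks_and_power + bandwidth_monthly
    return hardware_monthly, bandwidth_monthly, total


hardware_monthly, bandwidth_monthly, total = own_monthly_cost(HARDWARE)
print(f"hardware: ${hardware_monthly:,.0f}  bandwidth: ${bandwidth_monthly:,.0f}")
print(f"total: ${total:,.0f} per month")
```

Note that bandwidth is priced on the 1.5Gbps peak here, while the S3 estimate uses the 1Gbps average, so the comparison already leans in S3's favour.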

Some of the hardware is of course shared with other tasks, and some numbers may not be accurate, but the final number is still much lower. Additionally, we get a distributed CDN (faster response times!), efficient invalidation, our own statistics and dynamic rules for handling misses (404 handlers). Oh, and while Amazon serves 16000 requests per second at their total peaks, we do over 30000 (that’s with text pages).

The biggest cost one pays for S3 appears to be bandwidth rather than storage, so if traffic is sustained, having one’s own infrastructure seems to be cheaper. Unless I missed something.

9 thoughts on “Cost analysis: S3 vs Wikimedia”

  1. That’s correct.

    I don’t think S3 is really intended to be a CDN or a bandwidth outsourcing method, but a storage method. For Wikimedia’s purpose it would be the wrong solution. I’d look more towards an Akamai-like CDN to outsource to.

    Using S3 for redundancy or failover (in case servers can’t meet demand) would perhaps be a bit more practical.

  2. Their new pricing doesn’t lower prices that much: though bandwidth becomes cheaper (~$20000 per month), request costs add up (an additional $20000-25000).

    On the other hand, for stuff like download services, the new bandwidth cost structure seems to be a good deal.

  3. Double check your bandwidth costs calculations. Most providers charge you on your *peak* (or rather, a more complicated calculation that ends up being about 95% of your peak) bandwidth used, NOT average. This is a subtle but important consideration when comparing S3 bandwidth (which charges the same regardless of how volatile your bw usage is) to other providers which tend to charge more for more volatile usage.

  4. I used peak for our current bandwidth costs and average for calculating S3 – I really took that into account, didn’t help.

  5. Offloading only the peak 10% or 5% to S3 might be more interesting since that would reduce the peak Gb/s without using much S3 bandwidth.
