Hacks from the kitchen: Serving indices over HTML

August 16, 2007

Mike Migurski is doing something quite interesting: dumping the database engine as a middleman and storing the DB indexes directly as HTML. Fullest, but preliminary, detail here. Let's let Mike explain:

There's a short list of reasons to do this:
A "database" that offers nothing but static file downloads will likely be more scalable than one that needs to do work internally. This architecture is even more shared-nothing than systems with multiple database slaves.

Not needing a running process to serve requests makes publishing less of a headache.

I'm using Amazon Web Services to do the hosting, and their pricing plans make it clear that bandwidth and storage are cheap, while processing is expensive. Indexes served over HTTP optimize for the former and make the latter unnecessary. It's interesting to note that the forthcoming S3 pricing change is geared toward encouraging chunkier blocks of data.

The particular data involved is well-suited to this method. A lot of current web services are optimized for heavy reads and infrequent writes. Often, they use a MySQL master/slave setup where the occasional write happens on one master database server, and a small army of slaves along with liberal use of caching makes it possible for large numbers of concurrent users to read. Here, we've got infrequently-updated information from a single source, and no user input whatsoever. It makes sense for the expensive processing of uploading and indexing to happen in one place, about once per day.

It's kinda like what the various static indices for my blogs do, but done a little better - with semantic markup and all.
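
To make the idea concrete, here's a rough sketch of what "an index as static HTML files" could look like. This is not Mike's actual layout, just my guess at the general shape: sorted records are split into small leaf pages marked up as definition lists, and a root page links to each leaf by its first key. All the names here (`build_pages`, `lookup`, `fetch`, `index.html`, `leaf-N.html`) are made up for illustration, and the pages live in a dict rather than on a real static host.

```python
import bisect
import html
from html.parser import HTMLParser


def build_pages(records, page_size=2):
    """Split (key, value) records into static HTML leaf pages plus a root page.

    Returns a dict of file name -> HTML text. A real publisher would upload
    these files to static hosting; the dict stands in for that here.
    """
    records = sorted(records)
    pages = {}
    root_rows = []
    for n, start in enumerate(range(0, len(records), page_size)):
        chunk = records[start:start + page_size]
        name = f"leaf-{n}.html"
        rows = "".join(
            f"<dt>{html.escape(k)}</dt><dd>{html.escape(v)}</dd>" for k, v in chunk
        )
        pages[name] = f"<dl>{rows}</dl>"
        # The root lists the first key of each leaf and links to that leaf.
        root_rows.append(f'<li><a href="{name}">{html.escape(chunk[0][0])}</a></li>')
    pages["index.html"] = f"<ol>{''.join(root_rows)}</ol>"
    return pages


class _Collector(HTMLParser):
    """Collects (tag, href, text) for every <a>, <dt> and <dd> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._open = None

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "dt", "dd"):
            self._open = [tag, dict(attrs).get("href"), ""]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2] += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self.items.append(tuple(self._open))
            self._open = None


def _parse(page):
    collector = _Collector()
    collector.feed(page)
    collector.close()
    return collector.items


def lookup(fetch, key):
    """Find `key` with two page fetches: the root page, then one leaf page."""
    links = [(text, href) for tag, href, text in _parse(fetch("index.html")) if tag == "a"]
    first_keys = [text for text, _ in links]
    # Pick the last leaf whose first key is <= the key we want.
    pos = bisect.bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key sorts before everything in the index
    texts = [text for tag, _, text in _parse(fetch(links[pos][1])) if tag in ("dt", "dd")]
    return dict(zip(texts[0::2], texts[1::2])).get(key)


# Usage: `pages.__getitem__` stands in for an HTTP GET against static hosting.
pages = build_pages([("oakland", "CA"), ("boston", "MA"), ("austin", "TX"),
                     ("denver", "CO"), ("miami", "FL")])
print(lookup(pages.__getitem__, "denver"))  # prints CO
```

The point of the sketch is that all the expensive work (sorting, paging, writing files) happens once at publish time, and a reader only ever fetches two small static files per lookup.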

Posted by Claus at August 16, 2007 12:51 PM
