I have a simple question and wish to hear others' experiences regarding which is the best way to replicate images across multiple hosts.

I have determined that storing images in the database and then using database replication over multiple hosts would result in maximum availability.

The worry I have with the filesystem is the difficulty of synchronising the images (e.g. I don't want 5 servers all hitting the same server for images!).

Now, the only concerns I have with storing images in the database are the extra queries hitting the database and the extra handling I'd have to put in place in Apache if I wanted 'virtual' image links to point to database entries (e.g. AddHandler).

As far as my understanding goes:

  • If a script serves the images, each image requires its own database call.
  • If you display the images inline as binary data, all of them could be fetched in a single database call.
  • To provide external / linkable images, you would have to add an AddHandler for the extension you wish to 'fake' and point it at your scripting language (e.g. PHP, ASP).
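
To make the first two options concrete, here is a minimal Python sketch. It assumes a hypothetical `images` table (name, MIME type, binary data) and uses an in-memory SQLite database purely for illustration; the table layout and function names are not taken from any real setup. `fetch_image` shows the one-query-per-image case, and `inline_images` shows fetching several images in a single query and embedding them as `data:` URIs.

```python
import base64
import sqlite3


def open_image_store():
    """Create an in-memory SQLite database with a hypothetical `images` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images ("
        "name TEXT PRIMARY KEY, mime TEXT NOT NULL, data BLOB NOT NULL)"
    )
    return conn


def store_image(conn, name, mime, data):
    """Insert one image as a BLOB."""
    conn.execute(
        "INSERT INTO images (name, mime, data) VALUES (?, ?, ?)",
        (name, mime, data),
    )


def fetch_image(conn, url_path):
    """Script-served image: one query per requested URL, e.g. /images/logo.png."""
    name = url_path.rsplit("/", 1)[-1]
    row = conn.execute(
        "SELECT mime, data FROM images WHERE name = ?", (name,)
    ).fetchone()
    return row  # (mime, bytes), or None if there is no such image


def inline_images(conn, names):
    """Inline images: one query for all names, returned as data: URIs."""
    placeholders = ",".join("?" for _ in names)
    rows = conn.execute(
        f"SELECT name, mime, data FROM images WHERE name IN ({placeholders})",
        list(names),
    ).fetchall()
    return {
        name: f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"
        for name, mime, data in rows
    }
```

Note that the inline variant saves queries but makes every page larger, and the browser cannot cache the images separately from the page.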

I might have missed something, but I'm curious if anyone has any better ideas?


Edit: Tom has suggested using mod_rewrite to avoid needing an AddHandler, and I have accepted that as a solution to the AddHandler issue. However, I don't feel I have a complete solution yet, so please, please keep answering ;)

A few have suggested using lighttpd over Apache. How different are the ISAPI modules for lighttpd?

Accepted Answer

If you store images in the database, you take an extra database hit plus you lose the innate caching/file serving optimizations in your web server. Apache will serve a static image much faster than PHP can manage it.
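
As one illustration of what you give up, a web server handles conditional requests for static files on its own, whereas a script serving database images has to do that work itself. The Python sketch below is only an assumption about how such a script might look: `serve_image` is a hypothetical helper, and the hash-based ETag and the one-hour cache lifetime are arbitrary choices for the example.

```python
import hashlib


def serve_image(data, if_none_match=None):
    """Return (status, headers, body) for an image held in memory.

    `if_none_match` is the value of the client's If-None-Match header, if any.
    """
    # Derive a validator from the content so unchanged images keep the same tag.
    etag = '"' + hashlib.sha256(data).hexdigest() + '"'
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}

    if if_none_match == etag:
        # The client already has this exact image: send no body.
        return 304, headers, b""

    headers["Content-Length"] = str(len(data))
    return 200, headers, data
```

Even with this in place, the script still has to load the image (or at least its hash) from the database on every request.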

In our large app environments, we use up to 4 clusters:

  • App server cluster
  • Web service/data service cluster
  • Static resource (image, documents, multi-media) cluster
  • Database cluster

You'd be surprised how much traffic a static resource server can handle. Since it's not really computing (no app logic), a response can be optimized like crazy. If you go with a separate static resource cluster, you also leave yourself free to change just that portion of your architecture. For instance, in some benchmarks lighttpd is even faster at serving static resources than Apache. If you have a separate cluster, you can change your HTTP server there without changing anything else in your app environment.

I'd start with a 2-machine static resource cluster and see how that performs. That's another benefit of separating functions - you can scale out only where you need it. As far as synchronizing files, take a look at existing file synchronization tools versus rolling your own. You may find something that does what you need without having to write a line of code.
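
For example, rsync is one existing tool that could push new images from the machine that receives uploads out to each static host. The Python sketch below only builds and optionally runs the commands; the host names and paths are made up for illustration, and it assumes rsync and SSH access are already set up on every machine.

```python
import subprocess


def build_rsync_command(source_dir, host, dest_dir):
    """Build an rsync command that mirrors source_dir onto host:dest_dir."""
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    src = source_dir.rstrip("/") + "/"
    # -a preserves permissions and timestamps, -z compresses in transit,
    # --delete removes files on the destination that no longer exist locally.
    return ["rsync", "-az", "--delete", src, f"{host}:{dest_dir}"]


def push_to_hosts(source_dir, hosts, dest_dir, dry_run=True):
    """Push the image directory to every host; only list commands if dry_run."""
    commands = [build_rsync_command(source_dir, h, dest_dir) for h in hosts]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Usage might look like `push_to_hosts("/var/www/images", ["static1.example.com", "static2.example.com"], "/var/www/images/")`, run from cron or after each upload. Because the source machine pushes to each host, the static servers never pull images from one another.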

Written by Corbin March
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki