Another notch in the belt of “Bad Technology Decisions in action”…
Air America Radio was dead in the water up through 11 AM yesterday. They went from people being able to visit the home page, click a link, and listen, to requiring “registration” in order for people to get to the stream. Around 11 AM Mountain Daylight Time, they went back to their “old” setup (heh, the network is only a few days old) because their servers crashed and burned.
Consequently, my link to their stream was broken as they went to a “private” URI that you had to log in to see. I registered for an account (or tried to) in hopes I could see the stream URI and be able to update it on my block on the right-hand side of this page, but with everything totally borked, that was just out of the question.
But the biggest mistake is this: they made, IMHO, some poor technology choices for what appears to be an incredibly high-demand site. They are using Cold Fusion running on Microsoft Internet Information Server for their content management system. While these are nominally fine choices for small-to-mid-scale configurations, you have to throw a lot of hardware at the problem in order to handle the massive loads they appeared to be experiencing. You’d need a hefty back-end to drive it regardless of operating system and content management system, and Cold Fusion is definitely no lightweight. There are obviously either a few programmatic issues or else a severe lack of capacity at work there.
Bandwidth isn’t their problem: systems management is. They didn’t give users any informative message that the system was under high load and unable to service requests, and their registration system was borked (probably due to load) during this big downtime window. My suggestion would be some dynamic offloading of registration through the use of a reverse-proxy-caching front-end: figure out the maximum load the system can sustain, track the current load at that front-end, and when load exceeds a threshold the designers know the system can still handle, throw up a message saying “unable to process your request at this time”, rather than allowing the system to try to process requests until everything’s timing out and nothing is happening.
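For the curious, here is roughly what I mean by shedding load at the front-end, as a minimal Python sketch. To be clear, this is my own illustration and not anything Air America actually runs: the names `LoadShedder`, `handle`, `max_in_flight`, and `BUSY_MESSAGE` are all made up, the "backend" is just a function standing in for the real registration system, and I'm assuming "load" can be approximated by the number of requests in flight at once.

```python
import threading

# Hypothetical wording; the point is that the user gets told *something*.
BUSY_MESSAGE = "Unable to process your request at this time."


class LoadShedder:
    """Admit at most max_in_flight concurrent requests and reject the rest."""

    def __init__(self, max_in_flight):
        # Assumed to be set below the load the back-end is known to sustain.
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Reserve a slot if one is free; return False when at capacity."""
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False
            self._in_flight += 1
            return True

    def release(self):
        """Give a slot back once the request has finished."""
        with self._lock:
            self._in_flight -= 1


def handle(shedder, backend, request):
    """Forward the request to the back-end, or answer 503 right away."""
    if not shedder.try_acquire():
        # Over the threshold: fail fast instead of queueing until timeout.
        return 503, BUSY_MESSAGE
    try:
        return 200, backend(request)
    finally:
        shedder.release()
```

A real setup would do this in the proxy itself rather than in application code, but the idea is the same: the rejected request costs almost nothing, and the requests that are admitted actually finish.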
* Sizeable farm of servers able to handle multiple millions of hits using dynamic content and proprietary technology like Cold Fusion, Oracle, and Microsoft Windows: $2,000,000+.
* Radio personalities to populate your new radio network: $$unknown millions$$/year
* Watching your web servers and streaming audio crash and burn due to poor technology decisions:
Priceless.