While cloud computing offers a host of advantages, there's a downside: State and local governments can't just shut down infrastructure for maintenance. Instead, public-sector IT managers must be able to migrate live data, swap out failed components, upgrade hardware and update software on the fly.
With that in mind, high availability becomes imperative. Heed this advice to build a highly resilient IT architecture to support cloud computing.
Pay Attention to the Details.
High-availability systems call for more than simply purchasing spare patch cords and power supplies. Each piece in the chain between application and user has to be thought out and tested for compatibility. Many new high-availability technologies, such as network link aggregation, have interoperability issues and limitations that must be carefully addressed.
When specifying features such as link aggregation (sometimes called EtherChannel or bonding), check that server and network manufacturers are on the same page about capabilities and support for standards such as the Link Aggregation Control Protocol (LACP), originally defined in IEEE 802.3ad.
Test, Test and Test Again.
Many high-availability designs look great in Microsoft Visio but don't work in practice. Create a test plan for every piece of infrastructure, and document the results of the testing. This is especially critical for power infrastructure because devices can have unpredictable sensitivities to power shifts that only show up when someone actually pulls a plug.
Repeat the test plan every time there's a change in the infrastructure, including software upgrades in operating systems and network equipment, as well as significant changes in power requirements (up or down). In any case, conduct power testing every few months to confirm that batteries and generators are operating correctly.
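The retest rule above can be sketched as a small check. This is only an illustration: the function name `retest_due`, the change descriptions, and the 90-day reading of "every few months" are assumptions, not part of any particular tool or policy.

```python
from datetime import date, timedelta

# Assumption: "every few months" is read as 90 days; adjust to local policy.
POWER_TEST_INTERVAL = timedelta(days=90)


def retest_due(last_test: date, today: date, changes_since_test: list[str]) -> bool:
    """Return True if the test plan should be run again.

    A retest is due when any infrastructure change (software upgrade,
    power requirement change, and so on) has been recorded since the
    last test, or when the periodic interval has elapsed.
    """
    if changes_since_test:
        return True
    return today - last_test >= POWER_TEST_INTERVAL


if __name__ == "__main__":
    # Hypothetical example: a switch firmware upgrade two weeks after the last test.
    print(retest_due(date(2024, 1, 1), date(2024, 1, 15), ["switch firmware upgrade"]))
```

Recording each change as it happens, rather than reconstructing the list later, keeps the check honest and gives the documented test results something to be matched against.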
Plan for Obvious Failures.
Yes, large chassis-based switches can completely fail, but that doesn't happen very often. Far more common is the inevitable software upgrade, so plan for it with a rolling reboot of switches. Disks crash frequently, and patch cords are often accidentally unplugged, so consider them to be persistent pain points. What's more, fans and power supplies can break down because moving parts and power glitches take their toll. Finally, anything connecting one building to another, whether fiber, copper or radio, has a high failure rate. Focus on the problems that will inevitably occur in order to maintain the highest level of uptime.
Don't Exceed Objectives.
Set an uptime objective and stick with it because high availability is expensive. Duplicating components is just the start; there are also costs in designing, testing and maintaining more complicated configurations, as well as in the less efficient use of power and cooling. When organizations overengineer and design past their goal, they're spending more money than they have to.
When IT managers shoot for 99.999 percent availability, each additional nine costs a lot more than the previous one. Doing more than required is wasteful on all fronts.
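To see what each additional nine actually buys, it helps to convert an uptime objective into the downtime it allows per year. The following is a minimal sketch of that arithmetic; the function name and the choice of an average 365.25-day year are assumptions made for illustration.

```python
# Assumption: an average year of 365.25 days, to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed by an uptime objective."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare two through five nines side by side.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Each added nine cuts the allowed downtime by a factor of 10, from several days a year at 99 percent to roughly five minutes at 99.999 percent. Setting that shrinking allowance against the rising cost of each step is one way to decide where the objective should stop.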