The exhaustive media coverage surrounding “cloud computing” is enough to induce readers to tune out on the topic altogether, but ignoring computing in the cloud is a perilous proposition. Cloud computing will soon be as mainstream as e-mail (coincidentally, one of the first successful cloud offerings). The hype is fueled by pro-cloud commentators, vehemently promoting the cloud panacea, battling it out with cloud naysayers who warn that a move to the cloud is fraught with too much risk for serious consideration. I think both sides are right. An investment in the cloud can yield tangible savings on upfront set-up and ongoing maintenance costs for companies. Additionally, the on-demand aspect of cloud architecture means that companies can quickly adapt to opportunities for growth and can tighten their belts when demand for their services and products shrinks. But cloud detractors are not mere panic mongers: there is significant risk lurking in the cloud. Happily, most companies can have it both ways by focusing on a frequently overlooked document that is a shield against many cloud-based risks: the Service Level Agreement or “SLA”.

The main function of any SLA is to establish expectations for the client with respect to software availability or “uptime”. In addition to service guarantees, a good SLA should accomplish the following: 1) establish built-in remedies for the customer if the vendor is unable to meet service guarantees; 2) define disaster recovery provisions; 3) define customer duties with respect to the manner of use of the software; and 4) establish procedures for software maintenance and upgrades. Because most cloud customers depend on software hosted on external networks, stipulating the level of service customers have the right to expect is critical.

Vendors generally measure their availability using metrics that seem understandable, but that often are dangerously vague and difficult to measure from an accounting perspective. For instance, customers may see a standard guarantee of “99.999% service access uptime” (or some variation thereof) from ISPs. This metric may be easy to understand, but it does not necessarily reflect the needs of the customer. A cloud service may, for example, be technically accessible while large swaths of its functionality are inoperable. With a “service access uptime” metric in place, a customer in that position may be left without the service credits that should otherwise be available to it. One alternative to consider in those situations may be an SLA based on incident-response-time guarantees or some other metric that is easier to apply and that does not require constant attention.
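To make the “99.999%” figure concrete, it can help to translate an uptime percentage into the amount of downtime it actually permits. The short Python sketch below is only an illustration: the function names, the 30-day measurement period, and the two-hour outage scenario are assumptions made for this example, not terms drawn from any particular vendor's SLA.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(uptime_percent: float, period_days: float) -> float:
    """Return the downtime, in seconds, that an uptime guarantee permits over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    downtime_fraction = (100.0 - uptime_percent) / 100.0
    return downtime_fraction * period_days * SECONDS_PER_DAY


def measured_uptime_percent(outage_seconds: float, period_days: float) -> float:
    """Return the uptime percentage implied by a total outage duration over a period."""
    total_seconds = period_days * SECONDS_PER_DAY
    return 100.0 * (total_seconds - outage_seconds) / total_seconds


if __name__ == "__main__":
    # Hypothetical 30-day measurement period; a real SLA defines its own.
    period_days = 30
    for guarantee in (99.0, 99.9, 99.999):
        seconds = allowed_downtime_seconds(guarantee, period_days)
        print(f"{guarantee}% uptime permits {seconds:.2f} seconds of downtime")

    # Hypothetical scenario: the service stays reachable all month, but a key
    # feature is broken for two hours. An access-based metric counts no outage,
    # while a functionality-based metric counts the full two hours.
    access_based = measured_uptime_percent(0, period_days)
    function_based = measured_uptime_percent(2 * 60 * 60, period_days)
    print(f"access-based uptime: {access_based:.3f}%")
    print(f"function-based uptime: {function_based:.3f}%")
```

Under these assumptions, a 99.999% guarantee permits only about 26 seconds of downtime in a 30-day period, or a little over five minutes in a 365-day year. The two-hour scenario shows why the definition of the metric matters as much as the number: measured by access alone, the vendor reports a perfect month, while measured by working functionality the same month falls well short of the guarantee.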

Because the SLA is so critical to mitigating one of the primary risks in cloud computing, it is important for a customer to carefully read and understand the SLA and either accept the risk associated with the standard metric or negotiate for a more appropriate measure of success.