Welcome!

Java IoT Authors: Liz McMillan, Elizabeth White, Pat Romanski, Yeshim Deniz, Paul Simmons

Related Topics: Containers Expo Blog, @CloudExpo

Containers Expo Blog: Article

Cloud Computing and Reliability

The Cloud offers some enticing advantages with respect to reliability

Eric Novikoff's Blog

IT managers and pundits speak of the reliability of a system in "nines." Two nines is the same as 99%, which comes to (100%-99%)*365 or 3.65 days of downtime per year, which is typical for non-redundant hardware if you include the time to reload the operating system and restore backups (if you have them) after a failure. Three nines is about 8 hours of downtime, four nines is about 52 minutes and the holy grail of 5 nines is 7 minutes.

From a users' point of view, downtime is downtime, but for a provider/vendor/web site manager, downtime is divided into planned and unplanned. Cloud computing can offer some benefits for planned downtime, but the place that it can have the largest effect on a business is in reducing unplanned downtime.

Planned downtime is usually the result of having to do some sort of software maintenance or release process, which is usually outside the domain of the cloud vendor, unless that vendor also offers IT operations services. Other sources of planned downtime are upgrades or scheduled equipment repairs. Most cloud vendors have some planned downtime, but because their business is based on providing high uptime, scheduled downtimes are kept to a minimum.

Unplanned downtime is where cloud vendors have the most to offer, and also the most to lose. Recent large outages at Amazon and Google have shown that even the largest cloud vendors can still have glitches that take considerable time to repair and give potential cloud customers a scare (perhaps it is because they didn't take some planned downtime??) On the other hand, cloud vendors have the experienced staff and proven processes that should produce overall hardware and network reliability that meets or exceeds that of the average corporate data center, and far exceeds anything you can achieve with colocated or self-managed servers.

However, despite claims of reliablity, few cloud vendors have tight SLAs (service level agreements) that promise controlled downtime or offer rebates for excess downtime. Amazon goes the opposite direction and doesn't offer any uptime guarantees, even cautioning users that their instance (or server) can disappear at any time and that they should plan accordingly. AppLogic-based clouds, provided by companies such as ENKI, are capable of offering better guarantees of uptime because of its inherent self-healing capabilities that can enable 3-4 nines of uptime. (The exact number depends on how the AppLogic system is set up and administered, which affects the time needed for the system to heal itself.) However, any cloud computing system, even even those based on AppLogic or similar technologies, can experience unplanned downtime for a variety of reasons, including the common culprit of human error. While I believe it is possible to produce a cloud computing service that exceeds 4-9s of uptime, the costs would be so high that few would buy it when they compared the price to the average cloud offering.


When you're purchasing cloud computing, it makes sense to look at the SLA of the vendor as well as the reliability of the underlying technology. But if your needs for uptime exceed that which the vendors and their technology can offer, there are time-honored techniques for improving it, most of which involve doubling the amount of computing nodes in your application. There's an old adage that each additional "9" of uptime you get doubles your cost, and that's because you need backup systems that are in place to take over if the primaries fail. This involves creating a system architecture for your application that allows for either active/passive failover (meaning that the backup nodes are running but not doing anything) or active/active failover (meaning that the backup nodes are normally providing application computing capability).

These solutions can be implemented in any cloud technology but they always require extra design and configuration effort for your application, and they should be tested rigorously to make sure they will work when the chips are down. Failover solutions are generally less expensive to implement in the Cloud because of the on-demand or pay-as-you go nature of cloud services, which means that you can easily size the backup server nodes to meet your needs and save on computing resources.

An important component of reliability is a good backup strategy. With cloud computing systems like AppLogic offering highly reliable storage as part of the package, many customers are tempted to skip backup. But data loss and the resulting unplanned downtime can result not just from failures in the cloud platform, but also software bugs, human error, or malfeasance such as hacking. If you don't have a backup, you'll be down a long time - and this applies equally to cloud and non-cloud solutions. The advantages of cloud solutions is that there is usually an inexpensive and large storage facility coupled with the cloud computing offering which gives you a convenient place to store your backups.

For the truly fanatical, backing up your data from one cloud vendor to another provides that extra measure of security. It pays to think through your backup strategy because most of today's backup software packages or remote backup services were designed for physical servers and not virtual environments having many virtual servers such as you might find in the cloud. This can mean very high software costs for doing backup if your backup software charges on a "per server" basis and your application is spread across many instances. If your cloud vendor has a backup offering, usually they have found a way to make backup affordable even if your application consists of many compute instances.

Another aspect of reliability that often escapes cloud computing customers new to the world of computing services is monitoring. It's very hard to react to unplanned downtime if you don't know your system is down. It's also hard to avoid unplanned downtime if you don't know you're about to run out of disk space or memory, or perhaps your application is complaining about data corruption. A remote monitoring service can scan your servers in the cloud on a regular basis for faults, application problems, or even measure the performance of your application (like how long it takes to buy a widget in your web store) and report to you if anything is out of the ordinary. I say "service" because if you were to install your own monitoring server into your cloud and the cloud went down, so would your monitoring! At ENKI, we solve this problem by having our monitoring service hosted in a separate data center and under a different software environment than our primary cloud hosting service.

The last aspect of reliability is security. However, that would require another entire article to cover, since security in the cloud is a complex and relatively new topic.

To sum up, the Cloud offers some enticing advantages with respect to reliability, perhaps the largest of which is that you can give your data center operations responsibility to someone who theoretically can do a much better job at a lower cost than you can. However, to get very good reliability, you must still apply traditional approaches of redundancy and observability that have been used in physical data centers for decades - or, you have to find a cloud computing services provider that can implement them for you.

More Stories By Eric Novikoff

Eric Novikoff is COO of ENKI, A Cloud Services Vendor. He has over 20 years of experience in the electronics and software industries, over a range of positions from integrated circuit designer to software/hardware project manager, to Director of Development at an Internet Software As A Service startup, Netsuite.com. His technical, project, and financial management skills have been honed in multiple positions at Hewlett-Packard and Agilent Technologies on a variety of product lines, including managing the development and roll-out of a worldwide CRM and sales automation application for Agilent's $350 million Automatic Test Equipment business. Novikoff also has a strong interest in SME (Small/Medium Size Enterprise) management, process development, and operations as a consequence of working at a web based ERP service startup serving SMEs, and through his small-business ERP consulting work.

Comments (1) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
faseidl 09/10/08 11:45:52 AM EDT

Despite what many pundits have to say, reliability issues will not be the downfall of cloud computing. Using cloud computing does not mean neglecting to architect solutions that meet their business requirements, including reliability requirements.

I wrote more about this idea here:

Cloud Computing and Reliability
http://faseidl.com/public/item/212584

@ThingsExpo Stories
"We view the cloud not as a specific technology but as a way of doing business and that way of doing business is transforming the way software, infrastructure and services are being delivered to business," explained Matthew Rosen, CEO and Director at Fusion, in this SYS-CON.tv interview at 18th Cloud Expo (http://www.CloudComputingExpo.com), held June 7-9 at the Javits Center in New York City, NY.
The Founder of NostaLab and a member of the Google Health Advisory Board, John is a unique combination of strategic thinker, marketer and entrepreneur. His career was built on the "science of advertising" combining strategy, creativity and marketing for industry-leading results. Combined with his ability to communicate complicated scientific concepts in a way that consumers and scientists alike can appreciate, John is a sought-after speaker for conferences on the forefront of healthcare science,...
WebRTC is great technology to build your own communication tools. It will be even more exciting experience it with advanced devices, such as a 360 Camera, 360 microphone, and a depth sensor camera. In his session at @ThingsExpo, Masashi Ganeko, a manager at INFOCOM Corporation, introduced two experimental projects from his team and what they learned from them. "Shotoku Tamago" uses the robot audition software HARK to track speakers in 360 video of a remote party. "Virtual Teleport" uses a multip...
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at Cloud Expo, Ed Featherston, a director and senior enterprise architect at Collaborative Consulting, discussed the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
In his session at Cloud Expo, Alan Winters, U.S. Head of Business Development at MobiDev, presented a success story of an entrepreneur who has both suffered through and benefited from offshore development across multiple businesses: The smart choice, or how to select the right offshore development partner Warning signs, or how to minimize chances of making the wrong choice Collaboration, or how to establish the most effective work processes Budget control, or how to maximize project result...
"Akvelon is a software development company and we also provide consultancy services to folks who are looking to scale or accelerate their engineering roadmaps," explained Jeremiah Mothersell, Marketing Manager at Akvelon, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
IoT is rapidly becoming mainstream as more and more investments are made into the platforms and technology. As this movement continues to expand and gain momentum it creates a massive wall of noise that can be difficult to sift through. Unfortunately, this inevitably makes IoT less approachable for people to get started with and can hamper efforts to integrate this key technology into your own portfolio. There are so many connected products already in place today with many hundreds more on the h...
DXWorldEXPO LLC announced today that ICC-USA, a computer systems integrator and server manufacturing company focused on developing products and product appliances, will exhibit at the 22nd International CloudEXPO | DXWorldEXPO. DXWordEXPO New York 2018, colocated with CloudEXPO New York 2018 will be held November 11-13, 2018, in New York City. ICC is a computer systems integrator and server manufacturing company focused on developing products and product appliances to meet a wide range of ...
JETRO showcased Japan Digital Transformation Pavilion at SYS-CON's 21st International Cloud Expo® at the Santa Clara Convention Center in Santa Clara, CA. The Japan External Trade Organization (JETRO) is a non-profit organization that provides business support services to companies expanding to Japan. With the support of JETRO's dedicated staff, clients can incorporate their business; receive visa, immigration, and HR support; find dedicated office space; identify local government subsidies; get...
René Bostic is the Technical VP of the IBM Cloud Unit in North America. Enjoying her career with IBM during the modern millennial technological era, she is an expert in cloud computing, DevOps and emerging cloud technologies such as Blockchain. Her strengths and core competencies include a proven record of accomplishments in consensus building at all levels to assess, plan, and implement enterprise and cloud computing solutions. René is a member of the Society of Women Engineers (SWE) and a m...
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In his session at @ThingsExpo, James Kirkland, Red Hat's Chief Archi...
In his general session at 19th Cloud Expo, Manish Dixit, VP of Product and Engineering at Dice, discussed how Dice leverages data insights and tools to help both tech professionals and recruiters better understand how skills relate to each other and which skills are in high demand using interactive visualizations and salary indicator tools to maximize earning potential. Manish Dixit is VP of Product and Engineering at Dice. As the leader of the Product, Engineering and Data Sciences team at D...
Personalization has long been the holy grail of marketing. Simply stated, communicate the most relevant offer to the right person and you will increase sales. To achieve this, you must understand the individual. Consequently, digital marketers developed many ways to gather and leverage customer information to deliver targeted experiences. In his session at @ThingsExpo, Lou Casal, Founder and Principal Consultant at Practicala, discussed how the Internet of Things (IoT) has accelerated our abilit...
Organizations planning enterprise data center consolidation and modernization projects are faced with a challenging, costly reality. Requirements to deploy modern, cloud-native applications simultaneously with traditional client/server applications are almost impossible to achieve with hardware-centric enterprise infrastructure. Compute and network infrastructure are fast moving down a software-defined path, but storage has been a laggard. Until now.
Digital Transformation is much more than a buzzword. The radical shift to digital mechanisms for almost every process is evident across all industries and verticals. This is often especially true in financial services, where the legacy environment is many times unable to keep up with the rapidly shifting demands of the consumer. The constant pressure to provide complete, omnichannel delivery of customer-facing solutions to meet both regulatory and customer demands is putting enormous pressure on...
The best way to leverage your CloudEXPO | DXWorldEXPO presence as a sponsor and exhibitor is to plan your news announcements around our events. The press covering CloudEXPO | DXWorldEXPO will have access to these releases and will amplify your news announcements. More than two dozen Cloud companies either set deals at our shows or have announced their mergers and acquisitions at CloudEXPO. Product announcements during our show provide your company with the most reach through our targeted audienc...
@DevOpsSummit at Cloud Expo, taking place November 12-13 in New York City, NY, is co-located with 22nd international CloudEXPO | first international DXWorldEXPO and will feature technical sessions from a rock star conference faculty and the leading industry players in the world.
DXWorldEXPO LLC announced today that the upcoming DXWorldEXPO | CloudEXPO New York event will feature 10 companies from Poland to participate at the "Poland Digital Transformation Pavilion" on November 12-13, 2018.
22nd International Cloud Expo, taking place June 5-7, 2018, at the Javits Center in New York City, NY, and co-located with the 1st DXWorld Expo will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud ...
In his keynote at 19th Cloud Expo, Sheng Liang, co-founder and CEO of Rancher Labs, discussed the technological advances and new business opportunities created by the rapid adoption of containers. With the success of Amazon Web Services (AWS) and various open source technologies used to build private clouds, cloud computing has become an essential component of IT strategy. However, users continue to face challenges in implementing clouds, as older technologies evolve and newer ones like Docker c...