Architecture Patterns For The Cloud
On August 24, 2006, Amazon made a test version of its Elastic Compute Cloud (EC2) available to the public. EC2 allowed renting infrastructure and accessing it over the internet. The term “Cloud Computing” was coined a year later, to describe a phenomenon that was not limited to renting infrastructure over the internet but encompassed a wide range of technology offerings, including Infrastructure as a Service (IaaS), web hosting, Platform as a Service (PaaS), Software as a Service (SaaS), network, storage, High Performance Computing (HPC) and many more.
The maturity of technologies like the Internet, high-performance networks, virtualization, and grid computing played a vital role in the evolution and success of Cloud Computing. Cloud platforms are highly scalable, can be made available on demand, scaled up or down quickly as required, and are very cost effective. Enterprises leverage these factors to foster innovation, which is the survival and growth mantra for new-age businesses.
An upward surge in cloud adoption by business enterprises of all sizes has confirmed that the cloud is more than a fad and is here to stay. As cloud platforms mature, and as some of the genuine inhibitions regarding security and ownership are addressed, more and more businesses will find themselves moving to the cloud.
Designing complex and highly distributed systems has always been a daunting task. Cloud platforms provide many of the infrastructure elements and building blocks that ease building such applications, opening the door to unlimited possibilities. But with the opportunities come the challenges. The power that cloud platforms offer doesn’t guarantee a successful implementation; leveraging them correctly does.
This article introduces readers to some of the popular and useful architectural patterns that are often implemented to harness the potential of cloud platforms. The patterns themselves are not specific to any cloud platform, but they can be implemented there effectively; they are generic and in most situations apply to various cloud scenarios like IaaS and PaaS. Wherever possible, the services (or tools) most likely to help implement the pattern being discussed have been cited from Azure, AWS or both.
HORIZONTAL SCALING
Traditionally, getting a more powerful computer (with a better processor, more RAM or bigger storage) was the only way to get more computing power when needed. This approach was called Vertical Scaling (Scaling Up). Besides being inflexible and costly, it had some inherent limitations: the power of a single piece of hardware can’t be pushed beyond a certain threshold, and the monolithic structure of the infrastructure can’t be load balanced. Horizontal Scaling (Scaling Out) takes a better approach. Instead of making one piece of hardware bigger and bigger, it gets more computing resources by adding multiple computers, each having limited computing power. This approach doesn’t limit the number of computers (called nodes) that can participate, and so provides theoretically unlimited computing resources. Individual nodes can be of limited size themselves, but as many of them as required can be added or removed to meet the changing need. This approach gives nearly unlimited capacity, together with the flexibility of adding or removing nodes as requirements change, and the nodes can be load balanced.
In Horizontal Scaling there are usually different types of nodes performing specific roles, e.g., Web Server, Application Server or Database Server. It is likely that each of these node types will have a specific configuration. Each of the instances of a node type (e.g., Web Server) could have a similar or a different configuration. Cloud platforms allow the creation of node instances from images, and many other management tasks can be automated. Keeping that in mind, using homogeneous nodes (nodes with identical configurations) for a specific node type is a better approach.
Horizontal Scaling is very appropriate for scenarios where:
- Enormous computing power is required, or will be required in the future, that can’t be provided even by the largest available computer
- The computing needs are changing and may have drops and spikes that may or may not be predictable
- The application is business critical and can’t afford a slowdown in performance or a downtime
This pattern is typically used in combination with the Node Termination Pattern (which covers concerns when releasing compute nodes) and the Auto-Scaling Pattern (which covers automation).
It is very important to keep the nodes stateless and independent of each other (Autonomous Nodes). Applications should store their user session details on a separate node with some persistent storage: in a database, cloud storage, a distributed cache, etc. Stateless nodes ensure better failover, as a new node that comes up after a failure can always pick up the details from there. It also removes the need for implementing sticky sessions, so simple and effective round robin load balancing can be implemented.
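The idea can be sketched as follows. This is a minimal illustration, not a production implementation: the plain dict stands in for a shared distributed cache (Redis, Azure Cache, etc.), and the class and method names are hypothetical.

```python
import json
import uuid

class SessionStore:
    """Keeps session state in an external cache so web nodes stay stateless.

    `cache` stands in for a shared store reachable from every node; a plain
    dict is used here only to keep the sketch self-contained.
    """

    def __init__(self, cache):
        self.cache = cache

    def create(self, data):
        session_id = str(uuid.uuid4())
        # Serialize and store outside the node's own memory.
        self.cache[session_id] = json.dumps(data)
        return session_id

    def load(self, session_id):
        raw = self.cache.get(session_id)
        return json.loads(raw) if raw is not None else None

# Any node behind a round robin load balancer can serve the next request:
shared_cache = {}
store_on_node_a = SessionStore(shared_cache)
store_on_node_b = SessionStore(shared_cache)

sid = store_on_node_a.create({"user": "alice", "cart": ["book"]})
print(store_on_node_b.load(sid))  # the other node sees the same session
```

Because neither node holds the session in process memory, a node failure loses nothing and the load balancer is free to route each request anywhere.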
Public cloud platforms are optimized for horizontal scaling. Computer instances (nodes) can be created, scaled up or down, load balanced and terminated on demand. Most of them also allow automated load balancing, failover and rule-based horizontal scaling.
Since horizontal scaling is meant to cater to changing demands, it is important to understand the usage patterns. Because there are multiple instances of various node types, and their numbers can change dynamically, collecting the operational data and combining and analyzing it to derive any meaning is not an easy task. There are third-party tools available to automate this task, and Azure too provides some facilities. The Windows Azure Diagnostics (WAD) Monitor is a platform service that can be used to gather data from all of your role instances and store it centrally in a single Windows Azure Storage account. Once the data is gathered, analysis and reporting become possible. Another source of operational data is the Windows Azure Storage Analytics feature, which includes metrics and access logs from Windows Azure Storage Blobs, Tables, and Queues.
Microsoft provides the Windows Azure portal and Amazon provides the Amazon Web Services dashboard as management portals. Both of them provide APIs for programmatic access to these services.
QUEUE CENTRIC WORKFLOW
Queues have long been used effectively to implement asynchronous processing. The Queue-Centric Workflow pattern implements asynchronous delivery of command requests from the user interface to the back-end processing service. This pattern is appropriate for situations where a user action may take a long time to complete and the user should not be made to wait that long. It is also an effective solution for situations where the action depends on another service that might not always be available. Since cloud native applications can be highly distributed and have back-end processes that they need to communicate with, this pattern is very useful. It effectively decouples the application tiers and ensures the guaranteed delivery of messages, which is critical for many applications dealing with financial transactions. Websites dealing with media and file uploads, batch processes, approval workflows, etc. are some of the applicable scenarios.
Since the queue-based approach offloads part of the processing to the queue infrastructure, which can be provisioned and scaled separately, it helps in optimizing the computing resources and managing the infrastructure.
Although the Queue-Centric Workflow pattern has many benefits, it poses challenges that should be considered beforehand for an effective implementation.
Queues are supposed to ensure that the messages received are processed successfully at least once. For this reason, messages are not deleted permanently until the request is processed successfully, and they can be made available repeatedly after a failed attempt. Since a message can be picked up multiple times and from multiple nodes, keeping the business process idempotent (where repeated processing doesn’t alter the final outcome) can be a tricky task. This only gets more complicated in cloud environments, where processes might be long running, span multiple service nodes and involve multiple data stores, or multiple types of them.
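One common way to achieve idempotency is to record the IDs of already-processed messages in a durable store and skip any redelivered duplicates. The sketch below illustrates the idea; the in-memory set stands in for that durable store (a database table, a cache entry with a TTL, etc.), and all names are illustrative.

```python
def make_idempotent(handler, processed_ids):
    """Wrap a message handler so redelivery doesn't alter the final outcome.

    `processed_ids` stands in for a durable record of handled message IDs
    shared by all worker nodes.
    """
    def wrapper(message):
        if message["id"] in processed_ids:
            return  # duplicate delivery: skip the side effect entirely
        handler(message)
        processed_ids.add(message["id"])  # mark as done only after success
    return wrapper

# A toy business process with a side effect: crediting an account.
balance = {"amount": 0}

def credit(message):
    balance["amount"] += message["value"]

handler = make_idempotent(credit, processed_ids=set())
msg = {"id": "txn-42", "value": 100}
handler(msg)
handler(msg)  # the queue redelivers the same message after a timeout
print(balance["amount"])  # credited exactly once: 100
```

In a real system the check and the mark would need to be atomic with the side effect (for example, inside one database transaction), otherwise a crash between the two steps reintroduces the duplicate.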
Another issue that queues pose is that of poison messages. These are messages that can’t be processed due to some problem (e.g., an email address that is too long or has invalid characters) and so keep reappearing in the queue. Some queues provide a dead letter queue where such messages are routed for further examination. The implementation should consider the poison message scenarios and how to deal with them.
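A typical defense is to track how many times a message has been dequeued and, past a threshold, move it aside instead of retrying forever. The sketch below uses plain lists for the queues; real queue services expose a similar counter (e.g., a dequeue count) or a redrive policy, and the names here are illustrative.

```python
MAX_DEQUEUE_COUNT = 3  # illustrative threshold

def process_queue(queue, dead_letter_queue, handler):
    """Drain a queue, routing repeatedly failing (poison) messages aside."""
    while queue:
        message = queue.pop(0)
        message["dequeue_count"] += 1
        try:
            handler(message)
        except Exception:
            if message["dequeue_count"] >= MAX_DEQUEUE_COUNT:
                dead_letter_queue.append(message)  # park for examination
            else:
                queue.append(message)  # make it visible again for a retry

def handler(msg):
    if "@" not in msg["email"]:
        raise ValueError("invalid address")  # this message will never succeed

queue = [{"email": "bob@example.com", "dequeue_count": 0},
         {"email": "not-an-address", "dequeue_count": 0}]
dlq = []
process_queue(queue, dlq, handler)
print(len(dlq))  # 1: the poison message ended up in the dead letter queue
```

The healthy message is processed on its first delivery, while the poison one fails three times and lands in the dead letter queue rather than blocking the workers indefinitely.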
Due to the inherently asynchronous processing nature of queues, applications implementing this pattern need ways to notify the user about the status and completion of the initiated tasks. There are also long polling mechanisms available for requesting the status from the back-end service.
Microsoft Azure provides two mechanisms for implementing asynchronous processing: Queues and Service Bus. Queues allow two applications to communicate using a simple mechanism: one application puts the message in the queue and another application picks it up. Service Bus provides a publish-and-subscribe mechanism. An application can send messages to a topic, while other applications can create subscriptions to this topic. This allows one-to-many communication among a set of applications, letting the same message be read by multiple recipients. Service Bus also allows direct communication through its relay service, providing a secure way to interact through firewalls. Note that Azure charges for each de-queuing request even if there are no messages waiting, so necessary care should be taken to reduce the number of such unnecessary requests.
AUTO SCALING
Auto Scaling maximizes the benefits of Horizontal Scaling. Cloud platforms provide on-demand availability, scaling and termination of resources. They also provide mechanisms for gathering signals of resource utilization and for automated management of resources. Auto Scaling leverages these capabilities and manages the cloud resources (adding more when more are required, releasing existing ones when they are no longer required) without manual intervention. In the cloud, this pattern is often applied with the Horizontal Scaling pattern. Automating the scaling not only makes it effective and error free, the optimized use also cuts down the cost.
Since horizontal scaling can be applied to the application layers individually, auto scaling has to be applied to them separately. Known events (e.g., overnight reconciliation, quarterly processing of region-wise data) and environmental signals (e.g., a surging number of concurrent users, consistently rising site hits) are the two dominant sources that can be used to set the auto scaling rules. Rules can also be constructed based on inputs like CPU usage, available memory or the length of the queue. More complex rules can be built based on analytical data gathered by the application, like the average processing time for an online form.
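A signal-driven scaling rule can be sketched as a pure function from current metrics to a desired node count. The thresholds and function name below are purely illustrative assumptions, not values from any platform’s auto scaling service.

```python
def desired_node_count(current, cpu_percent, queue_length,
                       min_nodes=2, max_nodes=10):
    """A toy rule engine for horizontal auto scaling.

    Scales out under pressure, scales in when idle, and always stays
    within the configured floor and ceiling (e.g., an SLA minimum).
    """
    if cpu_percent > 75 or queue_length > 100:
        target = current + 1          # add a node under pressure
    elif cpu_percent < 25 and queue_length < 10:
        target = current - 1          # release a node when idle
    else:
        target = current              # steady state: no change
    return max(min_nodes, min(max_nodes, target))

print(desired_node_count(4, cpu_percent=90, queue_length=5))   # 5: scale out
print(desired_node_count(4, cpu_percent=10, queue_length=0))   # 3: scale in
print(desired_node_count(2, cpu_percent=10, queue_length=0))   # 2: floor holds
```

Keeping the decision separate from the action (the API calls that actually add or remove instances) makes the rules easy to test and tune.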
Cloud service providers bill instances based on clock hours. Also, the SLAs they provide may require a minimum number of resources to be active at all times. Make sure that implementing auto scaling too aggressively doesn’t end up being costly or put the business out of SLA compliance. The auto scaling feature includes alerts and notifications that should be set and used wisely. Also, auto scaling can be enabled or disabled on demand if needed.
Cloud platforms provide APIs that allow building auto scaling into the application or creating a custom auto scaling solution. Both Azure and AWS provide auto scaling solutions, and these are supposed to be more effective; they come with a price tag, though. There are also some third-party products that enable auto scaling.
Azure provides a software component named the Windows Azure Auto-scaling Application Block (WASABi for short) that cloud native applications can leverage for implementing auto scaling.
BUSY SIGNAL PATTERN
Requests to cloud services (e.g., the data service or management service) may experience a transient failure when the service is very busy. Similarly, services that reside outside of the application, within or outside of the cloud, may at times fail to respond to a service request immediately. Often the timespan for which the service is busy is very short, and just another request might be successful. Given that cloud applications are highly distributed and connected to such services, a premeditated strategy for handling such busy signals is very important for the reliability of the application. In the cloud ecosystem such transient failures are normal behavior, and these issues are hard to diagnose, so it makes even more sense to think them through in advance.
There could be many possible reasons for such failures (an unusual spike in load, a hardware failure, etc.). Depending upon the circumstances, applications can take various approaches to handle busy signals: retry immediately, retry after a delay, retry with an increasing delay with fixed increments (linear backoff) or with exponential increments (exponential backoff). Applications should also decide when to stop further attempts and throw an exception. Besides that, the approach could vary depending upon the type of the application: whether it handles user interactions directly, is a service, or is a back-end batch process, and so on.
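The exponential backoff variant can be sketched in a few lines. This is a generic illustration, not Topaz or any platform library; `TransientError` and the delay values are assumptions, and a small random jitter is added so that many clients retrying at once don’t hammer the busy service in lockstep.

```python
import random
import time

class TransientError(Exception):
    """Stands in for a busy signal from a remote service."""

def retry(operation, max_attempts=5, base_delay=0.05):
    """Retry a flaky operation with exponential backoff plus jitter.

    After `max_attempts` failures the last exception is re-raised so the
    caller can decide how to surface it.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # stop retrying: let the caller handle it
            delay = base_delay * (2 ** attempt)         # exponential backoff
            time.sleep(delay + random.uniform(0, base_delay))  # jitter

# A fake service that is "busy" for the first two calls.
calls = {"count": 0}

def flaky_service():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("service busy")
    return "ok"

print(retry(flaky_service))  # succeeds on the third, backed-off attempt
```

A user-facing application would typically cap the total wait at a second or two, while a back-end batch process can afford far longer backoff windows.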
Azure provides client libraries for most of its services that allow programming the retry behavior into applications accessing those services. They provide an easy implementation of the default behavior and also allow customization. A library known as the Transient Fault Handling Application Block, also known as Topaz, is available from Microsoft.
NODE FAILURE
Nodes can fail due to various reasons like hardware failure, an unresponsive application, auto scaling, etc. Since these events are common in cloud scenarios, applications need to ensure that they handle them proactively. Since applications might be running on multiple nodes simultaneously, they should remain available even when an individual node experiences a shutdown. Some failure scenarios may send signals in advance, but others might not; similarly, different failure scenarios may or may not be able to retain data saved locally. Deploying one node more than required (N+1 Deployment), catching and processing platform-generated signals when available (both Azure and AWS send alerts for some of the node failures), building a strong exception handling mechanism into the application, keeping application and user data in reliable storage, avoiding sticky sessions, and fine-tuning long running processes are some of the best practices that help handle node failures gracefully.
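Catching a platform’s advance warning can be sketched with an ordinary OS signal. The snippet below is an illustration under assumed conditions: SIGTERM stands in for whatever termination notice the platform sends, and the class name is hypothetical. The handler only flips a flag, so the worker finishes the task in hand and leaves the rest for surviving nodes.

```python
import os
import signal

class GracefulWorker:
    """Finishes in-flight work when the node is about to be reclaimed."""

    def __init__(self):
        self.stopping = False
        # Register for the platform's termination warning (SIGTERM here).
        signal.signal(signal.SIGTERM, self._on_terminate)

    def _on_terminate(self, signum, frame):
        self.stopping = True  # don't abort mid-task: exit between tasks

    def run(self, tasks):
        completed = []
        for task in tasks:
            if self.stopping:
                break  # remaining tasks stay queued for other nodes
            completed.append(task())
        return completed

worker = GracefulWorker()

def task_a():
    return "a"

def task_b():
    os.kill(os.getpid(), signal.SIGTERM)  # simulate the shutdown warning
    return "b"

def task_c():
    return "c"

result = worker.run([task_a, task_b, task_c])
print(result)  # ['a', 'b']: the current task finished, task_c was skipped
```

Combined with stateless nodes and a queue-centric workflow, this kind of cooperative shutdown means a terminated node loses no work: unfinished messages simply become visible to other consumers again.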
MULTI SITE DEPLOYMENT
Applications might need to be deployed across datacenters to implement failover across them. Multi-site deployment also improves performance, as requests can be routed to the nearest possible datacenter, reducing network latency. At times there might be specific reasons for multi-site deployments, like government regulations, unavoidable integration with a private datacenter, or extremely high availability and data safety requirements. Note that there could be equally valid reasons that do not allow multi-site deployments, e.g., government regulations that forbid storing business-sensitive or private information outside the country. Due to cost- and complexity-related factors, such deployments should be considered carefully before implementation.
Multi-site deployments call for two important activities: directing users to the nearest possible datacenter, and replicating data across the data stores if the data needs to be the same everywhere. Both of these activities mean additional cost.
Multi-site deployments are complicated, but the cloud providers offer networking and data-related services for geographic load balancing, cross-datacenter failover, database synchronization and geo-replication of cloud storage. Both Azure and Amazon Web Services have multiple datacenters across the globe. Windows Azure Traffic Manager and Elastic Load Balancing from Amazon Web Services allow configuring their services for geographic load balancing.
Note that the services for geographic load balancing and data synchronization may not be 100% resilient to all types of failovers. The service description must be matched against the requirements to understand the possible risks and mitigation strategies.
The cloud is a world of possibilities. There are many other patterns that are very pertinent to cloud-specific architecture. Taking it even further, in real-life business scenarios more than one of these patterns will need to be implemented together to make things work. Some of the cloud-crucial aspects that are important for architects are: multi-tenancy, maintaining the consistency of database transactions, separation of commands and queries, etc. In a way, each business scenario is unique and so needs specific treatment. The cloud being a platform for innovation, even the well-established architecture patterns may be implemented there in novel ways, solving these specific business problems.
The cloud is a complex and evolving ecosystem that fosters innovation. Architecture is important for any application, and even more so for cloud based applications. Cloud based solutions are expected to be flexible to change, scale on demand and minimize cost. Cloud offerings provide the necessary infrastructure, services and other building blocks that must be put together in the right way to provide the maximum Return on Investment (ROI). Since the majority of cloud applications are distributed and spread over the cloud services, finding and implementing the right architecture patterns is very important for success.