AUTOSCALING WEB SERVICES ON AMAZON EC2

M. A. AZEEZ

This dissertation was submitted to the Department of Computer Science and Engineering of the University of Moratuwa in partial fulfillment of the requirements for the Degree of M.Sc. in Computer Science specializing in Software Architecture

Department of Computer Science & Engineering
University of Moratuwa, Sri Lanka
February 2010

ABSTRACT

Fault tolerance, high availability, & scalability are essential prerequisites for any enterprise application deployment. One of the major concerns of enterprise application architects is avoiding single points of failure. There is a high cost associated with achieving high availability & scalability. We will look at an economical approach towards automatically scaling Web service applications while maintaining the availability & scalability guarantees at an optimum economical cost. This approach, involving the Amazon EC2 cloud computing infrastructure, makes it unnecessary to invest in safety-net capacity & unnecessary redundancy. The Web service application developer should only need to write the application once, and simply deploy it on the cloud. The scalability & availability guarantees should be provided automatically by the underlying infrastructure.

Auto-scaling refers to the behavior where the system scales up when the load increases & scales down when the load decreases. Auto-healing refers to an approach where a specified minimum deployment configuration is maintained even in the event of failures. Such an approach is essential for cloud deployments such as Amazon EC2, where the charge is based on the actual computing power consumed. Ideally, from the clients' point of view, in an auto-scaling system the response time should remain constant and the overall throughput of the system should increase as the load grows. We will describe in detail an economical approach towards building auto-scaling Apache Axis2 Web services on Amazon EC2.
In the course of this article, we will introduce well-known address (WKA) based membership discovery for clustering deployments where multicast-based membership discovery is not possible. We will also introduce an approach towards dynamic load balancing, where the load balancer itself uses group communication & group membership mechanisms to discover the domains across which the load is distributed.

In a traditional setup, a single load balancer fronts a group of application nodes. In such a scenario, the load balancer can be a single point of failure. Traditionally, techniques such as Linux HA have been used to overcome this. However, such traditional schemes carry considerable overhead and also require the backup system to be in close proximity to the primary system. In a catastrophic situation, this approach can result in complete failure of the system. We will introduce an auto-healing scheme for load balancer failure, using Amazon Elastic IP addresses & a load balancer group, which overcomes these shortcomings.

Declaration

The work included in this report was done by me, and only by me, and the work has not been submitted for any other academic qualification at any institution.

Afkham Azeez                                        Date

I certify that the declaration above by the candidate is true to the best of my knowledge and that this report is acceptable for evaluation for the CS6999 M.Sc. Research Project.

                                                    Date

ACKNOWLEDGMENTS

I would like to express profound gratitude to my advisor, Dr. Sanjiva Weerawarana, for his invaluable support, encouragement, supervision and useful suggestions throughout this research work. His continuous guidance enabled me to complete my work successfully. I am grateful for the support & assistance provided by the management of WSO2, who provided me with the research facilities.
This work may not have been possible without the Amazon Web Services R&D account provided by WSO2, which was used extensively during this research. Nor would it have been possible without the support & assistance I received from Filip Hanik, author of the Apache Tribes group communication framework, which is used extensively in my work. I would also like to thank Asankha Perera, software architect at WSO2 & a lead architect of Apache Synapse, who provided advice on architecture & design. I would also like to thank Ruwan Linton, a lead developer of Apache Synapse, and Paul Fremantle, project lead of Apache Synapse, who provided me with design ideas related to dynamic load balancing & load analysis. I would also like to thank my colleague & good friend Amila Suriarachchi for reviewing my work and providing valuable feedback & suggestions for improvements. I would like to thank Chinthana Wilamuna, the scripting languages expert at WSO2, for his expert advice on various aspects, as well as Deependra Ariyadewa & Chamith Kumarage, Linux experts, for providing their expertise.

I am grateful to the open source communities backing the Apache Axis2 & Apache Synapse projects. Their contributions were the foundation upon which this work was built.

I am, as ever, especially indebted to my parents for their love and support throughout my life. I also wish to thank my wife, who supported me throughout my work. Finally, I wish to express my gratitude to all my colleagues at WSO2. Many ideas related to this project came to mind during technical & even non-technical discussions with this group of intellectuals.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
Chapter 1 Introduction
    1.1 Amazon Elastic Compute Cloud (EC2)
    1.2 EC2 Features for Building Failure Resilient Applications
    1.3 Apache Axis2
    1.4 Apache Synapse Dynamic Load Balancing
    1.5 The Problem
    1.6 Objectives
    1.7 Prior Work
Chapter 2 Literature Review
    2.1 Terminology
    2.2 Theoretical Aspects
        2.2.1 Failure Detection
        2.2.2 Overcoming Distributed System Failures
        2.2.3 High Availability
        2.2.4 Group Membership Service (GMS)
        2.2.5 Data Replication Models
        2.2.6 Virtual Synchrony
        2.2.7 Membership
        2.2.8 Load Balancing
        2.2.9 Reliability & Web services
        2.2.10 Group Communication Frameworks (GCF)
    2.3 Amazon EC2
        2.3.1 Using EC2
        2.3.2 Instance Types
        2.3.3 Features for Building Failure Resilient Applications
        2.3.4 Data Transfer Charges
    2.4 Giga Spaces
Chapter 3 Methodology
    3.1 Introduction
    3.2 Well-known Address (WKA) based membership
    3.3 Fault Resilient Dynamic Load Balancing
    3.4 Dynamic Load Balancing with Load Analysis
    3.5 Failure Detection
    3.6 Deployment Architecture
    3.7 Apache Synapse Autoscaling Load Balancer
        3.7.1 Normal Message Flow
        3.7.2 Error Message Flow
    3.8 Load Analyzer Task
        3.8.1 Load Analysis Task Configuration
    3.9 Load Analysis Algorithm
        3.9.1 Making the Scale-Up Decision
        3.9.2 Making the Scale-Down Decision
    3.10 EC2 Client Library
    3.11 Handling Primary Load Balancer Failure
    3.12 Axis2 Application Cluster
    3.13 Deployment on EC2
        3.13.1 Starting up an Axis2 Instance
        3.13.2 Starting up a Synapse Load Balancer Instance
        3.13.3 Auto-starting & Auto-healing a Cluster
    3.14 Failure Scenarios
        3.14.1 Load Balancer Failure
        3.14.2 Axis2 Application Process Failures
Chapter 4 Observations, Results & Analysis
    4.1 Performance Testing Methodology
        4.1.1 Scenario 1 - No autoscaling
        4.1.2 Scenario 2 - Autoscaling Enabled
    4.2 Test Results
    4.3 Analysis of Test Results
Chapter 5 Conclusion
    5.1 Future Work
REFERENCES

LIST OF FIGURES

Figure 1: Scale-up when the system load increases
Figure 2: Scale-down when the system load decreases
Figure 3: Passive Replication
Figure 4: Multicast from client
Figure 5: Replies from replicas
Figure 6: Member joins group
Figure 7: Well-known member rejoins after crashing
Figure 8: Load balancer & application groups
Figure 9: Active-passive load balancers with Elastic IP
Figure 10: Membership channel architecture
Figure 11: Initialization channel architecture
Figure 12: Application member joins. A load balancer is also a well-known member
Figure 13: A non-WK load balancer joins
Figure 14: A well-known load balancer rejoins after crashing
Figure 15: Deployment on EC2
Figure 16: Normal message flow
Figure 17: Error flow
Figure 18: synapse.xml
Figure 19: LoadAnalyzerTask configuration in synapse.xml
Figure 20: axis2.xml in application node
Figure 21: autoscale-init.sh script
Figure 22: Axis2 instance bootup
Figure 23: Synapse load balancer instance startup
Figure 24: Bootstrapping the system
Figure 25: Performance Testing Scenario 1. Non-autoscaling single worker instance
Figure 26: Performance Testing Scenario 2. Autoscaling system
Figure 27: Response Time Variation
Figure 28: Throughput Variation

LIST OF TABLES

Table 1: EC2 Instance Types
Table 2: LoadAnalyzerTask configuration parameters

LIST OF ABBREVIATIONS

AMI     Amazon Machine Image
AWS     Amazon Web Services
CPU     Central Processing Unit
CORBA   Common Object Request Broker Architecture
EC2     (Amazon) Elastic Compute Cloud
ESB     Enterprise Service Bus
JVM     Java Virtual Machine
GCF     Group Communication Framework
GMP     Group Membership Protocol
GMS     Group Membership Service
HA      High Availability
HTTP    Hypertext Transfer Protocol
SLA     Service Level Agreement
SOAP    Historically, Simple Object Access Protocol. Now simply SOAP
S3      (Amazon) Simple Storage Service
WKA     Well-known Address, Well-known Addressing
WSDL    Web Services Description Language