Mastering Google App Engine

Overview of this book

Developing web applications that serve millions of users is no easy task, as it involves a number of configuration and administrative tasks for the underlying software and hardware stack. This configuration requires not only expertise, but also a fair amount of time; time that could have been spent on actual application functionality. Google App Engine allows you to develop highly scalable web applications, or backends for mobile applications, without worrying about system administration plumbing or hardware provisioning issues. Just focus on writing your business logic, the meat of the application, and let Google's powerful infrastructure scale it to thousands of requests per second and millions of users without any effort on your part.

This book takes you from explaining how scalable applications work to designing and developing robust, scalable web applications of your own, utilizing the services available on Google App Engine. Starting with a walkthrough of what scalability is and how scalable web applications work, the book introduces you to the environment under which your applications exist on Google App Engine. Next, you will learn about Google's datastore, a massively scalable distributed NoSQL solution built on top of BigTable. You will examine BigTable's concepts and operations in detail and see how it is used to build the Google datastore. Armed with this knowledge, you will move on to how best to model your data and query it, along with transactions. To augment the powerful distributed datastore, you will take a deep dive into the search functionality offered on Google App Engine. With search and storage sorted out, you will look at performing long-running tasks in the background using App Engine task queues, along with sending and receiving emails. You will also examine memcache for boosting web application performance and image processing for common image manipulation tasks. You will then explore uploading, storing, and serving large files using Blobstore and Cloud Storage. Finally, you will be presented with the deployment and monitoring of your applications in production, along with a detailed look at dividing applications into different working modules.

The overall architecture

Scaling a web application is a hard thing to do. Serving a single page to a single user is a simple matter. Serving thousands of pages to a single user or a handful of users is simple, too. However, delivering just a single page to tens of thousands of users is a complex task. To better understand how Google App Engine deals with the problem of scale, we will revisit the whole problem of scaling in the sections that follow: how it has been solved to date and the technologies and techniques that are at work behind the scenes. Once armed with this understanding, we will talk about how Google App Engine actually works.

The challenge of scale

The complexity arises from the fact that serving even a simple page takes the hosting machine a certain amount of time. This time usually falls in the range of milliseconds, and it puts a limit on the number of pages that can be rendered and served in a second. For instance, if it takes 10 milliseconds to render a page on a 1 GHz machine, we can serve 100 pages in one second, which means that roughly 100 users can be served per second.
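If you like, the arithmetic can be spelled out in a couple of lines of Python (the render time is the hypothetical 10 ms figure from above):

    render_time_ms = 10           # hypothetical time to render one page
    qps = 1000 // render_time_ms  # 1,000 ms in a second / 10 ms per page
    print(qps)                    # 100: roughly 100 users served per second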

However, if there are 300 users per second, we're out of luck, as we will only be able to serve the first 100 lucky users. The rest will get timeout errors and may perceive that our web page is not responding, staring at the rotating wait icon in the browser that indicates the page is still loading.

Let's introduce a term here. Instead of pages per second, we will talk about requests or queries per second, or simply Queries Per Second (QPS), because a user pointing the browser at our page is just a request for that page.

How to scale with the scale?

We have two options here. The first option is to bring the rendering time down from 10 milliseconds to 5 milliseconds, which will effectively let us serve double the number of users. This path is called optimization. It has many techniques, which include minimizing disk reads, caching computations instead of doing them on the fly, and so on; the specifics vary from application to application. However, once you've applied all possible optimizations and achieved a new and better page rendering time, further reduction won't be possible, because there's always a limit to how much things can be optimized and there will always be some overhead. Nothing comes for free.
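As a tiny illustration of one such technique, here is how a computation can be cached instead of being redone on the fly; a minimal sketch assuming Python 3's functools.lru_cache, with hypothetical names and timings:

    import functools
    import time

    @functools.lru_cache(maxsize=1024)
    def render_sidebar(user_id):
        time.sleep(0.01)  # stands in for ~10 ms of real rendering work
        return "<aside>user %d</aside>" % user_id

    render_sidebar(42)  # the first call pays the full rendering cost
    render_sidebar(42)  # repeated calls are answered from the cache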

The other way of scaling things up is to add more hardware. So, instead of a 1 GHz machine, we can use a 2 GHz machine, effectively doubling the number of requests processed from 100 to 200 QPS. Now we can serve 200 users per second. This method of scaling is called vertical scaling. However, vertical scaling has its limits, too: you can move to a 3 GHz processor, then a 3.5 GHz one, or maybe overclock to 4.8 GHz, but ultimately clock frequency has physical limits imposed by how the universe is constructed, and we'll hit that wall sooner or later.

The other way around is that instead of a single 1 GHz machine, we can put in two such machines and a third one in front. Now, when a request comes to the third, front-end machine, it can distribute the request to either of the other two. This request distribution can follow many strategies. It can be as simple as a random selection between the two machines, a round-robin rotation one after the other, delegation to the least loaded machine, or it may even factor in the machines' past response times. The main idea, and the beauty of the whole scheme, is that we are no longer limited by the limitations of the hardware. If a 1 GHz machine serves 100 users, we can put in 10 such machines to serve 1,000 users; to serve an audience of 1 million users, we will need ten thousand machines. This is exactly how Google, Facebook, Twitter, and Amazon handle tens of millions of users. The following image shows a load balancer splitting the load:

[Image: Load balancer splitting the load among machines]

A critical and enabling component here is the machine at the front, called a load balancer. This machine runs software that receives requests and delegates them to the other machines. Many web servers, such as Nginx and Apache, come with load-balancing capabilities and only require configuration to activate them. HAProxy is another open source load balancer, with many algorithms at its disposal for distributing load among the available servers.
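The distribution strategies mentioned above are easy to picture in code; a toy sketch in Python, with hypothetical machine addresses and load counters:

    import itertools

    machines = ["10.0.0.11", "10.0.0.12"]
    active_requests = {m: 0 for m in machines}  # updated as requests start/finish

    round_robin = itertools.cycle(machines)

    def pick_round_robin():
        # Alternate between the machines, one after the other.
        return next(round_robin)

    def pick_least_loaded():
        # Delegate to the machine currently serving the fewest requests.
        return min(machines, key=lambda m: active_requests[m])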

A very important aspect of this scaling magic is that each machine, when added to the network, must respond in a manner that is consistent with the responses of the other machines in the cluster. Otherwise, users will have an inconsistent experience; that is, they might see something different when routed to one machine and something else when routed to another. For this to hold, even if the operating systems differ (say, the first machine runs CPython on Ubuntu and the second runs Jython on CentOS), the output produced by each node should be exactly the same. To keep things simple, each machine usually gets an exactly identical OS, set of libraries, and configuration.

Scaling in practice

Now that you have a load balancer and two servers and you're able to handle about 200 QPS (200 users per second), what happens when your user base grows to about 500 users per second? Well, it's simple. You have to repeat the following process:

  1. Go to a store and purchase three more machines.
  2. Put them on racks and plug in the network and power cables.
  3. Install an OS on them.
  4. Install the required languages/runtimes such as Ruby or Python.
  5. Install libraries and frameworks, such as Rails or Django.
  6. Install components such as web servers and databases.
  7. Configure all of the software.
  8. Finally, add the addresses of the new machines to the load balancer's configuration so that it starts delegating user requests to them as well.

You have to repeat the same process for all three machines that you purchased from the store.

So, in this way, we scaled up our application, but how much time did it all take? Racking the server and plugging in the cables took about 10 minutes, the OS installation another 15 minutes, and the installation of the software components about 40 minutes. So, it took roughly 1 hour and 5 minutes to add a single node. Add the three nodes yourself, and this amounts to about 3 hours and 15 minutes, and that is only if you're efficient and don't make a mistake along the way that forces you to backtrack, trace what went wrong, and redo things. Moreover, the sudden spike of users may be long gone by then, frustrated by a slow or unresponsive website, which may leave your newly installed machines idle.

Infrastructure as a Service

This clunky game of scaling was disrupted by a technology called virtualization, which lets us emulate a virtual machine on top of an operating system. Once you have a virtual machine, you can install another operating system on it, and you can have more than one virtual machine on a single physical machine if your hardware is powerful enough, which is usually the case with server-grade machines. So now, instead of wiring up a physical machine and installing the required OS, libraries, and so on, you can simply spin up a virtual machine from a binary image that contains an OS and all the required libraries, tools, software components, and even your application code, if you want. Spinning up such a machine takes only minutes (usually about 40 to 150 seconds). This is a great time-saver, as it cuts the time needed to add a node from over an hour to a few minutes.

Virtualization has created a multibillion-dollar industry and gave consultants of all sorts a whole new cool term, cloud computing, with which to furnish their resumes. The idea is to put hundreds of servers on racks with virtualization enabled, let users spin up virtual machines of their desired specs, and charge them based on usage. This is called Infrastructure as a Service (IaaS). Amazon, Rackspace, and DigitalOcean are prime examples of this model.

Platform as a Service

Although Infrastructure as a Service gives a huge boost to building scalable applications, it still leaves a lot of room for improvement, because you have to take care of the OS, the required libraries and tools, security updates, load balancing, the provisioning of new machine instances, and almost everything in between. This problem leads to another solution called Platform as a Service (PaaS), where everything from the operating system to the required runtime, libraries, and tools is preinstalled and configured for you. All you have to do is push your code, and it starts serving right away. Google App Engine is such a platform: everything else is taken care of, and all you have to worry about is your code and what your app is supposed to do.
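To make the "just push your code" promise concrete, here is roughly what a complete application looks like on App Engine's Python 2.7 runtime; a minimal sketch assuming the bundled webapp2 framework:

    # app.yaml: declares the runtime and routes every URL to the app
    runtime: python27
    api_version: 1
    threadsafe: true

    handlers:
    - url: /.*
      script: main.app

    # main.py: the entire application; App Engine takes care of the
    # provisioning, load balancing, and scaling around it
    import webapp2

    class MainPage(webapp2.RequestHandler):
        def get(self):
            self.response.write('Hello from App Engine!')

    app = webapp2.WSGIApplication([('/', MainPage)])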

However, there's another major difference between IaaS and PaaS. Let's see what it is.

Containers

We talked about scaling by adding new machines to our hosting fleet, which meant putting new machines on the rack, plugging in the wires, and installing the required software; a tedious, very time-consuming affair that took hours. We then saw how virtualization changed the game: you can instantiate a whole new (virtual) machine in a few minutes, possibly from an existing disk image, so that you don't have to install anything. This is indeed a real game changer.

However, at Internet scale, even a machine that boots in minutes is slow. You may face a sudden surge in traffic and not be able to afford waiting a few minutes for new instances to boot. There's a faster way, which comes from a few special features of the Linux kernel that give each running process its own allocated and dedicated resources. What this abstract term means is that each process gets its own partition of the file system, CPU, and memory share, and is completely isolated from other processes: it executes in an isolated container. For all practical purposes, this containment works like a virtual machine, but the overhead of creating such an environment is merely that of spawning a new process, a matter of a few seconds rather than minutes.

Google App Engine uses this containment technology instead of virtualization to scale things up. Hence, it is able to respond much faster than any IaaS solution, where a whole new virtual machine has to be loaded, with a separate operating system booted on top of the existing one along with the required libraries.

Containers take a totally different approach to virtualization. Instead of emulating the whole hardware layer and running an operating system on top of it, they give each running process a totally different view of the system in terms of the file system, memory, network, and CPU. This is mainly enabled by cgroups (short for control groups), a kernel feature developed by engineers at Google in 2006 and later merged into Linux kernel 2.6.24, which allows us to define an isolated environment and perform resource accounting for processes.
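To give a feel for the mechanism, the cgroup interface is just files; a minimal sketch assuming a Linux host with the cgroup v1 memory controller mounted at /sys/fs/cgroup/memory and root privileges (the group name is hypothetical):

    import os

    group = "/sys/fs/cgroup/memory/demo"
    if not os.path.isdir(group):
        os.makedirs(group)

    # Cap every process placed in this group at 64 MB of memory.
    with open(os.path.join(group, "memory.limit_in_bytes"), "w") as f:
        f.write(str(64 * 1024 * 1024))

    # Enroll the current process; the kernel now accounts for and
    # restricts its memory usage.
    with open(os.path.join(group, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))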

A container is thus just a separation of resources, such as the file system, memory, and other resources. It is somewhat similar to chroot on Linux/Unix systems, which changes the apparent root directory for the current running process and all of its children, giving them a view of the system entirely different from the host's. Simply put, it is as if you replaced the hard drive of your laptop with one from another laptop with identical hardware but a different operating system and set of programs. This mechanism lets totally different applications run in each container: one container might run a LAMP stack and another might run node.js on the same machine, both at bare-metal native speed with no overhead.
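Python exposes the chroot call directly, so the idea fits in a few lines; a sketch that requires root privileges and assumes a hypothetical jail directory:

    import os

    os.chroot("/srv/jail")  # '/' now refers to /srv/jail for this process
    os.chdir("/")
    print(os.listdir("/"))  # lists the jail's contents, not the real root's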

This is called operating system virtualization, and it's a vast subject in itself. Much more has been built on top of cgroups, such as Linux Containers (LXC), and Docker, which originally ran on top of LXC or libvirt but now has its own library, libcontainer, which sits directly on top of cgroups. However, the key idea is process containment, and it yields a major reduction in time: you can spin up a new "virtual machine" in a few seconds, as it is just about launching another ordinary Linux process, albeit contained in terms of what it sees of the underlying system and how.

A comparison of virtual machines versus application containers (App Engine instances in our case) can be seen in the following diagram:

[Image: Virtualization versus container-based App Engine machine instances]

How does App Engine scale?

Now that we understand the basic concepts behind how web applications are scaled and the technologies at work, we can examine how App Engine scales itself. When a user navigates to your app in their browser, the first thing that receives the request is a Google front-end server. These servers determine whether the request is meant for App Engine (mainly by examining the HTTP Host header), and if it is, they hand it over to an App Engine server.

The App Engine server first determines whether this is a request for a static resource, and if that's the case, it is handed over to the static file servers and the whole process ends there. Your application code never gets executed if a static resource, such as a JavaScript file or a CSS stylesheet, is requested. The following image shows the journey of a request through App Engine:

[Image: Google App Engine: the journey of a request]

However, if the request is dynamic, the App Engine server assigns it a unique identifier based on the time it was received, and the request enters a request queue, where it waits until an instance is available to serve it, as waiting might be cheaper than spinning up a new instance altogether. As we discussed in the section on containers, these instances are actually containers, just isolated processes, so starting one is not as costly as launching a whole new virtual machine. There are a few parameters here that you can tweak, accessible from the application's performance settings once you've deployed. One is the minimum pending latency: the minimum amount of time a request should wait in the queue. If you set this value to a higher number, you'll be able to serve more requests with fewer instances, but at the cost of more latency as perceived by the end user; App Engine will wait for the specified minimum and only then hand the request over to an existing instance. The other parameter is the maximum pending latency, which specifies the maximum time a request can be held in the request queue, after which App Engine will spin up a new instance if none is available and pass the request to it. If this value is low, App Engine will spin up more instances, which results in higher costs but much lower latency as experienced by the end user.

However, by default, if you haven't tweaked these settings (we'll see how to do this in Chapter 10, Application Deployment), Google App Engine will use heuristics based on your past request history and patterns to determine whether it should spin up a new instance.
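For reference, the two knobs described above correspond to the pending latency settings; in a module's configuration file they can be expressed as follows (a sketch with illustrative values):

    automatic_scaling:
      min_pending_latency: 30ms   # hold a request at least this long for an existing instance
      max_pending_latency: 100ms  # after this long, spin up a new instance for it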

[Image: App Engine: requests, request queues, and instances]

The last, but very important, component in the whole scheme of things is the App Engine master. It is responsible for updates, deployments, and the versioning of the app. This is the component that pushes static resources to the static file servers and code to the application instances when you deploy an application to App Engine.
