Monday 23 January 2017

Deployment automation using AWS CodeDeploy

               


CodeDeploy is a deployment service from AWS. An application can be deployed from either an S3 bucket or a Git repository containing the deployable content: code, scripts, configuration files, executables, and so on.

In this blog post, we are going to deploy a WordPress application in an elastic, highly available and scalable environment using CodeDeploy.

Get things ready


Get a copy of the WordPress source code on the local system using the git command:

git clone https://github.com/WordPress/WordPress.git /tmp/WordPress  

Create scripts to run your application. Make a directory .scripts inside the WordPress folder:

mkdir -p /tmp/WordPress/.scripts 

Create the following shell scripts in the .scripts folder. First, sudo vim install_dependencies.sh:

#!/bin/bash
yum groupinstall -y "PHP Support"  
yum install -y php-mysql  
yum install -y nginx  
yum install -y php-fpm  

Next sudo vim stop_server.sh:

#!/bin/bash
isExistApp=`pgrep nginx`
if [[ -n $isExistApp ]]; then
    service nginx stop
fi
isExistApp=`pgrep php-fpm`
if [[ -n $isExistApp ]]; then
    service php-fpm stop
fi

One more, sudo vim start_server.sh:

#!/bin/bash
service nginx start  
service php-fpm start  

and finally, sudo vim change_permissions.sh:

#!/bin/bash
chmod -R 755 /var/www/WordPress  

Make these scripts executable with this command:

chmod +x /tmp/WordPress/.scripts/*  

CodeDeploy uses an AppSpec file, a unique file that defines the deployment actions you want CodeDeploy to execute. So along with the above scripts, create an appspec.yml file:
sudo vim appspec.yml

version: 0.0  
os: linux  
files:  
  - source: /
    destination: /var/www/WordPress
hooks:  
  BeforeInstall:
    - location: .scripts/install_dependencies.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: .scripts/change_permissions.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: .scripts/start_server.sh
      timeout: 300
      runas: root
  ApplicationStop:
    - location: .scripts/stop_server.sh
      timeout: 300
      runas: root

Now zip the WordPress folder and push it to your Git repository, for example:
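
A hedged sketch of both options (the remote, bucket name, and commit message are placeholders):

cd /tmp/WordPress
git add .scripts appspec.yml
git commit -m "Add CodeDeploy scripts and appspec"
git push origin master                      # for a GitHub-based revision
# or, for an S3-based revision:
zip -r /tmp/WordPress.zip . && aws s3 cp /tmp/WordPress.zip s3://your-bucket/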

Creating IAM Roles


Create an IAM instance profile, attach the AmazonEC2FullAccess policy, and also attach the following inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}


Create a service role CodeDeployServiceRole. Select the role type AWS CodeDeploy and attach the policy AWSCodeDeployRole, as shown in the screenshots below:
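
If you prefer the AWS CLI, the same role can be created along these lines (a sketch; trust.json is assumed to contain the standard CodeDeploy trust relationship, and the policy ARN is the AWS managed AWSCodeDeployRole policy):

aws iam create-role --role-name CodeDeployServiceRole \
    --assume-role-policy-document file://trust.json
aws iam attach-role-policy --role-name CodeDeployServiceRole \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole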



How about Scale?

Create an autoscaling group for a scalable environment. Steps below:

Choose an AMI and select an instance type for it:


Attach the IAM instance profile which we created in the earlier step:


Now go to Advanced Settings and type the following commands in the “User Data” field to install the CodeDeploy agent on your machine (if it’s not already baked into your AMI):

#!/bin/bash
yum -y update
yum install -y ruby
yum install -y aws-cli
# user data already runs as root, so no sudo/su is needed here
cd /home/ec2-user   # default home on Amazon Linux; adjust for your AMI
aws s3 cp s3://bucket-name/latest/install . --region region-name
chmod +x ./install
./install auto

where bucket-name represents one of the following, based on the region of your instances:
  • aws-codedeploy-us-east-1
  • aws-codedeploy-us-west-2
  • aws-codedeploy-us-west-1
  • aws-codedeploy-eu-west-1
  • aws-codedeploy-eu-central-1
  • aws-codedeploy-ap-southeast-1
  • aws-codedeploy-ap-southeast-2
  • aws-codedeploy-ap-northeast-1
  • aws-codedeploy-ap-south-1
  • aws-codedeploy-eu-west-2
  • aws-codedeploy-ca-central-1
  • aws-codedeploy-us-east-2
  • aws-codedeploy-ap-northeast-2
  • aws-codedeploy-sa-east-1

and region-name will be one of the following:

  • us-east-1
  • us-west-2
  • us-west-1
  • eu-west-1
  • eu-central-1
  • ap-southeast-1
  • ap-southeast-2
  • ap-northeast-1
  • ap-south-1
  • eu-west-2
  • ca-central-1
  • us-east-2
  • ap-northeast-2
  • sa-east-1

Select a security group in the next step and create the launch configuration. Then, using that launch configuration, create an Auto Scaling group.

Select the launch configuration from the given options:


Give the group a name in the next screen and select a subnet for it.


Keep the remaining settings at their defaults and create the Auto Scaling group.
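
For reference, the console steps above map roughly onto these CLI calls (a sketch; all names and IDs are placeholders):

aws autoscaling create-launch-configuration \
    --launch-configuration-name wordpress-lc \
    --image-id ami-xxxxxxxx --instance-type t2.micro \
    --iam-instance-profile wordpress-instance-profile \
    --user-data file://install-codedeploy-agent.sh
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name wordpress-asg \
    --launch-configuration-name wordpress-lc \
    --min-size 1 --max-size 3 \
    --vpc-zone-identifier subnet-xxxxxxxx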

Time to Deploy

Choose Create New Application. Give the application a name, and a name for the deployment group as well.


Select the Auto Scaling group in the Search by Tags field to deploy the application to the group, and select CodeDeployDefault.OneAtATime in the Deployment Config field.


In Service Role ARN, select the service role which we created in the “Creating IAM Roles” section of this post. Go to Deployments and choose Create New Deployment. Select the application and deployment group, and select the revision type for your source code (i.e. an S3 bucket or a GitHub repository).
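
The same deployment can also be started from the CLI; a hedged sketch for a GitHub revision (the application, group, repository, and commit ID are placeholders):

aws deploy create-deployment \
    --application-name WordPressApp \
    --deployment-group-name WordPressDeploymentGroup \
    --github-location repository=your-account/WordPress,commitId=your-commit-sha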


On the successful deployment of the application, something like this will appear on the screen:


WordPress is now deployed on the Auto Scaling group. So when you hit the public IP of an instance which belongs to the group, the Nginx test page will load.

Configuring WordPress

Since Nginx needs php-fpm to serve PHP pages, we need to configure php-fpm, and we need to configure Nginx for WordPress as well. For this, make the following changes in the files shown below:

sudo vim /etc/php.ini  

Uncomment cgi.fix_pathinfo and change its value from 1 to 0, so the line reads cgi.fix_pathinfo=0.

sudo vim /etc/php-fpm.d/www.conf

Change user and group to nginx (user = nginx, group = nginx), and also make sure the following values are uncommented:

  pm.min_spare_servers = 5
  pm.max_spare_servers = 35

Add the following server block to a new configuration file, sudo vim /etc/nginx/conf.d/virtual.conf:

server {
    listen 80;
    server_name example.com;

    location / {
        root /var/www/WordPress;
        index index.php index.html index.htm;
        if (-f $request_filename) {
            expires 30d;
            break;
        }
        if (!-e $request_filename) {
            rewrite ^(.+)$ /index.php?q=$1 last;
        }
    }

    location ~ \.php$ {
        fastcgi_pass  localhost:9000;  # port where the FastCGI (php-fpm) processes listen
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /var/www/WordPress$fastcgi_script_name;  # same path as above
        fastcgi_param PATH_INFO $fastcgi_script_name;
        include /etc/nginx/fastcgi_params;
    }
}

Hit the server name in the browser and it will load the WordPress application. To avoid this manual work of configuring the application on other instances in the Auto Scaling group, you can create an image of the instance on which you have made these changes, provide the AMI of that image to the launch configuration, and update the launch configuration in the Auto Scaling group. New instances will then be created from the updated image.


After the successful installation, the WordPress dashboard will appear as shown in the below screenshot:


Make It Stateless 

If you would like to scale at will and deploy at will, you need to make sure that the web/app tier is stateless. Manage plugins in the GitHub repo, and store static content outside the server, on S3.

To store the static media content of your WordPress application in an S3 bucket, we will need a plugin named WP Offload S3.
This plugin automatically copies the media files uploaded by WordPress into an S3 bucket, but it has a dependency on another plugin, Amazon Web Services.

So after downloading both plugins, we now have two zip files. Unzip them into the WordPress/wp-content/plugins path. If not already done, zip the WordPress folder again, push it to the Git repository, and redeploy the application through CodeDeploy using the commit ID of the latest commit.


Go to Plugins; the two plugins (Amazon Web Services and WP Offload S3) will be shown. Activate both. After activating the Amazon Web Services plugin, an AWS entry will be added to the left bar. Go to AWS and define your access key and secret key in wp-config.php.
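
At the time of writing, the Amazon Web Services plugin documented defining the credentials as constants in wp-config.php along these lines (the key values here are placeholders):

define( 'AWS_ACCESS_KEY_ID', 'your-access-key-id' );
define( 'AWS_SECRET_ACCESS_KEY', 'your-secret-access-key' );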


After activating WP Offload S3, go to its settings and enter the name of the bucket in which you want to store the media content of your blog posts. Save the settings.


Now try posting some media content in your blog post.

A folder wp-content will be created in the S3 bucket and the content will get stored in the same folder.

Let there be a load balancer

We are now almost done. In order to achieve the 'highly available' part of our initial goal, let's create a load balancer :)

Create an Elastic Load Balancer for high availability of your application. Give it a name.


Select a security group for it in the next screen and configure the health checks:


Review and Create.
Now, attach this ELB to the Auto Scaling group:


Also, to access the application through the ELB endpoint, add the public DNS of the ELB to the server_name directive in /etc/nginx/conf.d/virtual.conf.
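
For example (the ELB DNS name below is a made-up placeholder):

server_name example.com my-elb-1234567890.us-east-1.elb.amazonaws.com;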

Happy CodeDeploy-ing! :)

Restrict IAM User to Particular Route53 Hosted Zone



Through AWS Identity and Access Management (IAM) it’s possible to add people to manage all or parts of your AWS account. It takes just a few minutes to set up permissions, roles, and a new user, but one item I battled to find was how to restrict a certain user or group to a particular hosted zone.

So, without further delay, here is the change that is needed to restrict permissions to a certain domain in IAM:


  • Setup your new User and Permissions (and Roles if needed).
  • From within Route 53 copy the Hosted Zone ID for the domain you want to allow access.
  • From the IAM dashboard Create a new policy:
  • Replace the hosted zone ID in the policy below with the ID of the zone you want to restrict access to.
{  
   "Version": "2012-10-17",
   "Statement":[
      {
         "Action":[
            "route53:ChangeResourceRecordSets",
            "route53:GetHostedZone",
            "route53:ListResourceRecordSets"
         ],
         "Effect":"Allow",
         "Resource":[
            "arn:aws:route53:::hostedzone/<Your zone ID>"
         ]
      },
      {
         "Action":[
            "route53:ListHostedZones"
         ],
         "Effect":"Allow",
         "Resource":[
            "*"
         ]
      }
   ]
}
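
To create this as a managed policy from the CLI (a sketch; route53-policy.json is assumed to contain the JSON above):

aws iam create-policy --policy-name Route53SingleZoneAccess \
    --policy-document file://route53-policy.json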

AWS CodeDeploy Using S3

AWS has a great set of tools which help simplify the deployment process in their cloud, and one such tool is AWS CodeDeploy. In this blog, we will deploy an application with AWS CodeDeploy using S3.

Consider a use case where you have 20 instances and you want to deploy your code or change the configuration file on all of them. Without a deployment tool, you would have to log in to each instance and change the configuration file by hand. AWS CodeDeploy lets you do this in just a few steps: you create a deployment application and your code is deployed to all 20 instances.

Deploying code without using AWS CodeDeploy



Deploying code using AWS CodeDeploy


There are two ways to deploy code in Amazon Web Services:
  • Using Git
  • Using AWS S3 (Simple Storage Service)

Here, we will deploy the code using the Amazon S3 service. Let us also understand a few useful terms which will be used in the deployment process:
  • AppSpec file: an Application Specification file, a unique file that defines a series of deployment actions that you want CodeDeploy to execute.
  • Deployment Application: the unique name given to your deployment application.
  • Revision: a combination of the AppSpec file and other files such as scripts, images, index files, media, etc.
  • Deployment Group: a group of individual instances and auto-scaled instances.
  • Deployment Configuration: lets you decide how you want your code to be deployed: one at a time, half at a time, or all at once.

Deploying Code Using AWS S3

We’ll take a simple example of deploying code using S3: we will deploy to a single t2.micro instance. Launch the instance and install Nginx on it, as we are going to change the front page (index.html) of the default Nginx configuration. You can install Nginx by logging into the instance and typing the following commands:

$sudo apt-get update 
$sudo apt-get install nginx -y

Now let's move on to deploying code to the instance.

Before starting with CodeDeploy, we need to have:
  • Two IAM roles: one role is given to EC2 instances to access S3 buckets, and the other role is given to the CodeDeploy service to choose EC2 instances based on their tags.
  • One S3 bucket containing the AppSpec file, scripts, and other files in a tar, gz, or bz2 archive (a compressed format). You need to store the compressed file in the S3 bucket; the files will automatically be uncompressed at deployment time (see the upload sketch after this list).
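
A hedged sketch of bundling and uploading a revision with the CLI (the bucket and file names are placeholders; CodeDeploy also accepts zip bundles):

zip -r my-app.zip appspec.yml scripts/ index.html
aws s3 cp my-app.zip s3://my-codedeploy-bucket/my-app.zip
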
IAM Role Given to AWS CodeDeploy to access your EC2-instance:
=======================================================================
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:PutLifecycleHook",
        "autoscaling:DeleteLifecycleHook",
        "autoscaling:RecordLifecycleActionHeartbeat",
        "autoscaling:CompleteLifecycleAction",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:PutInstanceInStandby",
        "autoscaling:PutInstanceInService",
        "ec2:Describe*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
=======================================================================
IAM Role Given to EC2-instances to access S3 Buckets
=======================================================================
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
=======================================================================

Trusted Relationship With AWS CodeDeploy IAM Role

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "codedeploy.us-east-1.amazonaws.com",
          "codedeploy.us-west-2.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
We also need to install the AWS CodeDeploy agent on our instance; it allows the code to be deployed onto the instance. You can install the CodeDeploy agent with the following process.
Installing the AWS CLI and the AWS CodeDeploy agent on Ubuntu 14.04 LTS:
$sudo apt-get update
$sudo apt-get install awscli
$sudo apt-get install ruby2.0
$cd /home/ubuntu
$sudo aws s3 cp s3://aws-codedeploy-us-east-1/latest/install . --region us-east-1
$sudo chmod +x ./install
$sudo ./install auto

Understanding APPSPEC FILE

AppSpec is the heart of CodeDeploy and is written in YAML. AppSpec defines how the application code will be deployed on deployment targets and which deployment lifecycle event hooks to run in response to various deployment lifecycle events. It should be in the root of an application source code’s directory structure.
High-Level Structure of AppSpec File:

version: 0.0
os: operating-system-name
files: source-destination-files-mappings
permissions: permissions-specifications
hooks: deployment-lifecycle-event-mappings
Hooks are scripts to run at specific deployment lifecycle events during the deployment. The available event hooks are:
  • ApplicationStop: occurs before the new application revision is downloaded; used to gracefully stop the running application
  • DownloadBundle: occurs when the CodeDeploy agent downloads the bundle from the S3 bucket
  • BeforeInstall: occurs before CodeDeploy starts deployment of the application code to the deployment target
  • Install: CodeDeploy copies files to the deployment target
  • AfterInstall: occurs once files are copied and installed to the deployment target
  • ApplicationStart: occurs just before your application revision is started on the deployment target
  • ValidateService: occurs last; used to verify that the deployment completed successfully
The sample AppSpec file used is as shown below:

version: 0.0
os: linux
files:
  - source: /
    destination: /usr/share/nginx/html
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: scripts/afterinstall
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_server
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/stop_server
      timeout: 300
      runas: root
While creating an instance you need to attach the S3 bucket role to your instance, and after that you need to install the AWS CLI and the CodeDeploy agent using the above procedure. Now you are ready to create the CodeDeploy application.

Creating AWS CodeDeploy Application

Sign in to the AWS Console. Go to Services and click on “CodeDeploy” as shown below.


Now a new window will open as shown below. Click on the “Create New Application” button. It will open up the prompt to create a new application.



A new window will appear which asks for the details for creating an application. Enter the Application Name and Deployment Group Name, and choose the instances to which you want to deploy the code using their Key and Value tags. Choose your deployment configuration: one at a time, half at a time, or all at once. This configuration lets you choose how you want to deploy your code.



Enter the Application Name and Deployment Group Name.


Choose instances based on their Key and Value.


Then click on the “Create Application” button. Your application will be created and a new window will appear as shown below.



You have to create a new revision. Click on the Deploy New Revision button to create one.


Now enter the Application Name and Deployment Group Name. Choose the revision type “My application is stored in Amazon S3”. Give the revision location, i.e. the bucket and file name (you can also copy the full path of the file from AWS S3 and paste it here). After entering all the details, click on Deploy Now. Your application and code are now being deployed. Wait a few seconds and then refresh.
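
The same deployment can be started from the CLI (a sketch; the names and bucket are placeholders):

aws deploy create-deployment \
    --application-name MyApp \
    --deployment-group-name MyDeploymentGroup \
    --s3-location bucket=my-codedeploy-bucket,key=my-app.zip,bundleType=zip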


The status will appear as Succeeded. You can now hit the IP of your instance and you will get the index page that you deployed.
Hope this will help you!

Automating Windows Server backups on Amazon S3


1: Create an Amazon AWS account


If you don't already have an AWS account - create it here, it's free. Amazon's "free usage tier" on S3 gives you 5GB free storage from scratch, so after registering, sign in to your "AWS Management Console", select the "S3" tab and create one or more "buckets".

2: Get your access keys


You will need security credentials to access your online storage from the server, so click your account name - "Security Credentials" - "Access Keys" and copy your Key ID and Secret.

3: Download "S3Sync"


"S3Sync" is a great free command-line application from SprightlySoft. It is .NET-based and even comes with the source codes. At the time of writing this post their website was down, so I published the tool on Google Docs here: S3Sync.zip.

The tool syncs a given folder with your S3 bucket. And the best part - unlike similar scripts and utilities it performs a "smart" differential sync that detects additions, deletions, and file modifications.
Extract S3Sync.zip to the C: drive, so that the tool lives at C:\S3Sync.

4: Write a backup script


Create a batch file and paste this code into it:

cd C:\S3Sync
S3Sync.exe -AWSAccessKeyId xxxxxxx -AWSSecretAccessKey xxxxxxx -SyncDirection upload -LocalFolderPath "C:\inetpub\wwwroot" -BucketName YOURBUCKETNAME


The code above is pretty self-explanatory. Just replace the "xxxxxx" placeholders with your access keys from step 2, "YOURBUCKETNAME" with the name of your S3 bucket, and "C:\inetpub\wwwroot" with the folder you want to back up. Then create a scheduled task that runs the batch file every 24 hours, and you're all set.
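
For example, the scheduled task can be created from an elevated command prompt like this (the task name and script path are placeholders):

schtasks /Create /SC DAILY /ST 02:00 /TN "S3SyncBackup" /TR "C:\S3Sync\backup.bat"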

Hope this will help you!

Thursday 19 January 2017

How to Clear RAM Memory Cache, Buffer and Swap Space on Linux


Like any other operating system, GNU/Linux implements memory management efficiently, and even more than that. But if any process is eating away at your memory and you want to clear it, Linux provides a way to flush or clear the RAM cache.

How to Clear Cache in Linux?


Every Linux system has three options to clear cache without interrupting any processes or services.

1. Clear PageCache only.

# sync; echo 1 > /proc/sys/vm/drop_caches

2. Clear dentries and inodes.

# sync; echo 2 > /proc/sys/vm/drop_caches

3. Clear PageCache, dentries and inodes.

# sync; echo 3 > /proc/sys/vm/drop_caches 


Explanation of the above commands.


sync flushes the file system buffers. Commands separated by “;” run sequentially; the shell waits for each command to terminate before executing the next command in the sequence. As mentioned in the kernel documentation, writing to drop_caches cleans the cache without killing any application or service; the echo command does the job of writing to the file.

If you have to clear the disk cache, the first command is safest in enterprise and production, as “echo 1 > …” clears the PageCache only. It is not recommended to use the third option, “echo 3 > …”, in production until you know what you are doing, as it clears the PageCache, dentries, and inodes.


Is it a good idea to free buffer and cache in Linux that might be used by the Linux kernel?


When you are applying various settings and want to check whether they have actually taken effect, especially on an I/O-intensive benchmark, you may need to clear the buffer cache. You can drop the cache as explained above without rebooting the system, i.e. no downtime required.

Linux is designed in such a way that it looks into the disk cache before looking at the disk. If it finds the resource in the cache, the request never reaches the disk. If we clean the cache, the disk cache becomes less useful, as the OS has to look for each resource on the disk again.

Moreover, the system will also slow down for a few seconds while the cache is cleaned and every resource required by the OS is loaded back into the disk cache.

Now we will create a shell script to auto clear the RAM cache daily at 2am via a cron scheduler task. Create a shell script clearcache.sh and add the following lines.

#!/bin/bash
# Note: we are using "echo 3", but it is not recommended in production; use "echo 1" instead
sync; echo 3 > /proc/sys/vm/drop_caches

Set execute permission on the clearcache.sh file.

# chmod 755 clearcache.sh

Now you may call the script whenever you need to clear the RAM cache.

Now set a cron job to clear the RAM cache every day at 2am. Open the crontab for editing.

# crontab -e

Append the below line, save and exit to run it at 2 am daily.

0  2  *  *  *  /path/to/clearcache.sh

For more details on how to cron a job, you may like to check our article on 11 Cron Scheduling Jobs.

Is it a good idea to auto clear the RAM cache on a production server?


No, it is not. Think of a situation where you have scheduled the script to clear the RAM cache every day at 2am. Every day at 2am the script executes and flushes your RAM cache. One day, for whatever reason, more users than expected may be online on your website, requesting resources from your server.

At the same time the scheduled script runs and clears everything in the cache. Now all the users are fetching data from disk. This can result in a server crash and a corrupt database. So clear the RAM cache only when required, and know what you are doing; else you are a Cargo Cult System Administrator.

How to Clear Swap Space in Linux?


If you want to clear Swap space, you may like to run the below command.

# swapoff -a && swapon -a

You may also add the above command to a cron script like the one above, after understanding all the associated risks.

Now we will combine both of the above commands into one single command to make a proper script to clear the RAM cache and swap space.

# echo 3 > /proc/sys/vm/drop_caches && swapoff -a && swapon -a && printf '\n%s\n' 'Ram-cache and Swap Cleared'
OR
$ su -c "echo 3 >'/proc/sys/vm/drop_caches' && swapoff -a && swapon -a && printf '\n%s\n' 'Ram-cache and Swap Cleared'" root

After testing both of the above commands, run “free -h” before and after running the script and check the cache.


That’s all for now. If you liked the article, don’t forget to provide us with your valuable feedback in the comments and let us know what you think: is it a good idea to clear the RAM cache and buffer in production and enterprise environments?

Wednesday 18 January 2017

Apache vs Nginx: Practical Considerations


Introduction

Apache and Nginx are the two most common open source web servers in the world. Together, they are responsible for serving over 50% of traffic on the internet. Both solutions are capable of handling diverse workloads and working with other software to provide a complete web stack.

While Apache and Nginx share many qualities, they should not be thought of as entirely interchangeable. Each excels in its own way and it is important to understand the situations where you may need to reevaluate your web server of choice. This article will be devoted to a discussion of how each server stacks up in various areas.


General Overview

Before we dive into the differences between Apache and Nginx, let's take a quick look at the background of these two projects and their general characteristics.


Apache

The Apache HTTP Server was created by Robert McCool in 1995 and has been developed under the direction of the Apache Software Foundation since 1999. Since the HTTP web server is the foundation's original project and is by far their most popular piece of software, it is often referred to simply as "Apache".

The Apache web server has been the most popular server on the internet since 1996. Because of this popularity, Apache benefits from great documentation and integrated support from other software projects.

Apache is often chosen by administrators for its flexibility, power, and widespread support. It is extensible through a dynamically loadable module system and can process a large number of interpreted languages without connecting out to separate software.


Nginx

In 2002, Igor Sysoev began work on Nginx as an answer to the C10K problem, which was a challenge for web servers to begin handling ten thousand concurrent connections as a requirement for the modern web. The initial public release was made in 2004, meeting this goal by relying on an asynchronous, event-driven architecture.

Nginx has grown in popularity since its release due to its light-weight resource utilization and its ability to scale easily on minimal hardware. Nginx excels at serving static content quickly and is designed to pass dynamic requests off to other software that is better suited for those purposes.

Nginx is often selected by administrators for its resource efficiency and responsiveness under load. Advocates welcome Nginx's focus on core web server and proxy features.


Connection Handling Architecture

One big difference between Apache and Nginx is the actual way that they handle connections and traffic. This provides perhaps the most significant difference in the way that they respond to different traffic conditions.


Apache

Apache provides a variety of multi-processing modules (Apache calls these MPMs) that dictate how client requests are handled. Basically, this allows administrators to swap out its connection handling architecture easily. These are:

  • mpm_prefork: This processing module spawns processes with a single thread each to handle the request. Each child can handle a single connection at a time. As long as the number of requests is fewer than the number of processes, this MPM is very fast. However, performance degrades quickly after the requests surpass the number of processes, so this is not a good choice in many scenarios. Each process has a significant impact on RAM consumption, so this MPM is difficult to scale effectively. This may still be a good choice though if used in conjunction with other components that are not built with threads in mind. For instance, PHP is not thread-safe, so this MPM is recommended as the only safe way of working with mod_php, the Apache module for processing these files.
  • mpm_worker: This module spawns processes that can each manage multiple threads. Each of these threads can handle a single connection. Threads are much more efficient than processes, which means that this MPM scales better than the prefork MPM. Since there are more threads than processes, this also means that new connections can immediately take a free thread instead of having to wait for a free process.
  • mpm_event: This module is similar to the worker module in most situations, but is optimized to handle keep-alive connections. When using the worker MPM, a connection will hold a thread regardless of whether a request is actively being made for as long as the connection is kept alive. The event MPM handles keep alive connections by setting aside dedicated threads for handling keep alive connections and passing active requests off to other threads. This keeps the module from getting bogged down by keep-alive requests, allowing for faster execution. This was marked stable with the release of Apache 2.4.
As you can see, Apache provides a flexible architecture for choosing different connection and request handling algorithms. The choices provided are mainly a function of the server's evolution and the increasing need for concurrency as the internet landscape has changed.
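
As a concrete illustration, choosing and tuning an MPM is plain Apache configuration. The sketch below assumes the Debian/Ubuntu file layout (with the module enabled via a2enmod mpm_event), and the numbers are illustrative defaults, not recommendations:

# /etc/apache2/mods-available/mpm_event.conf
<IfModule mpm_event_module>
    StartServers             2
    MinSpareThreads         25
    MaxSpareThreads         75
    ThreadsPerChild         25
    MaxRequestWorkers      150
    MaxConnectionsPerChild   0
</IfModule>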


Nginx

Nginx came onto the scene after Apache, with more awareness of the concurrency problems that would face sites at scale. Leveraging this knowledge, Nginx was designed from the ground up to use an asynchronous, non-blocking, event-driven connection handling algorithm.

Nginx spawns worker processes, each of which can handle thousands of connections. The worker processes accomplish this by implementing a fast looping mechanism that continuously checks for and processes events. Decoupling actual work from connections allows each worker to concern itself with a connection only when a new event has been triggered.

Each of the connections handled by the worker are placed within the event loop where they exist with other connections. Within the loop, events are processed asynchronously, allowing work to be handled in a non-blocking manner. When the connection closes, it is removed from the loop.

This style of connection processing allows Nginx to scale incredibly far with limited resources. Since the server is single-threaded and processes are not spawned to handle each new connection, the memory and CPU usage tends to stay relatively consistent, even at times of heavy load.


Static vs Dynamic Content

In terms of real-world use-cases, one of the most common comparisons between Apache and Nginx is the way in which each server handles requests for static and dynamic content.


Apache

Apache servers can handle static content using its conventional file-based methods. The performance of these operations is mainly a function of the MPM methods described above.

Apache can also process dynamic content by embedding a processor of the language in question into each of its worker instances. This allows it to execute dynamic content within the web server itself without having to rely on external components. These dynamic processors can be enabled through the use of dynamically loadable modules.

Apache's ability to handle dynamic content internally means that configuration of dynamic processing tends to be simpler. Communication does not need to be coordinated with an additional piece of software and modules can easily be swapped out if the content requirements change.


Nginx

Nginx does not have any ability to process dynamic content natively. To handle PHP and other requests for dynamic content, Nginx must pass to an external processor for execution and wait for the rendered content to be sent back. The results can then be relayed to the client.

For administrators, this means that communication must be configured between Nginx and the processor over one of the protocols Nginx knows how to speak (http, FastCGI, SCGI, uWSGI, memcache). This can complicate things slightly, especially when trying to anticipate the number of connections to allow, as an additional connection will be used for each call to the processor.

However, this method has some advantages as well. Since the dynamic interpreter is not embedded in the worker process, its overhead will only be present for dynamic content. Static content can be served in a straight-forward manner and the interpreter will only be contacted when needed. Apache can also function in this manner, but doing so removes the benefits in the previous section.


Distributed vs Centralized Configuration

For administrators, one of the most readily apparent differences between these two pieces of software is whether a directory-level configuration is permitted within the content directories.


Apache

Apache includes an option to allow additional configuration on a per-directory basis by inspecting and interpreting directives in hidden files within the content directories themselves. These files are known as .htaccess files.

Since these files reside within the content directories themselves, when handling a request, Apache checks each component of the path to the requested file for a .htaccess file and applies the directives found within. This effectively allows decentralized configuration of the web server, which is often used for implementing URL rewrites, access restrictions, authorization, and authentication, even caching policies.

While the above examples can all be configured in the main Apache configuration file, .htaccess files have some important advantages. First, since these are interpreted each time they are found along a request path, they are implemented immediately without reloading the server. Second, it makes it possible to allow non-privileged users to control certain aspects of their own web content without giving them control over the entire configuration file.

This provides an easy way for certain web software, like content management systems, to configure their environment without providing access to the central configuration file. This is also used by shared hosting providers to retain control of the main configuration while giving clients control over their specific directories.
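
For instance, a content management system might ship a .htaccess file along these lines (a typical front-controller rewrite; it assumes mod_rewrite is enabled and AllowOverride permits FileInfo):

RewriteEngine On
# send requests for non-existent files and directories to the front controller
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]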


Nginx

Nginx does not interpret .htaccess files, nor does it provide any mechanism for evaluating per-directory configuration outside of the main configuration file. This may be less flexible than the Apache model, but it does have its own advantages.

The most notable improvement over the .htaccess system of directory-level configuration is increased performance. For a typical Apache setup that may allow .htaccess in any directory, the server will check for these files in each of the parent directories leading up to the requested file, for each request. If one or more .htaccess files are found during this search, they must be read and interpreted. By not allowing directory overrides, Nginx can serve requests faster by
doing a single directory lookup and file read for each request (assuming that the file is found in the conventional directory structure).

Another advantage is security related. Distributing directory-level configuration access also distributes the responsibility of security to individual users, who may not be trusted to handle this task well. Ensuring that the administrator maintains control over the entire web server can prevent some security missteps that may occur when access is given to other parties.

Keep in mind that it is possible to turn off .htaccess interpretation in Apache if these concerns resonate with you.


File vs URI-Based Interpretation

How the web server interprets requests and maps them to actual resources on the system is another area where these two servers differ.


Apache

Apache provides the ability to interpret a request for a physical resource on the filesystem or as a URI location that may need a more abstract evaluation. In general, for the former Apache uses <Directory> or <Files> blocks, while it utilizes <Location> blocks for more abstract resources.

Because Apache was designed from the ground up as a web server, the default is usually to interpret requests as filesystem resources. It begins by taking the document root and appending the portion of the request following the host and port number to try to find an actual file. Basically, the filesystem hierarchy is represented on the web as the available document tree.

Apache provides a number of alternatives for when the request does not match the underlying filesystem. For instance, an Alias directive can be used to map to an alternative location. Using <Location> blocks is a method of working with the URI itself instead of the filesystem. There are also regular expression variants which can be used to apply configuration more flexibly throughout the filesystem.

While Apache has the ability to operate on both the underlying filesystem and the web space, it leans heavily towards filesystem methods. This can be seen in some of the design decisions, including the use of .htaccess files for per-directory configuration. The Apache docs themselves warn against using URI-based blocks to restrict access when the request mirrors the underlying filesystem.
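
A short sketch of the two styles (the paths are hypothetical, and the <Location> example assumes mod_status is loaded):

# Filesystem-based: restrict a real directory
<Directory "/var/www/html/private">
    Require all denied
</Directory>

# URI-based: expose an abstract resource with no file behind it
<Location "/server-status">
    SetHandler server-status
    Require local
</Location>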


Nginx

Nginx was created to be both a web server and a proxy server. Due to the architecture required for these two roles, it works primarily with URIs, translating to the filesystem when necessary.

This can be seen in some of the ways that Nginx configuration files are constructed and interpreted. Nginx does not provide a mechanism for specifying configuration for a filesystem directory and instead parses the URI itself.

For instance, the primary configuration blocks for Nginx are server and location blocks. The server block interprets the host being requested, while the location blocks are responsible for matching portions of the URI that comes after the host and port. At this point, the request is being interpreted as a URI, not as a location on the filesystem.

For static files, all requests eventually have to be mapped to a location on the filesystem. First, Nginx selects the server and location blocks that will handle the request and then combines the document root with the URI, adapting anything necessary according to the configuration specified.

This may seem similar, but parsing requests primarily as URIs instead of filesystem locations allows Nginx to more easily function in web, mail, and proxy server roles. Nginx is configured simply by laying out how to respond to different request patterns. Nginx does not check the filesystem until it is ready to serve the request, which explains why it does not implement a form of .htaccess files.


Modules

Both Nginx and Apache are extensible through module systems, but the way that they work differs significantly.


Apache

Apache's module system allows you to dynamically load or unload modules to satisfy your needs during the course of running the server. The Apache core is always present, while modules can be turned on or off, adding or removing additional functionality and hooking into the main server.

Apache uses this functionality for a large variety of tasks. Due to the maturity of the platform, there is an extensive library of modules available. These can be used to alter some of the core functionality of the server, such as mod_php, which embeds a PHP interpreter into each running worker.

Modules are not limited to processing dynamic content, however. Among other functions, they can be used for rewriting URLs, authenticating clients, hardening the server, logging, caching, compression, proxying, rate limiting, and encrypting. Dynamic modules can extend the core functionality considerably without much additional work.


Nginx

Nginx also implements a module system, but it is quite different from the Apache system. In Nginx, modules are not dynamically loadable, so they must be selected and compiled into the core software.

For many users, this will make Nginx much less flexible. This is especially true for users who are not comfortable maintaining their own compiled software outside of their distribution's conventional packaging system. While distributions' packages tend to include the most commonly used modules, if you require a non-standard module, you will have to build the server from source yourself.

Nginx modules are still very useful, though, and they allow you to dictate what you want out of your server by only including the functionality you intend to use. Some users also may consider this more secure, as arbitrary components cannot be hooked into the server. However, if your server is ever put in a position where this is possible, it is likely compromised already.

Nginx modules allow many of the same capabilities as Apache modules. For instance, Nginx modules can provide proxying support, compression, rate limiting, logging, rewriting, geolocation, authentication, encryption, streaming, and mail functionality.


Support, Compatibility, Ecosystem, and Documentation

A major point to consider is what the actual process of getting up and running will be given the landscape of available help and support among other software.


Apache

Because Apache has been popular for so long, support for the server is fairly ubiquitous. There is a large library of first- and third-party documentation available for the core server and for task-based scenarios involving hooking Apache up with other software.

Along with documentation, many tools and web projects include tools to bootstrap themselves within an Apache environment. This may be included in the projects themselves, or in the packages maintained by your distribution's packaging team.

Apache, in general, will have more support from third-party projects simply because of its market share and the length of time it has been available. Administrators are also somewhat more likely to have experience working with Apache not only due to its prevalence but also because many people start off in shared hosting scenarios which almost exclusively rely on Apache due to the .htaccess distributed management capabilities.


Nginx

Nginx is experiencing increased support as more users adopt it for its performance profile, but it still has some catching up to do in some key areas.

In the past, it was difficult to find comprehensive English-language documentation regarding Nginx due to the fact that most of the early development and documentation were in Russian. As interest in the project grew, the documentation has been filled out and there are now plenty of administration resources on the Nginx site and through third parties.

Regarding third-party applications, support and documentation are becoming more readily available, and package maintainers are beginning, in some cases, to give choices between auto-configuring for Apache and Nginx. Even without support, configuring Nginx to work with alternative software is usually straightforward so long as the project itself documents its requirements (permissions, headers, etc.).


Using Apache and Nginx Together

After going over the benefits and limitations of both Apache and Nginx, you may have a better idea of which server is more suited to your needs. However, many users find that it is possible to leverage each server's strengths by using them together.

The conventional configuration for this partnership is to place Nginx in front of Apache as a reverse proxy. This will allow Nginx to handle all requests from clients. This takes advantage of Nginx's fast processing speed and ability to handle large numbers of connections concurrently.

For static content, which Nginx excels at, the files will be served quickly and directly to the client. For dynamic content, for instance, PHP files, Nginx will proxy the request to Apache, which can then process the results and return the rendered page. Nginx can then pass the content back to the client.

This setup works well for many people because it allows Nginx to function as a sorting machine. It will handle all requests it can and pass on the ones that it has no native ability to serve. By cutting down on the requests the Apache server is asked to handle, we can alleviate some of the blocking that occurs when an Apache process or thread is occupied.

This configuration also allows you to scale out by adding additional backend servers as necessary. Nginx can be configured to pass to a pool of servers easily, increasing this configuration's resilience to failure and performance.
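
A minimal sketch of this arrangement, assuming Apache has been reconfigured to listen on 127.0.0.1:8080:

server {
    listen 80;
    root /var/www/html;

    # serve static files directly; hand everything else to Apache
    location / {
        try_files $uri $uri/ @apache;
    }

    location @apache {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}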


Conclusion

As you can see, both Apache and Nginx are powerful, flexible, and capable. Deciding which server is best for you is largely a function of evaluating your specific requirements and testing with the patterns that you expect to see.

There are differences between these projects that have a very real impact on the raw performance, capabilities, and the implementation time necessary to get each solution up and running. However, these usually are the result of a series of trade-offs that should not be casually dismissed. In the end, there is no one-size-fits-all web server, so use the solution that best aligns with your objectives.