2018-04-02

Simple server load test with cron and ab (Linux)

Load testing "refers to the practice of modeling the expected usage of a software program by simulating multiple users accessing the program concurrently. As such, this testing is most relevant for multi-user systems; often one built using a client/server model, such as web servers."

I've found many articles online explaining how ApacheBench lets you "load test" with a single command from a Linux terminal, but is that a realistic load test? A single execution of ab is a very limited simulation of what actually happens when multiple users try to access your web application. A server may perform well if it has to work hard for 30 seconds (a possible execution time of an ab command), but what happens when 20,000 extra requests hit your web app after it has already been stressed for hours?


Apache HTTP server benchmarking tool (ApacheBench) is a simple yet great tool which was "designed to give you an impression of how your current Apache installation performs." It can be leveraged to load test any web server setup, but we need to think for a minute about what exactly we're simulating. Here are a few examples:

  1. An average of 1000 requests per minute by 30 different users reaches a web server, with spikes of up to 5000 requests by 100 users every hour or so.
  2. We expect 15,000 requests every five minutes (by 50-100 different users), which doubles from 7 to 10pm on weekdays.
  3. Up to 10 other systems access my REST API, each sending from 500,000 up to a million requests per hour.

This is where cron comes into play. Cron is a time-based job scheduler in Linux, which means you can use it to schedule commands to run in the background at specific times, including recurring runs of the same command (for example, on minute 15 of every hour). Like ab, it's a pretty simple tool: the crontab -e command opens your preferred editor (typically nano) for you to enter single-line, 6-field entries in the cron format, which may vary slightly among Linux distributions: m h dom mon dow command (minute, hour, day of month, month, day of week, command). Going back to the 3 examples:

  1. We need 2 entries in crontab:
    * * * * * ab -k -c 30 -n 1000 http://server-hostname/ # every minute
    0 * * * * ab -k -c 70 -n 4000 http://server-hostname/ # every hour
  2. Now we may need 3 entries:
    */5 0-18 * * * ab -k -c `shuf -i 50-100 -n 1` -n 15000 http://webapp-hostname/ # every 5 min during normal hours (12am to 7pm)
    */5 19-21 * * * ab -c `shuf -i 50-100 -n 1` -n 30000 http://webapp-hostname/path/ # every 5 min during "rush hours" (7-10pm)
    */5 22,23 * * * ab -k -c `shuf -i 50-100 -n 1` -n 15000 http://webapp-hostname/path/ # every 5 min during the remaining normal hours (10 and 11pm)
  3. A single entry will do here:
    30 * * * * ab -c 10 -n `shuf -i 500000-1000000 -n 1` http://api-hostname/get?query # every hour (on minute :30)
Notes:
I use -k in some of the ab commands against web applications; this enables HTTP keep-alive, which is meant to simulate individual returning users (a browser reuses its connection across several requests).
The Linux shuf command generates random numbers within a given range (-i); the -n flag sets how many numbers to output (just one here).
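
For instance (the output will vary), you can run the backtick substitution by itself in a terminal to see the kind of value that ends up in ab's -c flag:

shuf -i 50-100 -n 1
# e.g. 73 (a random integer between 50 and 100)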

These are simple use cases which begin to approximate complete load tests, but they don't take into account factors such as multiple URL paths or POST requests. Also, in order to see the output of the ab commands executed by cron, we need to add log files to the mix. I'll leave that for you to figure out, but here's a tip based on example #3:

30 * * * * ab -c 10 -n `shuf -i 500000-1000000 -n 1` http://api-hostname/get?query >> /home/myuser/my/api/tests/load/cronab.log
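
Since cron appends every run to the same file, it also helps to mark when each report starts and to capture error output; a variation of the same entry (reusing the same hypothetical paths) could look like this:

30 * * * * echo "=== $(date) ===" >> /home/myuser/my/api/tests/load/cronab.log && ab -c 10 -n `shuf -i 500000-1000000 -n 1` http://api-hostname/get?query >> /home/myuser/my/api/tests/load/cronab.log 2>&1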

ab's output is a report that looks like this:

Concurrency Level:      35
Time taken for tests:   38.304 seconds
Complete requests:      220000
Failed requests:        0
Keep-Alive requests:    217820
Total transferred:      70609100 bytes
HTML transferred:       18480000 bytes
Requests per second:    5743.58 [#/sec] (mean)
Time per request:       6.094 [ms] (mean)
Time per request:       0.174 [ms] (mean, across all concurrent requests)
Transfer rate:          1800.20 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       5
Processing:     0    6   1.4      6      25
Waiting:        0    6   1.4      6      25
Total:          0    6   1.4      6      25

...

Final tip: How to determine infrastructural limits?

Every server's capacity is different. Reports from trial-and-error executions of ab can give you an idea of where a web application's infrastructure (servers) starts to falter (mean response times climb sharply), but the best way is to use a visual APM such as Amazon CloudWatch in AWS. Watching graphs of different metrics over time (e.g. requests handled, errors, dropped connections, CPU utilization, memory, or swap usage) after leaving ab running on cron for hours or even days lets you better adjust the number of requests and concurrency for future ab commands. Try to find that breaking point!
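
Even without an APM, a quick grep over the cron log from example #3 (same hypothetical path as before) gives a rough view of how throughput and failures trend over time:

grep -E "Requests per second|Failed requests" /home/myuser/my/api/tests/load/cronab.log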

Thanks for reading (=

2018-03-13

Dockerize your Django App (for local development on macOS)

I'm writing this guide on how to containerize an existing (simple) Python Django app into Docker for local development, since this is how I learned to develop with Docker, seeing that the existing django images and guides seem to focus on new projects.

For more complete (production-level) stack guides you can refer to Real Python's Django Development with Docker Compose and Machine or transposedmessenger's Deploying Cookiecutter-Django with Docker-Compose.

Pre-requisites

  • An existing Django app which you can run locally (directly or in Virtualenv). We will run the local dev server with manage.py runserver.
  • A requirements.txt file with the app dependencies, as is standard for Python projects, including MySQL-python.
  • Working local MySQL server and existing database. (This guide could easily be adapted for other SQL engines such as Postgres.)
  • Install Docker. You can see Docker as a virtual machine running Linux on top of your OS ("the host"), which in turn can run containers – which act as individual machines.
Keep in mind I'm currently working on a MacBook (macOS 10.13). Locally I'm using stock Python 2.7, Homebrew MySQL 5.7, and Django 1.11 (via pip).
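
If you want to double-check that your local setup is in place before starting, these quick version checks work in Terminal (your versions will differ):

python --version
mysql --version
docker --version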

Summary

  1. Make sure your Django settings are compatible with the containerization;
  2. Create a Dockerfile (and .dockerignore) to define the web app container;
  3. Create docker-compose.yml to set up the web and database services;
  4. Find/Set the local MySQL data directory, and run docker-compose!

Django settings

Here, we just want to make sure that our app settings (typically in settings.py) use environment variables, which can be provided by Docker (see the following sections), to connect to MySQL for example:


import os

# ... your settings

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': os.getenv('DB_NAME'),
        'USER': os.getenv('USERNAME'),
        'PASSWORD': os.getenv('PASSWORD'),
        'HOST': os.getenv('HOSTNAME'),
        'PORT': os.getenv('PORT'),
    },
}

# ... more stuff

SOME_API_TOKEN = os.getenv('TOKEN_FOR_MY_APP')


Note: I use os.getenv because it just returns None if the env var doesn't exist, like for PASSWORD in my case.

Dockerfile

This file informs Docker how to build your container image. In the root of your project directory, create a file called Dockerfile (no extension) with these contents:


# Use slim Python as a parent image
FROM python:2.7-slim
ENV PYTHONUNBUFFERED 1

# Set the working dir.
WORKDIR /web
ADD . /web

# Install required system libs (so pip install succeeds)
RUN apt-get update
RUN apt-get install -y libmysqlclient-dev
RUN apt-get install -y gcc
# Install server-side dependencies
RUN pip install -r requirements.txt

EXPOSE 8000

# Env vars
ENV DB_NAME my_database
ENV USERNAME root
ENV HOSTNAME localhost
ENV PORT 3306
ENV TOKEN_FOR_MY_APP 3FsMv8pTt62aDwaKkCzsPbBQZ0dSaff4tiP5a2eP

# Run Python's dev web server when the container launches
CMD ["/web/manage.py", "runserver", "0.0.0.0:8000"]


What this does, first, is set the python:2.7-slim image as the base for our to-be container. I chose 2.7-slim to keep the new image small (python:2.7 is over 681MB in size while 2.7-slim is 139MB). Then it installs the libmysqlclient-dev library (needed by Python's MySQL-python package) and the gcc compiler (needed for pip install to build it) on the system.
(Note that these build dependencies are already included in the python:2.7 image, so we wouldn't need to install them explicitly if using that heavier option.)
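
If you're curious, you can compare those base image sizes yourself after pulling both tags (exact numbers depend on the image versions you get):

docker pull python:2.7
docker pull python:2.7-slim
docker images python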

The Dockerfile also creates the /web folder in the container image and copies all the project files into it (except those listed in .dockerignore). And it sets some environment variables needed by the application, most notably credentials for the database connection; we could omit these here, though, since they can also be set from the docker command-line tool, so feel free to adjust at will.

Finally, we have a CMD entry to run the /web/manage.py runserver 0.0.0.0:8000 command by default when the container starts. Specifying "0.0.0.0" is important because without it, runserver only listens on 127.0.0.1 (localhost). That would be a problem because HTTP requests to the web server in the container won't come from its own localhost, but from an IP address assigned to the host by Docker (most likely 172.17.0.1, though this could differ among Docker versions).

Notes:
1. I'm not sure PYTHONUNBUFFERED is needed;
2. For an even smaller image, you can try a Multistage Build.
3. Actually, much of the behavior specified here (the /web folder, env vars, CMD) will be overridden by the Compose file (see the next section), but I felt it was important to keep it in the Dockerfile so the web app is useful on its own (it could be built into a separate image with docker build), and for educational purposes (:
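
As a sketch of that standalone path (the django-web tag is just a placeholder), building and running the image on its own would look like this; note the app will still need a reachable MySQL host, which is what Compose solves next:

# Build an image from the Dockerfile in the current directory
docker build -t django-web .
# Run it, publishing the dev server's port (the DB connection will likely fail without MySQL)
docker run --rm -p 8000:8000 django-web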

You probably also want to create a file called .dockerignore to tell the Docker Builder which files NOT to put in your image at build time. I thought of Python bytecode files and macOS Finder's metadata:


.git

# MacOS
.DS_Store

# Python
*.pyc
.env


Notice I also added lines for ".git" and ".env", which you may or may not need. I'm using Git for version control in my project, and python-dotenv to keep secrets out of the code repository. If you're not familiar with dotenv, I recommend you check it out (it originated in Ruby).
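
In case it helps, a .env file is just KEY=VALUE lines that python-dotenv loads into the environment for local runs; for example (placeholder values), it could hold the secrets that the Dockerfile doesn't set:

# .env (kept out of Git and out of the Docker image)
PASSWORD=some-db-password
TOKEN_FOR_MY_APP=some-secret-token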

Compose file

Note: If I were running Docker on a Linux host (or perhaps Docker in Docker), this step would probably not be necessary, since we could just use a "host" network driver and let the web container defined above connect directly to the MySQL running on the Docker host.
However, host networks do not work on Docker for Mac at the time of writing :/ but I took this challenge as an opportunity to learn how to use Docker Compose.

Create a file called docker-compose.yml (See YAML format) next to the previous files with:


version: '3.2'

services:
  db:
    image: mysql:5.7
    volumes:
      - type: bind
        source: ${MYSQL_DATADIR}
        target: /var/lib/mysql
    restart: on-failure
  web:
    depends_on:
      - db
    build: .
    volumes:
      - .:/web
    ports:
      - "8000:8000"
    environment:
      - HOSTNAME=db
    command: "python manage.py runserver 0.0.0.0:8000"
    restart: on-failure


Note: We need version 3.2 for some of the newer format options, such as the expanded (long-syntax) volumes key.

This file defines 2 "services", each run as a separate container. The first one, "db", uses the official MySQL image from Docker Hub and mounts a special folder from the host (detailed in the next section) into the container's /var/lib/mysql dir. This is a trick for the db service to use the host's MySQL data directory (as described in the mysql image's documentation), with all the data and permissions you already have established locally. Neat! (Although useless for remote deployment.)

The 2nd service, "web", is our Django app; it depends on db. Here we map the project directory to /web in the container, as well as port 8000 between container and host. We also (over)write the HOSTNAME env var with the name of the MySQL service so the app knows where to look for its database connection. (Docker Compose, used in the next section, will automatically set up a network where this hostname resolves to the db service container.)
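
Once the stack is up (next section), you can check that hostname resolution from inside the web container, for example:

docker-compose exec web python -c "import socket; print(socket.gethostbyname('db'))"
# prints the db container's IP on the Compose network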

Show time!

Just one more detail. To complete the local MySQL data dir trick, we need to actually find its location in our system. With mysql.server running locally, run this in Terminal:

mysql -u root -p -N -B -e 'SELECT @@GLOBAL.datadir'

(And enter the password for root user, if any.) Now copy the value (/usr/local/var/mysql in my case) into the following command:


MYSQL_DATADIR=/usr/local/var/mysql docker-compose up


Docker Compose reads docker-compose.yml by default and takes care of the whole process: building the image for the web service (in turn reading Dockerfile and .dockerignore), creating and initializing both service containers and the Docker network, and starting their default commands (mysqld and manage.py runserver, respectively).

Keep an eye on the output in the terminal window (conveniently interleaved from both containers), and head to http://localhost:8000/ in your browser to see the app running. As usual, Django's runserver reloads when changes to the app's source code are saved, and since the project directory is mounted into the web container, that works here too.
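
A few related docker-compose commands that come in handy from here (run from the project directory):

# Run everything in the background instead of attached to the terminal
docker-compose up -d
# Follow the logs of just the web service
docker-compose logs -f web
# Stop and remove the containers and the network
docker-compose down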

OK Bye

Thanks for reading!  ᕕ( ᐛ )ᕗ  I hope you found this guide useful and concise. 

Keep in mind there may be some redundant stuff happening above, which was left for educational purposes as it was a long process for me to figure each step out (so I left traces of each one). There may also be unexplained code here and there, left for you to investigate ;)

Last note: If your Django app is more complex (e.g. using Redis or needing load balancing), I hope this at least gives you an idea on how to begin containerizing it.


Author: @jorgeorpinel
Jorge A Orpinel Pérez (http://jorge.orpinel.com/)
Independent Software Engineer Consultant
