In spite of my intentions to get more involved in Elixir, I’ve been stuck in the Python tractor beam.
For all of the issues that may arise in large Python web applications, Python really is a fantastic
do-it-all language. As one of my colleagues recently said:
Python is the second best language for everything.
I’m still a very big fan of the Serverless framework and have been using it almost constantly at work. So far, I’ve written fairly substantial Serverless systems for a variety of projects:
- ETL jobs synchronizing Shopify orders with third-party fulfillment centers
- Data pipeline / ETL process for Strava data
- REST APIs
- Alexa skill
I’ve come up with a pattern that has been working out quite well, and it is the subject of this post.
The problems
There are two main problems with Python code on Lambda which stem from including extra packages in your project. In the real world, you are very likely going to want or need some packages beyond the standard library.
- Python packages which have C bindings need to be built on the same platform that Lambda functions run on (i.e., Linux).
- With Lambda, you are responsible for managing Python’s path so it can find your dependencies.
I’ll walk through my setup and discuss how these problems are solved.
The solutions
- Docker
- Add four lines of code at the top of handler.py to add a directory to your sys.path
Docker
In this setup, we’re using Docker as a utility. The Docker image I’m using is the official Python 2 image with the Serverless framework installed globally. I maintain this image on Docker Hub and update it whenever a new version of Serverless is released. You can see the Dockerfile as well, since it’s all open source.
If you’re running Linux on your host system, you won’t need to deal with this at all. This tip is really for the OS X and Windows folks out there.
Structure
I structure all of my Serverless projects like so:
├── Makefile
├── envs
│   └── dev
├── requirements.txt
└── serverless
    ├── handler.py
    ├── lib
    └── serverless.yml
The important bits:
- Makefile is used as a controller, so you don’t have to remember a bunch of commands and can type less. We’ll go through this in more detail.
- envs holds one or more environment variable files. You may have a different file in here for each of your stacks (dev, test, production), which lets us easily switch between environments.
- requirements.txt should be self-explanatory.
- serverless is the root of your Serverless project.
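If you want to start a new project from this layout, a small script can create the skeleton for you. This is a minimal Python 3 sketch, not part of the original setup; `scaffold` and the path lists are my own hypothetical names, and it only creates empty placeholders.

```python
import tempfile
from pathlib import Path

# Hypothetical helper: create the skeleton described above under `root`.
DIRS = ("envs", "serverless/lib")
FILES = ("Makefile", "requirements.txt", "envs/dev",
         "serverless/handler.py", "serverless/serverless.yml")


def scaffold(root):
    root = Path(root)
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        # Creates an empty file if missing; existing contents are untouched.
        (root / f).touch(exist_ok=True)
    # Every path under root, relative and sorted, for a quick sanity check.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for entry in scaffold(tmp):
            print(entry)
```

Running it a second time is harmless, because `exist_ok=True` leaves existing directories and file contents alone.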
Makefile and envs
It may be easiest to show an example of what’s needed to deploy a new stack. Using the Makefile, I can simply do:
$ ENV=dev make shell
docker run --rm -it -v `pwd`:/code --env ENV=dev --env-file envs/dev --name=supersecret-serverless-dev "verypossible/serverless:1.17" bash
root@f513331941bc:/code#
root@f513331941bc:/code# make deploy
Breaking that down:
ENV=dev make shell launches the container with the variable ENV set to dev. The value for this variable needs to map to a file in your envs directory. Provided you are getting configuration from the environment in your Python code (and you should be), this makes it trivial to change the stack you’re working with.
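As an example of what getting configuration from the environment might look like, here is a minimal sketch. `DATABASE_URL` and `DEBUG` are hypothetical keys standing in for whatever you keep in your envs files. Note that `--env-file` only injects variables into the Docker container; for the deployed function to see them, this sketch assumes serverless.yml forwards them to the function’s environment, e.g. with `${env:DATABASE_URL}`.

```python
import os

# Hypothetical required keys; substitute the ones in your envs/<stage> files.
REQUIRED_SETTINGS = ("DATABASE_URL",)


def load_config(environ=None):
    """Build a settings dict from environment variables."""
    env = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_SETTINGS if not env.get(key)]
    if missing:
        # Fail fast instead of surfacing a confusing error later.
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {
        "stage": env.get("ENV", "dev"),
        "database_url": env["DATABASE_URL"],
        "debug": env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    }
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.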
Imagine you also have envs/test and envs/production files which hold key-value pairs for configuration. To launch your test stack, you would run:
$ ENV=test make shell
How does this work? The baseline Makefile is shown below. You will see a macro called run, which make shell calls with the value of ENV. Using the docker --env-file argument, we inject the variables from the matching envs file into the Docker container.
NAME = "verypossible/serverless:1.17"
ENVDIR=envs
LIBS_DIR=serverless/lib
PROJECT=supersecret

.PHONY: check-env \
    clean \
    deploy \
    env-dir \
    shell \
    test \
    test-watch \
    libs

run = docker run --rm -it \
    -v `pwd`:/code \
    --env ENV=$(ENV) \
    --env-file $(ENVDIR)/$2 \
    --name=$(PROJECT)-serverless-$(ENV) $(NAME) $1

shell : check-env env-dir
	$(call run,bash,$(ENV))

env-dir :
	@test -d $(ENVDIR) || mkdir -p $(ENVDIR)

clean :
	@test -d $(LIBS_DIR) || mkdir -p $(LIBS_DIR)
	rm -rf $(LIBS_DIR)/*

# make libs should be run from inside the container
libs :
	@test -d $(LIBS_DIR) || mkdir -p $(LIBS_DIR)
	pip install -t $(LIBS_DIR) -r requirements.txt
	rm -rf $(LIBS_DIR)/*.dist-info
	find $(LIBS_DIR) -name '*.pyc' -delete
	find $(LIBS_DIR) -type d -name tests -prune -exec rm -rf {} +

# NOTE:
#
# Deployments assume you are already running inside the docker container
#
deploy : check-env
	cd serverless && sls deploy -s $(ENV)

# Note the ifndef must be unindented
check-env:
ifndef ENV
	$(error ENV is undefined)
endif
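To make the run macro concrete, here is a Python sketch that builds the same argument list that `$(call run,bash,$(ENV))` expands to, including the check-env guard. It only illustrates the expansion; `docker_run_argv` is a hypothetical name and the defaults are copied from the Makefile.

```python
import os
import shlex

# Defaults copied from the Makefile.
NAME = "verypossible/serverless:1.17"
ENVDIR = "envs"
PROJECT = "supersecret"


def docker_run_argv(command, env, cwd=None):
    """Return the argv that `$(call run,<command>,$(ENV))` expands to."""
    if not env:
        # Same guard as the check-env target.
        raise ValueError("ENV is undefined")
    cwd = cwd or os.getcwd()  # stands in for `pwd`
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{cwd}:/code",
        "--env", f"ENV={env}",
        "--env-file", f"{ENVDIR}/{env}",
        f"--name={PROJECT}-serverless-{env}",
        NAME, command,
    ]


if __name__ == "__main__":
    argv = docker_run_argv("bash", "dev", cwd="/home/me/project")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Note how `$2` (the env name) selects the env file while `$1` becomes the command run inside the container.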
I should note that in order to deploy your Serverless project, you will need AWS credentials. Each envs file you create will need to have the following:
AWS_DEFAULT_REGION=us-west-2
AWS_SECRET_ACCESS_KEY=ASFASDFASFASFASFDSADF
AWS_ACCESS_KEY_ID=1234ABCDEF
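Since a missing key only shows up as a failed deploy, it can be handy to check an envs file first. This is a minimal sketch that parses the simple KEY=VALUE format roughly the way `docker run --env-file` reads it (blank lines and `#` comments skipped, values taken literally). Bare `KEY` lines, which Docker fills in from the host, are simply ignored here. The helper names are hypothetical.

```python
# Keys every envs/<stage> file needs for `sls deploy`.
REQUIRED_AWS_KEYS = (
    "AWS_DEFAULT_REGION",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
)


def parse_env_file(text):
    """Parse KEY=VALUE lines (simplified approximation of --env-file)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if sep:
            values[key] = value
    return values


def missing_aws_keys(text):
    """Return the required AWS keys that are absent or empty."""
    values = parse_env_file(text)
    return [key for key in REQUIRED_AWS_KEYS if not values.get(key)]


if __name__ == "__main__":
    sample = "AWS_DEFAULT_REGION=us-west-2\nAWS_ACCESS_KEY_ID=1234ABCDEF\n"
    print(missing_aws_keys(sample))
```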
envs should be in your .gitignore
You really don’t want to be committing sensitive variables into source control, so please make sure you have added envs to your .gitignore.
Now that we have a bash shell open in our container, deploying is simply make deploy.
Looking above, you can see there isn’t much magic there. The only trick is that we take the value of ENV (which also gets injected as a variable into the container) and use it as the Serverless stage via the -s argument. With that, you can work on completely separate stacks using the exact same code.
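To see why the stage keeps stacks apart: unless serverless.yml overrides them, Serverless derives the CloudFormation stack name as `<service>-<stage>` and each Lambda function name as `<service>-<stage>-<function>`. The sketch below just spells out that default naming; the service name `supersecret` and function name `feed` are hypothetical.

```python
def stage_resource_names(service, stage, functions):
    """Default per-stage names (assumes no overrides in serverless.yml)."""
    return {
        "stack": f"{service}-{stage}",
        "functions": {fn: f"{service}-{stage}-{fn}" for fn in functions},
    }


if __name__ == "__main__":
    for stage in ("dev", "test", "production"):
        print(stage_resource_names("supersecret", stage, ["feed"]))
```

Because every resource name embeds the stage, deploying with `-s test` never touches what `-s dev` created.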
Libraries
Now that the hard work is out of the way, we’re all clear to install some libraries. Common libraries with C bindings that you may want to use are psycopg2, python-mysql, yaml, and all or most of the data science packages (numpy, etc.).
Add whatever you need to requirements.txt. From within the container, in the same directory as the Makefile (which happens to be /code), run:
root@f513331941bc:/code# make libs
Looking at the Makefile, you’ll see again there isn’t much magic to this. The key here is that we are building our C bindings on the same platform that Lambda uses to run your functions, that is, Linux.
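If you are unsure which of your dependencies actually ship C extensions (and therefore must be built inside the container), you can look for compiled extension files in the libs directory. A minimal sketch, assuming extensions end in `.so` (Linux and OS X) or `.pyd` (Windows); `compiled_packages` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Compiled extension modules end in .so on Linux and OS X, .pyd on Windows.
EXTENSION_SUFFIXES = (".so", ".pyd")


def compiled_packages(lib_dir):
    """Return top-level names under lib_dir that contain compiled extensions."""
    lib = Path(lib_dir)
    found = set()
    for path in lib.rglob("*"):
        if path.is_file() and path.suffix in EXTENSION_SUFFIXES:
            # The first path component is the package (or module) name.
            found.add(path.relative_to(lib).parts[0])
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "psycopg2").mkdir()
        (Path(tmp) / "psycopg2" / "_psycopg.so").touch()
        (Path(tmp) / "requests").mkdir()
        (Path(tmp) / "requests" / "__init__.py").touch()
        print(compiled_packages(tmp))
```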
If you shut down your container, you will notice that your libs directory is still there. This is nice, and on purpose: using the -v (volume) argument to docker run, we’re able to map our host’s directory into the container. Any packages we install are built inside the Linux container but ultimately written to our host’s file system. You’ll only need to run make libs when you add or update packages in requirements.txt. There is also a make clean command which can be used to start over.
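For reference, here is the cleanup part of make libs expressed in Python. It is a sketch, not a replacement for the Makefile target: it removes top-level `*.dist-info` directories, every `*.pyc` file and every directory named `tests`. `prune_libs` is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


def prune_libs(lib_dir):
    """Mirror the cleanup steps of `make libs`; return how many paths were removed."""
    lib = Path(lib_dir)
    removed = 0
    # rm -rf lib/*.dist-info (top level only)
    for info in list(lib.glob("*.dist-info")):
        shutil.rmtree(info)
        removed += 1
    # Delete every compiled .pyc file.
    for pyc in list(lib.rglob("*.pyc")):
        pyc.unlink()
        removed += 1
    # Delete "tests" directories shallowest first; nested ones disappear
    # with their parent, so skip anything that no longer exists.
    tests_dirs = sorted((p for p in lib.rglob("tests") if p.is_dir()),
                        key=lambda p: len(p.parts))
    for d in tests_dirs:
        if d.exists():
            shutil.rmtree(d)
            removed += 1
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "pkg" / "tests").mkdir(parents=True)
        print(prune_libs(tmp))
```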
handler.py
Now that we have all of our libraries, we need to tell our Python code how to find them. At the top of handler.py, I always have these first four lines of code (two imports plus two lines to deal with sys.path):
# begin magic four lines
import os
import sys
CWD = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(CWD, "lib"))
# end magic four lines
# now it's ok to import extra libraries
import numpy as np
def handler(event, context):
    pass
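If more than one module (a test suite’s conftest.py, say) needs the same bootstrapping, you could wrap the two sys.path lines in a small helper that avoids adding the same entry twice. This is a hypothetical variation on the four lines above; `add_vendored_path` is my own name for it.

```python
import os
import sys


def add_vendored_path(base_dir, subdir="lib"):
    """Prepend base_dir/subdir to sys.path once and return that path.

    Same effect as the two sys.path lines above, but calling it again adds
    no duplicate (an existing entry is left where it is).
    """
    lib_dir = os.path.join(base_dir, subdir)
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir
```

In handler.py you would call `add_vendored_path(os.path.dirname(os.path.realpath(__file__)))` before importing anything that lives in lib.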
Another very useful convention I’ve settled on is using a single handler.py module as the entry point for all of my functions. The handler does nothing more than the basic bootstrapping of the path, importing my own modules and handing off the work to those modules. In the end, the file structure looks something like this:
$ tree -L 2
.
├── Dockerfile
├── Makefile
├── envs
│   ├── dev
│   └── production
├── requirements.txt
├── serverless
│   ├── handler.py
│   ├── lib
│   ├── serverless.yml
│   └── very
│       ├── aws.py
│       ├── constants.py
│       └── feed.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_aws.py
    └── test_feed.py
handler.py imports my other modules, which happen to be inside the very directory in this example, and relies on them to execute my business logic. Using this convention, you can be sure that sys.path is already set up, so importing your extra modules will work as you’d expect without needing to alter the path again.
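Here is a minimal sketch of what a thin entry point in handler.py might look like. `summarize_feed` is a hypothetical stand-in for business logic that would live in a module like very/feed.py, and `feed_summary` is a hypothetical function name that serverless.yml would reference as `handler: handler.feed_summary`. The response shape assumes an API Gateway proxy-style integration.

```python
import json


def summarize_feed(items):
    """Hypothetical stand-in for business logic in a module like very/feed.py."""
    return {"count": len(items)}


def feed_summary(event, context):
    """Thin Lambda entry point: unpack the event, delegate, wrap the result."""
    items = event.get("items", [])
    result = summarize_feed(items)
    # API Gateway proxy-style response (an assumption about how it's invoked).
    return {"statusCode": 200, "body": json.dumps(result)}
```

Keeping the entry points this thin means the business logic can be unit tested directly, without constructing Lambda events.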
Conclusion
Docker, along with this Makefile, makes it extremely easy to manage different deployments of your Serverless stack and to iterate quickly on your code.
Still, there are a few gotchas which take a little time to learn and master. Organizing my Serverless projects like this has saved me quite a bit of time. I can spin up a new project in a matter of minutes and deploy code changes within seconds, all while keeping my host system clean and free of any installations of the Serverless framework. Changing versions of Serverless is a one-line change in the Makefile.
If you try this out and it works, or you see some improvements, please let me know!