Serverless Design Principles // Kevin Wu

Introduction

Serverless is becoming more and more popular these days. It has always been an interesting space for me because I'm interested in the stateless functional style of programming. I started working on Lambda functions essentially as my first project as an SDE at Amazon back in 2017, worked with one of our purely serverless data storage services at Amazon Fashion, eventually made my way to the actual Lambda backend team, and now at Microsoft I find myself working with Azure Functions again. Obviously, Lambda has not really existed for that long, so I think I've basically maxed out on the possible number of years of serverless experience. I hope to highlight some of the more interesting differences between more "classic" design with servers and serverless design.

Motivations

I wanted to start with why we would use Lambda/Functions over classic VMs or Kubernetes clusters. The original motivation for Lambda was mostly to save costs, but later on we noticed some efficiencies that could only be realized at Lambda scale.

Saving costs

There are a lot of web services out there that generally serve less than 1 transaction per second, but were still costing a lot in VM usage. The idea was that there should be an efficient way for us to schedule a lot of these web services on the same hardware so that we can save a lot of money.

Efficiencies at scale, minimizing resource contention with data

After running Lambda for a while, the team realized there were ways to schedule work really efficiently. Lambda has the data to analyze different workflows to see what their bottleneck resource is, and to schedule them so that functions do not have to contend for the same resource at the same time. For example, it would be optimal to schedule a memory-heavy workflow alongside a compute-heavy workflow, so they are much less likely to contend for the same resources. There is an excellent talk by Marc Brooker about this, which you can find here: https://youtu.be/xmacMfbrG28?t=1310.

It's just easier

I'd be remiss if I didn't include this, but a lot of the time, Lambda and its ilk are just the easiest services to set up, requiring little knowledge of server infrastructure, and that makes them much easier to use for a broad audience. Last year, I threw together a demo of a bookstore to give a talk on design for the CS department at my alma mater, and it was just easier to use API Gateway backed by Lambda and DynamoDB, so I didn't have to really think about servers at all.

Key ideas

These are some key ideas that will come up repeatedly in our best practices. I'm highlighting these specifically because they differ from traditional "serverful" architecture.

Concurrency

We use the number of concurrent invocations to talk about the scale of a Function, not requests per second or capacity, which is more traditional. You can calculate your concurrency by multiplying your requests per second by the expected latency of the request (see Little's Law). There is actually quite a lot of content in Marc's talk that I linked to earlier about this if you want a deeper dive on why this is the case. The main reason is that concurrency is a measure that doesn't depend on hardware (hence serverless, right?). Concurrency also takes into account how efficiently you're responding to requests rather than just how many requests you're getting. One common ticket I'd see at Lambda would be a team wondering why they were being throttled when they had low requests per second (more on this later).
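To make the arithmetic concrete, here is a minimal sketch of the calculation. The function name and the example numbers are mine, picked only for illustration, not taken from any real workload.

```python
def required_concurrency(requests_per_second: float, latency_seconds: float) -> float:
    """Little's Law: average concurrency = arrival rate * time spent per request."""
    return requests_per_second * latency_seconds


# Illustrative numbers only: a "quiet" function at 2 requests per second
# that takes 30 seconds per request keeps about 60 invocations in flight.
quiet_but_slow = required_concurrency(2, 30)

# A busy function at 200 requests per second that answers in 100 ms
# keeps only about 20 in flight.
busy_but_fast = required_concurrency(200, 0.1)

print(quiet_but_slow, busy_but_fast)
```

The slow function needs roughly three times the concurrency despite receiving a hundredth of the traffic, which is how a low-traffic function can still end up throttled.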

Cold starts

The big issue with serverless that people like to talk about is higher latency from cold starts. A cold start¹ is essentially when your execution environment needs to both prepare for the execution, and then actually perform the execution. At Lambda, we called these stages Init and Invoke. It's not uncommon to see cold starts that are over 10 seconds, especially if you're not careful. I've also seen many tickets about this in my time at Lambda.

Quick detour on code/data reuse.

I also wanted to include a quick interlude about how persistent resources get reused because I think it's practical for those new at writing these Functions.

Class members

We talked a little bit about init and invoke. Intuitively, you'd think well I'm creating some resources, let's say a DB client, during the init phase, but how does it get reused? Do I instantiate a new client per request, or can we reuse resources across invokes? The answer, thankfully, is that you do in fact reuse these resources across invokes. If you initialize your DB client outside of your function definition, it will be reused the next time you invoke, so members of your class probably behave as you would expect coming from a more serverful background. Let me illustrate this with a quick example.

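class Function:
    # Class-level attribute: it survives for as long as the execution environment does.
    db_client = None

    @classmethod
    def handle_invoke(cls, event, context):
        # Only the first invoke in this environment pays for constructing the client.
        if cls.db_client is None:
            cls.db_client = DbClient()

        return cls.db_client.get_some_stuff()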

On your first invoke, you will instantiate a new DbClient, but on subsequent requests to the same execution environment, db_client will already be instantiated, so you will be able to get_some_stuff directly.

The more perceptive readers may notice a potential problem here. If you're following security best practices and using ephemeral credentials that expire after some hours, you could run into credential issues by reusing these clients. If you're using the AWS SDK, this will generally be handled for you, but otherwise it's something to keep in mind as you're developing.
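If your client does not refresh credentials on its own, one option is to rebuild the cached client once it gets too old. The following is only a sketch under assumptions: DbClient is a stand-in class, RefreshingClientCache and its parameters are names I made up for illustration, and the 45-minute limit is arbitrary.

```python
import time


class DbClient:
    """Stand-in for a real client; hypothetical, not from any SDK."""

    def get_some_stuff(self):
        return "stuff"


class RefreshingClientCache:
    """Reuses one client across invokes, rebuilding it once it is too old."""

    def __init__(self, factory, max_age_seconds, clock=time.monotonic):
        self._factory = factory
        self._max_age = max_age_seconds
        self._clock = clock
        self._client = None
        self._created_at = None

    def get(self):
        now = self._clock()
        expired = (
            self._created_at is not None
            and now - self._created_at >= self._max_age
        )
        if self._client is None or expired:
            # First use in this environment, or the old client may hold stale credentials.
            self._client = self._factory()
            self._created_at = now
        return self._client


# Created once per execution environment, at import time.
# 45 minutes is an arbitrary value; choose something shorter than your credential lifetime.
clients = RefreshingClientCache(DbClient, max_age_seconds=45 * 60)


def handle_invoke(event, context):
    return clients.get().get_some_stuff()
```

Most invokes still reuse the same client, and only the occasional invoke after the age limit pays to construct a new one.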

File System

This is a quick one. With Lambda (if you attach EFS) or with Azure Functions (by default), your invokes all share the same file system. This was originally meant for machine learning workflows, where people have relatively large trained models they want to load, but it has plenty of other uses.

Best practices

Here is a collection of best practices that I found to be less commonly reported, but that are really helpful with design.

Keep functions short (like less than 2 minutes)

If you read the Azure Functions best practices, they say that you should keep your functions short because they time out. This is true, of course. Lambda used to time out at 5 minutes, but the limit has since been extended to 15 minutes. So you might think, well, if I limit my function to 10 minutes, I should be pretty safe. Unfortunately, there is more nuance here. Earlier, I introduced the concept of concurrency and Little's Law.

L=λW

L is the concurrency. λ is the effective arrival rate (requests per second), and W is the wait time (latency).

For example, the largest throttling point for Lambda in one of the regions is 3000 concurrent invokes. If you consider a 10 minute function (600 seconds), we can calculate the λ, or requests per second at which you'll be throttled.

3000=λ*600

λ=3000/600

λ=5

Oops, just 5?

From the equation itself, you can see that the higher the latency, the lower the requests per second your function will be able to handle. Even using the default numbers, the λ decreases unintuitively quickly as W increases.
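To see how fast λ falls off, here is a small sketch that rearranges Little's Law as λ = L / W and evaluates it for a few durations. The 3000 limit is the one from the example above; the function name and the list of durations are just illustrative choices.

```python
def max_requests_per_second(concurrency_limit: float, latency_seconds: float) -> float:
    """Little's Law rearranged: lambda = L / W."""
    return concurrency_limit / latency_seconds


CONCURRENCY_LIMIT = 3000  # the throttling point used in the example above

for seconds in (0.1, 1, 10, 60, 600):
    rate = max_requests_per_second(CONCURRENCY_LIMIT, seconds)
    print(f"{seconds:>6} s per invoke -> throttled above {rate:,.0f} requests/second")
```

Every tenfold increase in duration costs you a tenfold drop in the request rate you can sustain before throttling.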

Generally speaking, Lambda is designed to execute short functions, and other tools such as AWS Batch or ECS are better suited for longer running jobs.

Cold start related

Let's dig a little deeper on the steps to init. Generally speaking, no matter the serverless environment, we have to do these things.

Acquire an execution sandbox
Pull the code/executable into the sandbox
Start the executable runtime e.g. JVM or CLR
Run your init code.
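Of the steps above, only the last one is something your own code can observe directly. As a rough sketch (the handler name and the returned fields are my own, not a platform API), you can time your init code at module load and flag the first invoke in each execution environment:

```python
import time

# Module-level code runs once per execution environment, during init.
_init_started = time.monotonic()
_expensive_config = {"loaded": True}  # placeholder for real init work
_init_seconds = time.monotonic() - _init_started

_is_cold = True  # flipped after the first invoke in this environment


def handle_invoke(event, context):
    global _is_cold
    cold, _is_cold = _is_cold, False
    # Only covers "run your init code"; sandbox acquisition, code download,
    # and runtime startup happen before this module is even imported.
    return {"cold_start": cold, "init_seconds": _init_seconds}
```

Logging these two values per invoke gives you a cheap way to see how often you hit cold starts and how much of that time is your own init code.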

Keep functions small (like under 50MB)

We've also already talked about cold starts. One of the most unintuitively slow parts of a cold start is actually pulling your code/binary. I suggested the limit of 50MB or so mainly because lots of people insist on using Java or C# to write functions, but in reality if you're using an interpreted runtime like python or node, you can easily keep your code under 1MB. I mean in this case Java and C# will be like 50x slower, and that's just to download the executable.

Use an interpreted runtime for more predictable results

While we're on the topic of runtimes, try to use a fast runtime. JVM and CLR have a reputation for taking a long time to initialize. I would routinely see such functions take upwards of 10 seconds to initialize. JVM and CLR languages generally execute faster than node or python, but if you're following the earlier advice about function duration, you are spending a much higher proportion of your time in init, which causes more latency instability when you do hit a cold start. In my experience, node has been a good choice for a more consistent experience. You can also bypass the runtime completely and pick a compiled language like Golang or Rust². I've been on teams that used Golang to great effect with Lambda.

Be mindful of price

It's easy to get lost in the ease of using Functions and forget how much you're spending. I've accidentally spent thousands of dollars (of Amazon's money) in just a few days. The pricing model is a constant cost per invoke plus a rate on GB-seconds, where GB is how much memory you've allocated (not how much you're using) and seconds is the duration of the invocation, rounded to the nearest millisecond. This means the less you use, the less you pay, but also conversely, the more you use, the more you pay. It'd be good to look up how much it would cost to get a VM or Kubernetes cluster to do the same job and make sure you're willing to pay the excess. I've found that the point where Lambda starts costing more happens at a much lower concurrency than people generally think. Of course, functions do more than VMs. They scale automagically, do OS patching, etc., so it may be worth it to you, but you should at least know how much you're paying for that.
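Here is a back-of-the-envelope sketch of that pricing model. The two prices are placeholders I picked for illustration, not quotes from any provider's price list, and the sketch ignores free tiers and other charges, so plug in current numbers before drawing conclusions.

```python
def monthly_function_cost(
    invokes_per_month: int,
    avg_duration_seconds: float,
    memory_gb: float,
    price_per_million_invokes: float,
    price_per_gb_second: float,
) -> float:
    """Per-invoke charge plus a charge on allocated GB-seconds."""
    request_cost = invokes_per_month / 1_000_000 * price_per_million_invokes
    gb_seconds = invokes_per_month * avg_duration_seconds * memory_gb
    return request_cost + gb_seconds * price_per_gb_second


# Placeholder prices, NOT quoted from any provider's price list.
PRICE_PER_MILLION_INVOKES = 0.20
PRICE_PER_GB_SECOND = 0.0000167

# A steady 50 requests/second at 200 ms each with 1 GB allocated,
# which is an average concurrency of only 10.
invokes = 50 * 60 * 60 * 24 * 30
cost = monthly_function_cost(
    invokes, 0.2, 1.0, PRICE_PER_MILLION_INVOKES, PRICE_PER_GB_SECOND
)
print(f"{invokes:,} invokes/month -> ${cost:,.2f}")
```

Compare the result against the monthly price of a VM or small cluster that could hold the same concurrency to see which side of the break-even point you are on.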

Assorted Tips

Lambda currently doesn't charge for inits under 10s

Lambda doesn't really like people spreading this particular tidbit, but this is relatively widely known now. In order to optimize cold start times, your init phase is generally run on a more powerful sandbox and you don't have to pay for it. The caveat is that if you spend more than 10 seconds in the init phase, the sandbox is restarted and you will be charged for init. There is a blog post that went relatively viral about this phenomenon: https://hichaelmart.medium.com/shave-99-93-off-your-lambda-bill-with-this-one-weird-trick-33c0acebb2ea.

Init can happen before you actually make an invoke request

This one surprises customers from time to time, but in order to optimize cold starts, the platform can initialize your execution environment well before you actually make an invoke, especially if you make use of the dependency injection features. Reworking the previous example a little bit to show a common way that this comes up:

class Function:

    def __init__(self):
        # Runs during init, which the platform may trigger before any invoke arrives.
        self.logger = SomeCustomLogger()
        self.db_client = DbClient()
        self.logger.logInfo("db_client is initialized")

    def handle_invoke(self, event, context):
        return self.db_client.get_some_stuff()

If we log the initialization of db_client, you can see that log statement, even if you did not make a request.

Don't use Task without returning something in Azure Functions (when you can)

This is more of a quirk of how async/await works in dotnet: plain Task functions are syntactic sugar for void. Since execution metadata is saved in the Task object, if you use just plain Task, you return void and you lose all your execution metadata. For example, if you return Task, all Exceptions are suppressed because there is no way to return them to the caller. For that reason, I suggest at least returning something like Task<bool>, which will throw Exceptions as expected.

¹ Actually, within Lambda we had several levels of cold starts, which I get into a little bit later, but we're keeping it simple for now.

² Shoutout to https://github.com/awslabs/aws-lambda-rust-runtime. That said, you'll probably have issues with Rust's really large compiled binaries.