Rate Limiting, the Basics

Pre-requisites

Here are some things you should know that will make this post easier to understand.

Over the last few months I’ve been working with Schneems on the Heroku Platform API gem. We introduced rate throttling into the gem to complement the rate limiting that was already in use. You can read about our work in detail on the Heroku Blog. In this post, I wanted to go over the basics of rate limiting and why it’s important, especially for APIs that see a lot of traffic.

The problem

When you make an API public, you’re inevitably going to increase the amount of inbound traffic it receives. That’s usually what you want, but it can also introduce a new set of problems. If your server can’t handle the increased load, it’ll become slow and laggy, making for a poor user experience. More seriously, you could be opening yourself up to Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks. These attacks work by flooding the server with requests (i.e. traffic) until it’s overwhelmed.

Let’s say I’d like to display my tweets in a sidebar on my website. Somewhere in my code I’d be sending a request to the Twitter API endpoint that fetches tweets, and I might do that every minute to make sure the tweets displayed on my site are up to date. In an hour I’ve made 60 requests; in a day I’ve made 1,440. There are 330 million active Twitter users, and if every single one of them did this, that would be 475.2 billion requests every day. Maybe Twitter’s servers can handle that, but if they can’t, this could easily starve Twitter’s resources (their computing power, memory, etc.).
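
To make that number concrete, here’s the back-of-the-envelope arithmetic as a quick Ruby snippet:

```ruby
# One request per minute, all day, from every active user.
requests_per_user_per_day = 60 * 24   # => 1440
active_users              = 330_000_000

total = requests_per_user_per_day * active_users
puts total # => 475200000000 (475.2 billion requests per day)
```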

The problem of DoS attacks and the problem of too many people requesting an endpoint are really the same: both put too much load on the server’s resources, which leads to bad performance for the application and a rubbish experience for the user.

How does rate limiting help?

Rate limiting is basically a way to limit the number of requests that a user session or IP address (both of which are usually unique) can make to an API or an API endpoint. For example, one of Twitter’s rate limiting rules is that an IP address or user session can only make 15 requests in a 15 minute window; if a 16th request is made during that window, it will be blocked. Many APIs have similar strategies; at Heroku, you’re limited to 4500 calls per hour. The goal of any rate limiting algorithm is to ensure that no one uses more resources than are available to them.
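
To illustrate, here’s a minimal sketch of a fixed-window limiter in the spirit of the Twitter rule above. The class and its interface are my own invention for this post, not Twitter’s actual implementation:

```ruby
# A minimal fixed-window rate limiter: at most `limit` requests per
# `window` seconds for each key (e.g. an IP address or session ID).
class FixedWindowLimiter
  def initialize(limit:, window:)
    @limit  = limit
    @window = window
    @counts = Hash.new { |h, k| h[k] = { started_at: nil, count: 0 } }
  end

  # Returns true if the request is allowed, false if it should be blocked.
  def allow?(key)
    entry = @counts[key]
    now   = Time.now

    # Start a fresh window if this is the first request or the window expired.
    if entry[:started_at].nil? || now - entry[:started_at] >= @window
      entry[:started_at] = now
      entry[:count]      = 0
    end

    entry[:count] += 1
    entry[:count] <= @limit
  end
end

limiter = FixedWindowLimiter.new(limit: 15, window: 15 * 60)
limiter.allow?("203.0.113.7") # => true for the first 15 calls in the window
```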

There are a number of different strategies for implementing this, and choosing between them really depends on the application. At Heroku we use a token bucket strategy: every Heroku account starts with a full bucket of 4500 tokens, and every time an account makes an API call a token is removed from the bucket. Tokens are also added back to the bucket every minute, at roughly 75 tokens per minute. As an account, you don’t want to use tokens faster than they’re being added back to your bucket, because eventually your bucket will be empty and you’ll hit the limit.
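
Here’s a minimal sketch of that idea using the numbers above. This is illustrative only, not Heroku’s actual implementation, and it simplifies by refilling tokens continuously rather than in per-minute chunks:

```ruby
# A token bucket: a 4500-token bucket refilled at 75 tokens per minute.
class TokenBucket
  def initialize(capacity: 4500, refill_per_minute: 75)
    @capacity    = capacity
    @refill_rate = refill_per_minute / 60.0 # tokens per second
    @tokens      = capacity.to_f            # start with a full bucket
    @last_refill = Time.now
  end

  # Take one token for a request; returns false when the bucket is empty.
  def allow?
    refill
    return false if @tokens < 1

    @tokens -= 1
    true
  end

  private

  # Add tokens for the time elapsed since the last refill, capped at capacity.
  def refill
    now          = Time.now
    elapsed      = now - @last_refill
    @tokens      = [@tokens + elapsed * @refill_rate, @capacity].min
    @last_refill = now
  end
end

bucket = TokenBucket.new
bucket.allow? # => true while tokens remain, false once the bucket runs dry
```

A nice property of the token bucket over a fixed window is that unused capacity accumulates, so a client can absorb a short burst of requests as long as its longer-term average stays under the refill rate.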

As I mentioned before, it’s important to think about rate limiting, especially if your application is public facing and expects many users. If you don’t have a public facing API, then you’re more likely to be concerned with preventing DoS attacks, and there are many services that will take care of DoS prevention for a fee, so you may not need to implement your own rate limiting algorithm.

Further Reading