A player kills three enemies in quick succession. The first two stat updates persist. The third returns 429 and the client silently fails it. From the player's point of view, that's a kill that didn't count, an achievement that didn't unlock, and a reason to write a negative Steam review. That's what bad rate limiting looks like.
Good rate limiting is invisible. Limits exist, but the client absorbs them, batches around them, and recovers without the player noticing.
This article covers what to actually build, and what to expect from a backend you don't build yourself, so your rate limits stop bots and runaway clients without dropping legitimate players.
What Rate Limiting Solves and Where It Hurts
Rate limiting is a counter. It tracks how many requests a given entity (user, IP, API key, studio) has sent in a window of time. Anything over the threshold gets rejected, usually with an HTTP 429 Too Many Requests response. The point is to protect your backend from three things: malicious abuse (bots, DoS, credential stuffing on login), buggy clients hammering an endpoint in a tight loop, and legitimate-but-runaway traffic during launch spikes that would otherwise melt your database.
That's the protection side. The cost is that every limit you set is a potential player-experience bug: the dropped kill in the opening example is exactly the failure mode a mis-sized limit produces in a multiplayer game.
The core trade-off: tight limits make abuse cheap to block but punish bursty real traffic. Loose limits leave you exposed. Game traffic is bursty by nature (login storms, match-end stat flushes, lobby joins) so the design has to handle bursts cleanly without permanently raising the ceiling.
Where Rate Limiting Needs to Live
A common mistake is putting all the limits in one place. Rate limiting works best as defense in depth, with three layers each catching a different kind of abuse: the edge (IP-level floods and DDoS), the gateway (per-user and per-studio limits on authenticated traffic), and the individual services (feature-specific rules like chat cooldowns and match-action limits).
Algorithm choice matters too. For backends, a token bucket is likely the right call.
Token bucket in 30 seconds: a bucket refills at a fixed rate (say, 10 tokens per second) up to a maximum capacity (the "burst size"). Every request consumes one token; when the bucket is empty, requests get rejected. That allows controlled bursts (a player firing 8 stat updates after a multi-kill) while still enforcing an average rate over time. Fixed-window counters, by contrast, have a known boundary problem: a user can fire twice the limit within a few milliseconds by straddling the edge of two windows, which is fine for a slow-traffic API and catastrophic for a competitive shooter.
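A minimal sketch of the algorithm in Python (illustrative, not any particular gateway's implementation; the clock is injectable so the example is deterministic):

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/sec, capped at `capacity` (the burst size)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last = 0.0                 # timestamp of the last refill

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, clamped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # admit the request, spend one token
            return True
        return False                    # empty bucket -> reject (HTTP 429)

# 10 tokens/sec with burst size 10: the 8-update multi-kill burst passes,
# while a 12-request flood loses only its last 2 requests.
bucket = TokenBucket(rate=10, capacity=10)
results = [bucket.allow(now=0.0) for _ in range(12)]
# results == [True] * 10 + [False] * 2
```

A production version additionally has to share this state across gateway instances (typically in Redis or similar), which is where the distribution and clock-skew work discussed later comes in.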
Sizing Limits That Don't Punish Real Players
The numbers matter and they're game-specific. Generic API guidance ("100 requests per minute is fine for most APIs") doesn't survive contact with a session-based shooter. Three rules of thumb hold up in practice: size for your burstiest legitimate flow rather than the average, key limits on the authenticated user rather than the IP (one IP can hide a whole dorm or tournament LAN behind a NAT), and revisit the numbers as your concurrency grows.
Concrete starting numbers for a small-to-mid multiplayer game: 500 RPM per user is generous and covers nearly all legitimate gameplay patterns. 5,000 RPM as a studio-wide ceiling is reasonable for a game with a few thousand concurrent players. These are starting points and they should grow with your CCU. Most managed backends use numbers in this range as defaults.
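Translated into token-bucket terms, 500 RPM works out like this (the burst capacity below is an assumed number to tune per game):

```python
# 500 RPM per user, expressed as token-bucket parameters.
LIMIT_RPM = 500
refill_per_sec = LIMIT_RPM / 60        # ~8.3 tokens/sec sustained rate
burst_capacity = 20                    # assumed burst size; fit to your burstiest flow

# The match-end case: 8 stat updates fired back to back fit inside the burst.
match_end_burst = 8
burst_ok = match_end_burst <= burst_capacity                       # True

# A bot firing 50 req/s drains the bucket in about half a second, then is
# pinned to the ~8.3 req/s refill rate for as long as it keeps hammering.
bot_rate = 50
seconds_to_drain = burst_capacity / (bot_rate - refill_per_sec)    # ~0.48 s
```

The useful property: the same limit that feels invisible to a bursty human player caps a sustained bot at the refill rate almost immediately.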
Handling 429 on the Client Without Breaking Play
The server side stops the abuse. The client side stops the player from rage-quitting. Both have to be right. When the client gets a 429, four things have to happen: it reads the Retry-After header if the server sends one, backs off with jitter instead of retrying in a tight loop, queues the failed action locally so nothing is lost, and keeps the game loop responsive while all of that happens.
The client should also tell the player something honest when an action does fail. Not "Error 429" but "Reconnecting..." or "Action queued." Silent failures make rate limiting feel like a broken game.
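A client-side sketch of that flow (names are illustrative, not any SDK's API; `send` stands in for whatever transport your game uses):

```python
import random
import time
from collections import deque

class ActionQueue:
    """Queue gameplay actions; on a 429, back off using Retry-After when the
    server sends it, falling back to exponential backoff with jitter. Actions
    are never dropped silently; anything undelivered stays queued."""

    def __init__(self, send, sleep=time.sleep, max_attempts=5):
        self.send = send                  # send(action) -> (status_code, headers)
        self.sleep = sleep                # injectable for testing / scheduling
        self.max_attempts = max_attempts
        self.pending = deque()

    def enqueue(self, action):
        self.pending.append(action)

    def flush(self):
        while self.pending:
            action = self.pending[0]
            for attempt in range(self.max_attempts):
                status, headers = self.send(action)
                if status != 429:
                    self.pending.popleft()    # delivered (or failed non-retryably)
                    break
                # Prefer the server's hint; otherwise full jitter on an
                # exponential schedule: 0..(0.5 * 2^attempt) seconds.
                delay = float(headers.get("Retry-After", 0)) or \
                    random.uniform(0, 0.5 * 2 ** attempt)
                self.sleep(delay)
            else:
                return  # budget exhausted; keep the action queued, try later

# Simulated transport: two 429s with a Retry-After hint, then success.
_responses = [(429, {"Retry-After": "1"}), (429, {"Retry-After": "1"}), (200, {})]
sent, slept = [], []
q = ActionQueue(send=lambda a: (sent.append(a) or _responses[len(sent) - 1]),
                sleep=slept.append)
q.enqueue("kill_stat")
q.flush()
# sent == ["kill_stat"] * 3; slept == [1.0, 1.0]; queue drained
```

Injecting `sleep` keeps the game loop in control of when the backoff actually happens, for example by scheduling the retry on a timer instead of blocking a frame.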
Your Options for Adding Rate Limiting to a Game Backend
There are four meaningful paths. Each has a real trade-off, not just a feature list.
Building it yourself gives you full control over the algorithm, the keys, and the response shape. You'll have to implement a token bucket, distribute it across instances, handle clock skew, and write the client retry logic. Pricing: free in code, but the engineering cost is real and ongoing.
A self-hosted API gateway gives you mature token-bucket and leaky-bucket plugins, standard 429 responses with X-RateLimit-* headers, and multi-dimensional limits (per consumer, per route, global). You still operate the gateway and tune its config. Pricing: free open source; commercial editions vary.
A managed cloud gateway scales automatically and integrates with the rest of your cloud stack. You get less flexibility on custom keys or game-specific algorithms, and you pay per request. Pricing: usage-based, and it gets expensive at high CCU.
Edge/CDN protection handles IP-level abuse before it reaches your origin and is very effective against DDoS and credential stuffing. It doesn't replace per-user limits because it doesn't know who your authenticated users are, so it complements a gateway rather than substituting for one. Pricing: tier-based with generous free plans; paid plans scale with traffic.
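Gateways that emit the conventional X-RateLimit-* headers let a client throttle itself before it ever sees a 429. A sketch, assuming the common header convention (note that some gateways send X-RateLimit-Reset as an epoch timestamp rather than seconds-to-go, so check your gateway's docs):

```python
def rate_limit_budget(headers):
    """Read the conventional X-RateLimit-* headers from a response.
    Returns (requests_remaining, seconds_until_reset), with conservative
    defaults when a header is absent."""
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset_in = max(1, int(headers.get("X-RateLimit-Reset", 1)))
    return remaining, reset_in

# 3 requests left in a 12-second window: space out non-critical calls
# instead of burning the budget and eating a 429 on something that matters.
remaining, reset_in = rate_limit_budget({"X-RateLimit-Remaining": "3",
                                         "X-RateLimit-Reset": "12"})
min_spacing = reset_in / remaining if remaining else float(reset_in)  # 4.0 s
```

Proactive spacing like this is what keeps background traffic (telemetry, presence pings) from starving gameplay-critical calls of rate-limit budget.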
These options handle the API plumbing. They don't know what a session, a stat, or a chat room is, which means you still have to build the game-specific layer (chat spam rules, match-action cooldowns, batch endpoints) on top. AccelByte ships a backend that already has rate limiting designed for game traffic. The per-user, per-studio, per-endpoint, and per-feature layers are all configured out of the box.
How AccelByte Handles Rate Limiting
AccelByte is a game backend platform that handles accounts, matchmaking, sessions, lobby, chat, stats, commerce, and the rest of the live-ops layer. Rate limiting is built into the gateway and into the individual services. Defaults are tuned for game traffic, and every limit can be configured per game.
When a request exceeds a limit, the response body looks like this:

```json
{
  "error": {
    "message": "You have exceeded the allowed request limit. Please try again later."
  }
}
```
Combined with HTTP 429 status, this is enough for any standard client retry logic to handle. The AccelByte SDKs handle reconnection and retry on Lobby and other WebSocket services out of the box; for direct REST calls you still own the retry logic.
Plugging AccelByte's rate limiting into your game takes little extra work: the limits are enforced at the gateway automatically, the SDKs handle retries on the WebSocket services, and the defaults stay in place until you adjust them for your traffic.
Before Launch: What to Check
Regardless of the backend you're using, pin down four things before launch: that your per-user limit survives your burstiest legitimate flow (a match-end stat flush, a login storm); that your client handles 429 without silently dropping actions; that players see honest status text instead of raw error codes; and that you know who can raise the limits, and how fast, when launch traffic exceeds the plan.
Start for free with AccelByte: Test Rate Limits Against Real Traffic
You can plug AccelByte into your game today and see how the rate limits behave under your actual traffic. AccelByte Gaming Services is free during development and comes with a 90-day trial (or 25,000 player hours, whichever comes first), after which usage-based pricing on peak concurrent users applies once your game is live. You can adjust the default limits with your account manager whenever your traffic warrants it.
Get Started for Free or Talk to us