C++ Rate Limiter
High-performance token bucket with sharding, precise fixed-point math, and production-grade packaging.
Tech: C++20, CMake, Google Benchmark, GoogleTest
I built this library to solve a practical problem I kept running into: adding fast, predictable, and portable request throttling inside services without dragging in heavyweight dependencies. Many rate limiting solutions live behind network proxies or require external stores; that works in some architectures, but it adds moving parts and latency. I wanted an in-process library that an application could link directly, with a clear API, minimal overhead, and a focus on correct concurrency. The result is a tiny C++20 rate limiter with a sharded token-bucket core, precise fixed-point arithmetic for refills, clean C and C++ interfaces, and first-class CMake packaging so it drops into downstream builds with `find_package`.
At its heart is a classic token bucket. Each key has a capacity (max burst) and a refill rate (tokens per second). On each check, the limiter refills by elapsed time, subtracts one token if available, and returns a decision struct: allowed/denied, remaining tokens, and an estimated reset_ms until a token is available. Two design choices make it scale on CPUs: sharding and fixed-point math. The key space is split across shards by FNV-1a hash, which reduces lock contention substantially under mixed workloads. Within a shard, the data structure uses shared vs. unique locks so reads are cheap and inserts are safe. Refill math uses Q32.32 fixed-point counters to avoid precision drift and floating-point surprises, especially under very small or very large refill rates. Tests validate overflow safety and time calculations across ranges.
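The refill-then-take step described above can be sketched in fixed point. This is a minimal illustration, not the library's actual implementation: `Bucket`, `allow`, and the parameter names are assumptions, and the overflow handling is simplified for modest rates.

```cpp
#include <cstdint>

// Illustrative Q32.32 token bucket: the high 32 bits of tokens_q32 are whole
// tokens, the low 32 bits the fraction, so refills accumulate without
// floating-point drift.
struct Bucket {
    uint64_t tokens_q32;  // current tokens, Q32.32
    uint64_t last_ns;     // timestamp of the last refill, nanoseconds
};

constexpr uint64_t kOne = 1ull << 32;  // 1.0 in Q32.32

bool allow(Bucket& b, uint64_t now_ns, uint32_t capacity, uint32_t rate_per_s) {
    if (now_ns > b.last_ns) {  // guard against the clock moving backwards
        uint64_t dt = now_ns - b.last_ns;
        // Convert whole seconds and the sub-second remainder separately;
        // scaling the remainder into Q32.32 before multiplying by the rate
        // keeps intermediates within 64 bits for modest rates.
        uint64_t add = (dt / 1000000000ull) * rate_per_s * kOne
                     + ((dt % 1000000000ull) * kOne / 1000000000ull) * rate_per_s;
        uint64_t cap = uint64_t(capacity) * kOne;
        // Overflow-aware clamp at capacity.
        b.tokens_q32 = (add > cap - b.tokens_q32) ? cap : b.tokens_q32 + add;
        b.last_ns = now_ns;
    }
    if (b.tokens_q32 >= kOne) {  // need at least one whole token
        b.tokens_q32 -= kOne;
        return true;
    }
    return false;
}
```

Because the fraction is carried exactly, repeated tiny refills (for example 5 tokens/s polled every millisecond) sum to exactly 5 tokens per second rather than drifting the way repeated `double` additions can.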
This library is intentionally “packaging-first.” It ships with install rules that generate `RatelimitConfig.cmake` so consumers can just `find_package(Ratelimit CONFIG REQUIRED)`. It also optionally produces a `ratelimit.pc` for pkg-config users. The public C++ API is header-only for ergonomics, while the core implementation and the C shim are built as libraries. Visibility is restricted by default: symbols are hidden unless explicitly exported via an `RL_API` macro, and Windows builds use the expected `__declspec(dllexport/dllimport)`. That means clean, predictable ABI boundaries across platforms.
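An export macro of the kind described usually takes the following shape. This is a generic sketch: the `RL_BUILDING` guard name is an assumption, and the library's actual macro may differ in detail.

```cpp
// Typical cross-platform export macro. On Windows the library build exports
// symbols and consumers import them; elsewhere, with -fvisibility=hidden as
// the default, only symbols marked with the macro become public.
#if defined(_WIN32)
  #if defined(RL_BUILDING)  // defined only while compiling the library itself
    #define RL_API __declspec(dllexport)
  #else
    #define RL_API __declspec(dllimport)
  #endif
#else
  #define RL_API __attribute__((visibility("default")))
#endif

// Example of a function exported across the ABI boundary.
RL_API int rl_answer() { return 42; }
```

Everything not marked `RL_API` stays internal, which is what keeps the ABI surface small and predictable across platforms.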
```sh
# Build and install (Release)
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja -C build
ctest --test-dir build --output-on-failure
cmake --install build --prefix "/your/prefix"
```

Using the library from CMake is straightforward. Point `CMAKE_PREFIX_PATH` to your install prefix or set `Ratelimit_DIR` directly. The package exports logical targets so you can choose either the static core (`Ratelimit::ratelimit_core`) or the shared C++ library (`Ratelimit::ratelimit`). The exported headers live under `include/ratelimit`, and downstream code just includes `<ratelimit/rate_limiter.h>`.
```cmake
# consumer/CMakeLists.txt
cmake_minimum_required(VERSION 3.16)
project(consumer LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(Ratelimit CONFIG REQUIRED)
add_executable(consumer main.cpp)
target_link_libraries(consumer PRIVATE Ratelimit::ratelimit_core)
```

```cpp
// consumer/main.cpp
#include <ratelimit/rate_limiter.h>
#include <iostream>

int main() {
    rl::RateLimiterInProc limiter;
    rl::Limit lim{10, 5};  // capacity=10, refill=5 tokens/sec
    auto d = limiter.allow("user:42", lim);
    std::cout << (d.allowed ? "allowed" : "denied")
              << " remaining=" << d.remaining
              << " reset_ms=" << d.reset_ms << "\n";
    return 0;
}
```

There is also a minimal C API for embedding into C programs or FFI scenarios. The C interface exposes version accessors and a function table that mirrors the allow/check path. This lets non-C++ consumers (or JNI bindings) link against a stable C surface while the C++ internals evolve. The build system can optionally produce a JNI target for Java consumers; it follows the same token-bucket semantics and routes through the exported C functions under the hood.
```c
#include <ratelimit/ratelimit_c.h>
#include <stdio.h>

int main(void) {
    printf("%s\n", rl_version());
    // Additional C wrappers for allow/check would be called here.
    return 0;
}
```

Design-wise, several details matter for performance and correctness. Sharding reduces global contention by distributing keys; hot keys still contend within a shard, but the rest of the workload proceeds without interference. The internal map is tuned for the read-mostly path, and the time source is abstracted behind a `Clock` so tests can be deterministic. Fixed-point arithmetic (Q32.32) is a pragmatic compromise: integer math with fractional precision avoids floating point rounding drift while keeping operations cheap and branch-predictable. The refill operation clamps at capacity, guards against negative deltas, and uses overflow-aware additions. On top of that, the decision object exposes `reset_ms` so callers can build friendly retry-after behavior without guessing.
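The shard-selection step is simple enough to sketch in a few lines. The function names here are illustrative, not the library's API; only the FNV-1a constants are standard.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// FNV-1a over the key bytes: XOR each byte into the hash, then multiply by
// the FNV prime. Cheap, dependency-free, and well distributed for short keys.
constexpr uint64_t fnv1a(std::string_view key) {
    uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ull;             // FNV-1a 64-bit prime
    }
    return h;
}

// Map a key to a shard index. Keeping the shard count a power of two lets
// the modulo reduce to a single bit mask.
constexpr std::size_t shard_for(std::string_view key, std::size_t shards) {
    return fnv1a(key) & (shards - 1);
}
```

Each shard then owns its own lock and map, so two requests for keys in different shards never touch the same mutex.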
To validate behavior, the test suite uses GoogleTest for unit tests across edge cases: zero/near-zero refill rates, large capacities, repeated allow calls under multi-threaded contention, and mock clocks that simulate time jumps. Thread safety tests involve concurrent producers hammering the same key and distinct keys; assertions verify that per-key invariants hold and that the sum of allowed requests does not exceed capacity plus refills in any window. This combination of invariants, mockable time, and sanitizer options (TSAN builds are supported where available) caught most bugs early, especially in the tricky interactions between refill math and lock scopes.
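The core invariant those tests assert can be shown without the framework. This is a framework-free sketch with illustrative stand-ins (`ToyBucket`, `FakeClock`); the real suite uses GoogleTest and the library's own types.

```cpp
#include <algorithm>
#include <cstdint>

// A fully test-controlled clock: no wall time, no flakiness.
struct FakeClock { uint64_t now_ns = 0; };

// Minimal double-based bucket, just to demonstrate the invariant:
// allowed requests in a window <= capacity + refill_rate * elapsed_seconds.
struct ToyBucket { double tokens; uint64_t last_ns; };

bool toy_allow(ToyBucket& b, const FakeClock& c, double cap, double rate) {
    double dt_s = (c.now_ns - b.last_ns) * 1e-9;
    b.tokens = std::min(cap, b.tokens + dt_s * rate);
    b.last_ns = c.now_ns;
    if (b.tokens >= 1.0) { b.tokens -= 1.0; return true; }
    return false;
}

// Hammer one key every step_ms for `seconds` and count the allows.
int hammer(double cap, double rate, uint64_t seconds, uint64_t step_ms) {
    FakeClock clock;
    ToyBucket b{cap, 0};  // start with a full burst
    int allowed = 0;
    for (uint64_t t = 0; t < seconds * 1000; t += step_ms) {
        clock.now_ns = t * 1000000ull;
        if (toy_allow(b, clock, cap, rate)) ++allowed;
    }
    return allowed;
}
```

Driving the clock by hand makes assertions exact: with capacity 10 and 5 tokens/s over 10 seconds, no schedule of requests can be allowed more than 60 times.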
```sh
# Enable microbenchmarks (Google Benchmark)
cmake -S . -B build -G Ninja -DRL_BUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release
ninja -C build rl_bench
./build/rl_bench
```

On performance, the optional Google Benchmark suite provides apples-to-apples comparisons under different key distributions (uniform vs. hot-key) and thread counts. Example outputs on commodity CPUs have shown single-key checks on the order of hundreds of nanoseconds and multi-million operations per second when keys are well-distributed across shards (your numbers will vary by CPU and compiler flags). The goal isn’t to beat every specialized limiter but to be “fast enough” that the limiter itself isn’t the bottleneck for typical service rates. If you need more scale, you can increase shard count, pin shards to cores, or add batched check APIs to amortize lock acquisition.
From a usability standpoint, the library ships with a tiny CLI example so you can smoke-test behavior interactively. After building, run the CLI and type keys; you’ll see allowed/denied decisions with remaining tokens and reset times. This is handy for demos and for sanity-checking rate configurations before wiring them into a larger system. In real services, you’d place the limiter at ingress points like HTTP handlers, message consumers, or internal function calls that fence costly work. Because the API is in-process and lock scopes are tight, the end-to-end overhead on the allow path is small.
```sh
# Example CLI usage
./build/rl_cli
Enter keys (Ctrl+D to exit). Using capacity=10, refill=5/s
foo
foo -> allowed=yes remaining=9 reset_ms=0

# Or pipe input:
yes foo | head -n 15 | ./build/rl_cli
```

Building a portable C++ library always brings cross-platform challenges. The biggest were symbol visibility and Windows DLL semantics. Hiding all symbols by default and explicitly exporting the public API via `RL_API` keeps the ABI stable and avoids accidental leaks that break consumers later. On Windows, the install step lays out both import libraries and DLLs; the README documents that `bin` must be on `PATH` at runtime. Another challenge was picking the right lock strategy: shared locks speed the read path, but you must still ensure fairness for writers under hot keys. Finally, refill precision required careful thought: wall-clock jumps, timer granularity, and integer rounding can compound into surprising behavior. The fixed-point math and mockable clock address those concerns without sacrificing speed.
There’s plenty of room for extension. A sliding-window or leaky-bucket variant could complement token bucket for different semantics (e.g., hard caps per window). Batched APIs could lower contention under hot keys by checking multiple keys in a single shard lock scope. A background refiller could reduce work done on the allow call at the cost of periodic maintenance. And the JNI bridge can be promoted into a first-class deliverable for JVM users. But even in its current form, the library is a practical, well-packaged building block: small surface area, predictable performance, and the right hooks for tests and ops. That combination makes it easy to adopt and even easier to explain during an interview or a code review.