Optimizing Nano ID Generation in Go: Concurrency, Memory, and Precomputation Strategies

#computing #distributed systems #software

“All problems in computer science can be solved by another level of indirection.” — David J. Wheeler

Identifier generation is one of those concerns that quietly disappears into the background of a system. At small scale, it is effectively free. A call to a random number generator, a string conversion, and the work is done. There is little reason to revisit it, and even less reason to question its design. This holds largely because identifier generation tends to be invisible: it sits at the edges of requests, happens quickly, and rarely shows up in profiling output. When it does, it is usually dismissed as noise.

That assumption holds only as long as the surrounding system remains small. As concurrency increases and identifier generation moves onto hot paths, what was once incidental becomes part of the system’s observable behavior. Identifiers are created everywhere: at request boundaries, inside storage layers, across services that never coordinate with one another. They are infrastructure in the most literal sense—pervasive, unavoidable, and relied upon precisely because they are assumed to be cheap. When they stop being cheap, the cost is paid everywhere.

NanoID fits neatly into this mental model. It produces compact, URL-safe identifiers with reasonable entropy and a simple interface, and in most environments it behaves exactly as expected. It is easy to adopt and difficult to misuse, which reinforces the idea that identifier generation is a solved problem. The difficulty is not with NanoID itself, but with the way its cost profile changes when it is exercised continuously and concurrently.

In Go, the most direct implementation of NanoID draws randomness from crypto/rand.Reader. From a correctness standpoint, this is beyond reproach. The operating system provides high-quality entropy, the guarantees are well understood, and the resulting identifiers are unpredictable in exactly the ways they should be. Under light use, the cost of doing so is effectively invisible, which further entrenches the assumption that there is nothing here worth examining.
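A minimal sketch of that direct implementation makes the coupling concrete: every call reads fresh bytes from the operating system and allocates temporary buffers. The function name and the 21-character default are illustrative, not from any particular library.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// The canonical 64-character URL-safe NanoID alphabet.
const alphabet = "_-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// NewID generates a 21-character NanoID, reading fresh entropy from
// the operating system on every call: a syscall-backed read plus
// temporary buffer allocations, all in the hot path.
func NewID() (string, error) {
	const size = 21
	buf := make([]byte, size)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	// With a 64-character alphabet, a 6-bit mask maps each random
	// byte to exactly one character, so no rejection sampling is needed.
	id := make([]byte, size)
	for i, b := range buf {
		id[i] = alphabet[b&63]
	}
	return string(id), nil
}

func main() {
	id, err := NewID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Under light load this is perfectly adequate; the cost only becomes visible when the call above sits inside a tight concurrent loop.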

Under sustained concurrency, that assumption no longer holds. Each call to crypto/rand crosses into the kernel, sources entropy, fills a buffer, and returns. Temporary buffers are allocated and discarded. None of this is problematic in isolation, but taken together and repeated at scale, it becomes the dominant cost of identifier generation. At that point, “generating an ID” is no longer a small operation. It is a collection of system calls, allocations, and coordination that happens to terminate in a string.

This is not a criticism of crypto/rand. It is doing exactly what it is designed to do. The issue is one of proximity. Entropy acquisition and identifier construction are tightly coupled, even though they serve different purposes. Correctness requires entropy acquisition; performance suffers when that acquisition sits on the hottest path of the system.

Concurrency has a way of making these relationships visible. Throughput flattens earlier than expected. Latency becomes uneven. Profilers begin to attribute meaningful time to what was assumed to be trivial. In practice, the profiler does not report time spent “generating IDs”; it reports time spent in the kernel, allocation paths, and garbage collection. Identifier generation becomes visible not because it is conceptually complex, but because it is structurally misplaced.

Separating Entropy from Generation

At that point, there are two broad directions a system can move. One is to accept the cost and design capacity around it. The other is to separate concerns that were previously implicit. The former tends to entrench friction around an otherwise unremarkable operation. The latter introduces an architectural boundary.

NanoID does not require fresh entropy for every identifier. What it requires is unpredictability. Those two properties are related, but they are not equivalent. Entropy must originate from a trusted source, but once acquired, it can be expanded safely using a cryptographically sound construction. There is no requirement to consult the operating system on every invocation, and doing so ties correctness to cost in a way that becomes increasingly visible under load.

This observation reshapes the implementation. Instead of treating identifier generation as a single indivisible operation, it becomes possible to draw a boundary between entropy acquisition and identifier construction. In practical terms, this means seeding a generator once from crypto/rand and then producing random bytes locally. The operating system remains the root of trust, but it no longer sits in the inner loop.

The Shape of the Hot Path

A stream cipher such as ChaCha20 fits this role naturally. It is designed to generate large volumes of pseudorandom output from a fixed key and nonce, and its properties are well understood. Once initialized, it produces random bytes without further system interaction. Used this way, it does not replace the operating system’s entropy source; it amortizes it.
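The amortization pattern can be sketched with the standard library alone. The implementation described here uses ChaCha20 (available in Go via golang.org/x/crypto/chacha20); to keep the example dependency-free, the sketch below substitutes the ChaCha8-based generator that ships in math/rand/v2 since Go 1.22, which follows the same shape: seed once from the OS, then expand locally. Type and function names are illustrative.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	mrand "math/rand/v2"
)

// localSource is seeded once from the operating system and then
// produces pseudorandom bytes with no further system calls. The OS
// remains the root of trust; it simply leaves the inner loop.
type localSource struct {
	rng *mrand.ChaCha8
}

func newLocalSource() (*localSource, error) {
	var seed [32]byte
	if _, err := rand.Read(seed[:]); err != nil { // one trip to the OS
		return nil, err
	}
	return &localSource{rng: mrand.NewChaCha8(seed)}, nil
}

// Fill writes pseudorandom bytes into p, eight at a time, drawing
// only on the locally held cipher state.
func (s *localSource) Fill(p []byte) {
	var word [8]byte
	for i := 0; i < len(p); i += 8 {
		binary.LittleEndian.PutUint64(word[:], s.rng.Uint64())
		copy(p[i:], word[:]) // copy handles the short final chunk
	}
}

func main() {
	src, err := newLocalSource()
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 21)
	src.Fill(buf)
	fmt.Printf("%x\n", buf)
}
```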

Each generator instance is seeded once using crypto/rand and then used to satisfy multiple NanoID generations. The generator itself is not shared across callers. Sharing would introduce contention and obscure the very boundary the design is trying to establish. Instead, generator instances are treated as independent state machines that can be reused opportunistically.

This is where sync.Pool becomes useful, not as a micro-optimization, but as a way to preserve locality without introducing coordination. Generator instances are pooled so that goroutines can borrow them briefly, generate the required bytes, and return them. There is no shared mutable state and no locking around generation itself. The pool exists solely to reduce repeated setup cost when reuse is inexpensive.
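The pooling discipline looks roughly like this: each pooled generator is independently seeded, a goroutine borrows one for the duration of a single generation, and nothing is shared while it is held. The names are illustrative, and ChaCha8 from math/rand/v2 again stands in for the ChaCha20 generator described above.

```go
package main

import (
	"crypto/rand"
	"fmt"
	mrand "math/rand/v2"
	"sync"
)

// generatorPool hands out independently seeded generators. A goroutine
// borrows one, uses it without any locking, and returns it. The pool
// reduces repeated seeding cost; it is not a shared generator.
var generatorPool = sync.Pool{
	New: func() any {
		var seed [32]byte
		if _, err := rand.Read(seed[:]); err != nil {
			panic(err) // sketch: seeding failure is treated as fatal
		}
		return mrand.NewChaCha8(seed)
	},
}

func randomUint64() uint64 {
	g := generatorPool.Get().(*mrand.ChaCha8)
	defer generatorPool.Put(g)
	return g.Uint64() // exclusive access while borrowed: no contention
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = randomUint64()
		}()
	}
	wg.Wait()
	fmt.Println("done")
}
```

Because sync.Pool may drop idle entries under GC pressure, reuse here is opportunistic: a dropped generator is simply reseeded on the next Get, which preserves correctness at the cost of one extra OS read.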

Once entropy is decoupled, other sources of variability become apparent. NanoID generation requires temporary byte buffers—first for random data, then for mapping those bytes into an alphabet. These buffers are uniform in size and short-lived. Allocating and discarding them repeatedly introduces allocation pressure that is unrelated to the semantics of identifier generation. Pooling these buffers follows the same reasoning as pooling generator state: reuse when convenient, allow reclamation when not.
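The same pool shape works for the scratch buffers, with one Go-specific wrinkle: storing a pointer to the slice avoids an extra allocation each time the slice header is boxed into the pool's interface value. The buffer size and names below are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

const idLen = 21 // one pool per fixed buffer size

// bufPool reuses short-lived, uniformly sized scratch buffers so that
// per-ID generation does not allocate them fresh each time.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, idLen)
		return &b // pointer avoids boxing the slice header per Put
	},
}

// withScratch lends a pooled buffer to f for the duration of the call.
func withScratch(f func(b []byte)) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	f(*bp)
}

func main() {
	withScratch(func(b []byte) {
		fmt.Println(len(b))
	})
}
```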

The same discipline applies to configuration. NanoID relies on derived values such as alphabet size, bit masks, and the number of random bytes required to generate an identifier of a given length. These values do not change per call, yet they are often computed as part of the generation path. Computing them once and fixing them at construction time pushes variability outward and keeps the inner loop small and predictable. The generation path becomes a straight-line transformation from random bytes to characters, without branching or recomputation.
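Those derived values can be fixed at construction time in a small config struct. The mask and step formulas below follow the reference NanoID algorithm (the 1.6 factor is its published heuristic for how many random bytes to draw per batch so that one batch usually survives rejection sampling); the struct and field names are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// config holds values derived once at construction time, so the
// generation loop performs no per-call arithmetic beyond indexing.
type config struct {
	alphabet string
	mask     byte // smallest all-ones bit mask covering the alphabet size
	step     int  // random bytes to draw per attempt (NanoID heuristic)
	size     int  // identifier length
}

func newConfig(alphabet string, size int) config {
	n := len(alphabet)
	mask := byte(1<<bits.Len(uint(n-1)) - 1)
	// Reference NanoID heuristic: draw enough bytes that, on average,
	// a single batch yields a full identifier despite rejections.
	step := int(math.Ceil(1.6 * float64(mask) * float64(size) / float64(n)))
	return config{alphabet: alphabet, mask: mask, step: step, size: size}
}

func main() {
	cfg := newConfig("_-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", 21)
	fmt.Println(cfg.mask, cfg.step) // mask is 63 for a 64-character alphabet
}
```

For a 64-character alphabet the mask is exact and no bytes are ever rejected; for other alphabet sizes the mask over-covers, and out-of-range bytes are skipped, which is exactly why step is precomputed with headroom.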

At this stage, the hot path is deliberately unremarkable. There are no system calls, no locks, and no dynamic allocation beyond the final string itself. The work performed is exactly the work required to produce an identifier, and nothing else. What remains visible in benchmarks is not overhead, but the irreducible cost of string construction in Go.

The effect of these changes is straightforward to measure. Allocation counts collapse to a single allocation per identifier. Latency stabilizes. Throughput scales with available CPU until saturation. More importantly, behavior becomes predictable. Identifier generation stops appearing as a source of variance and resumes its role as infrastructure.

An implementation of the approach described here is available as an open-source Go module, along with a small command-line tool built on top of it; the link appears at the end of this article.

What does not change is the security model. Entropy still originates from the operating system. Generator state is seeded from a cryptographically secure source. Unpredictability is preserved. There is no attempt to weaken guarantees in exchange for speed. The improvement comes from respecting boundaries, not from relaxing them.

Identifier generation stops being trivial when it becomes infrastructure. At that point, it deserves the same architectural treatment as any other component that occupies a hot path. Once those boundaries are made explicit, the problem largely resolves itself.

At that point, the choice of entropy source becomes an explicit design decision rather than an incidental one. The NanoID implementation described here does not assume a single generator, nor does it require that entropy be sourced directly from the operating system on every invocation. Instead, it is structured to accept a well-defined pseudorandom generator that is seeded from a trusted source and then exercised locally.

For environments where throughput and steady behavior under concurrency are the primary concerns, a ChaCha20-based generator provides a practical balance. Seeded once from crypto/rand, it offers high-quality pseudorandom output with predictable performance characteristics, making it well suited for systems where identifier generation sits on a hot path and must remain invisible.

In environments where regulatory or compliance requirements apply, particularly those that mandate FIPS 140-2 validation, an AES-CTR-DRBG construction becomes the appropriate choice. In that context, the same architectural separation applies: entropy is sourced from the operating system, expanded via a standards-aligned deterministic generator, and consumed locally. The difference is not architectural, but contractual—the guarantees are defined externally rather than operationally.
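The expansion step of that construction can be sketched with the standard library's AES-CTR mode: a one-time OS-provided key and counter, expanded locally into keystream. This is only the core of the idea, not a conforming implementation; a real SP 800-90A CTR_DRBG additionally specifies instantiation, reseed intervals, and prediction resistance, and FIPS validation applies to a certified module, not to this sketch. Names are illustrative.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// aesSource expands a one-time OS-provided seed using AES-256 in CTR
// mode. Architecturally it is interchangeable with the ChaCha20-based
// source: seed once from the OS, generate locally thereafter.
type aesSource struct {
	stream cipher.Stream
}

func newAESSource() (*aesSource, error) {
	var key [32]byte // AES-256 key from the OS
	if _, err := rand.Read(key[:]); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	iv := make([]byte, aes.BlockSize) // initial counter block
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	return &aesSource{stream: cipher.NewCTR(block, iv)}, nil
}

// Fill zeroes p and XORs the keystream into it, yielding raw keystream.
func (s *aesSource) Fill(p []byte) {
	for i := range p {
		p[i] = 0
	}
	s.stream.XORKeyStream(p, p)
}

func main() {
	src, err := newAESSource()
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 21)
	src.Fill(buf)
	fmt.Printf("%x\n", buf)
}
```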

The important point is that these choices do not alter the shape of the system. Whether the generator is ChaCha20-based or an AES-CTR-DRBG, the boundary between entropy acquisition and identifier construction remains intact. The cost model is stable, the hot path is predictable, and the guarantees are explicit rather than implicit.


Implementation: https://github.com/sixafter/nanoid