Engineering · 5 min read · Dec 28, 2025

From System Freeze to 250ms: Building a Scalable Code Runner

When I set out to build Neuron, I thought the hard part would be parsing code. I was wrong. The hard part was keeping the server alive through huge concurrency spikes.

Here is the story of how I crashed my own server, successfully engineered my way out of it, and cut execution time from 2.5 seconds to 250ms.

"Just spin up a Docker container for every request! It'll be fine."
— Me, moments before disaster

1. The Naive Approach

My initial architecture was simple (and efficient... or so I thought):

  1. User sends code via API.
  2. API spins up a brand new Docker container.
  3. Code runs, output is captured.
  4. Container is destroyed.

It worked beautifully for local testing. I felt like a genius.

The Crash 💥

Then, I decided to run a stress test. I fired up Apache Benchmark and sent 1,000 concurrent requests.

Result: System Freeze

The kernel panicked trying to spin up 1,000 Docker containers simultaneously. The CPU usage hit 100%, memory was swallowed whole, and requests started timing out left and right.

Lesson Learned: Unbounded concurrency is a death sentence. You cannot simply "spin up" resources on demand at scale. You need backpressure.

2. Why Not Just Run Synchronously?

You might ask: "Why queue at all? Why not just execute and return?"

🚫 Blocked Connections

If execution takes 2 seconds, that HTTP connection is open for 2 seconds. With 1,000 users, you exhaust file descriptors instantly.

🌊 No Backpressure

If traffic spikes to 5x capacity, a synchronous server crashes immediately. An async server just has a longer queue.

3. The Queue: Kafka vs. Redis

🐢 Attempt A: Apache Kafka

My first instinct was "Enterprise Scale™".

  • ✅ Handled throughput easily
  • ❌ Added 700–1000ms of overhead just to enqueue a job
⚡️ Attempt B: Redis Streams

I stripped out Kafka and implemented Redis Streams.

Queueing latency dropped from ~700–1000ms to ~3ms.
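A sketch of both sides of the stream with redis-py. The stream and group names, connection details, and helper names are all illustrative; the client is passed in so the pattern is testable without a live server:

```python
STREAM, GROUP = "neuron:jobs", "workers"  # names are illustrative

def make_client():
    # Assumes redis-py (`pip install redis`) and a local Redis server.
    import redis
    return redis.Redis(host="localhost", port=6379, decode_responses=True)

def ensure_group(r):
    # The consumer group must exist before XREADGROUP; ignore "already exists".
    try:
        r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except Exception:
        pass

def enqueue(r, code: str) -> str:
    """Producer side: XADD is a single fast round-trip."""
    return r.xadd(STREAM, {"code": code})

def consume_one(r, consumer: str = "worker-1"):
    """Worker side: claim one new message for this group, run it, ACK it."""
    resp = r.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=1, block=5000)
    if not resp:
        return None
    _stream, messages = resp[0]
    msg_id, fields = messages[0]
    # ... execute fields["code"] in a container here ...
    r.xack(STREAM, GROUP, msg_id)
    return fields["code"]
```

Consumer groups also give you Kafka-style at-least-once delivery: unACKed messages stay pending and can be reclaimed if a worker dies.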

4. Scaling the Execution Core

Finally, the bottleneck moved to the worker. I tried three strategies:

Attempt 1: Spin Up On Demand

FAILED

Tried to boot 1,000 OS processes at once. Kernel panic. Server melted.

Attempt 2: Capped Concurrency

TOO SLOW

Limited to 50 workers. Safe for server, but created 50-second wait times for users during spikes.
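Attempt 2 in sketch form, assuming an asyncio worker; the semaphore is the entire trick, and the cap of 50 mirrors the text:

```python
import asyncio

MAX_CONTAINERS = 50  # the hard cap from Attempt 2

async def run_job(code: str, slots: asyncio.Semaphore) -> str:
    async with slots:  # job #51 waits here until a slot frees up
        await asyncio.sleep(0)  # stand-in for the ~2.5s boot + execution
        return f"ran: {code}"

async def drain(jobs):
    slots = asyncio.Semaphore(MAX_CONTAINERS)
    return await asyncio.gather(*(run_job(c, slots) for c in jobs))
```

The arithmetic explains the 50-second waits: 1,000 queued jobs through 50 slots at ~2.5s each means the last job waits roughly 1000 / 50 × 2.5s ≈ 50s.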

Attempt 3: The "Pre-Warmed" Pool

SOLVED

Treat containers like database connections. Boot 50 containers before traffic hits. Pause them. When a job comes, unpause an existing one.

  • Old startup time: ~2,500ms
  • New startup time: ~0ms
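The pool is the database-connection-pool pattern applied to containers. A minimal sketch: in the real system the objects would be Docker SDK containers (paused and resumed via `container.pause()` / `container.unpause()`); here the container object is injected so the pattern stands on its own:

```python
import queue

class WarmPool:
    """Boot N containers before traffic hits, pause them, unpause on demand."""

    def __init__(self, containers):
        self._idle = queue.Queue()
        for c in containers:   # the ~2,500ms boot cost is paid once, up front
            c.pause()          # paused containers consume almost no CPU
            self._idle.put(c)

    def acquire(self):
        c = self._idle.get()   # blocks if every container is busy
        c.unpause()            # ~0ms, versus ~2,500ms for a cold boot
        return c

    def release(self, c):
        c.pause()              # park it again for the next job
        self._idle.put(c)
```

`queue.Queue` doubles as the waiting room: when all 50 containers are busy, `acquire()` blocks the worker instead of booting container #51.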

Handling "Dirty" State

Reusing containers introduces state pollution. We solved this with a strict health policy:

  • Isolation: Containers are locked down (no network, limited disk).
  • Dirty Checks: If a run ends in TLE (time limit exceeded) or OOM (out of memory), the container is marked "Dirty" and destroyed immediately.
  • Freshness: Periodic health checks ensure the pool never goes stale.
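The dirty-check policy reduces to one decision per run. A sketch of that decision (the outcome labels mirror the bullet above; the function name is illustrative):

```python
from enum import Enum

class Verdict(Enum):
    REUSE = "reuse"      # clean run: pause it and return it to the pool
    DESTROY = "destroy"  # dirty: kill the container, warm a fresh replacement

DIRTY_OUTCOMES = {"TLE", "OOM"}  # time limit exceeded / out of memory

def after_run(outcome: str) -> Verdict:
    """Strict policy: any run that might leave residue kills the container."""
    return Verdict.DESTROY if outcome in DIRTY_OUTCOMES else Verdict.REUSE
```

Destroying on TLE/OOM is deliberately paranoid: a run that hit a limit may have left stray processes or memory pressure behind, and a fresh container is cheaper than debugging cross-user pollution.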

Want to break my server?

I've opened the Public Beta. First 1,000 users get a special bonus.

Get 200 Free Executions