"Works on my machine."
Few phrases frustrate more. Code runs perfectly in development, fails mysteriously in production, and the developer genuinely can't reproduce the problem.
This isn't just a meme. It's a category of bugs with a single root cause: invisible assumptions. Every "works on my machine" problem is something you didn't know you were depending on—until it wasn't there.
The Anatomy of an Invisible Assumption
Your code doesn't run in isolation. It runs on top of an enormous stack of context: operating system, file system, network, libraries, configuration, data, timing, and resources. You make assumptions about all of these, usually without realizing it.
When those assumptions hold, the code works. When they don't, it breaks. The challenge is that assumptions are invisible precisely because they're true in your environment.
Where Assumptions Hide
Configuration
Your machine connects to a local test database. Production connects to a distributed cluster. Same code, different database—different behavior.
Your PostgreSQL is version 14. Production runs 13. A query that works in 14 might behave differently in 13.
Your environment variables point to test services. Production's point to real ones. The code path might be identical, but the responses aren't.
Data
Your test database has hundreds of records. Production has millions. That query completing in 50ms locally? It times out in production.
Your test data is clean—carefully constructed, consistent, complete. Production data is messy: unexpected nulls, special characters, edge cases that violate assumptions you didn't know you'd made.
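Here is a small Python sketch of the kind of assumption clean data lets you get away with; the helper and the records are hypothetical:

```python
def normalize_email(record: dict) -> str:
    # Hidden assumption: 'email' exists, is a string, and is reasonably well formed.
    return record["email"].strip().lower()

# Clean test data: the assumption holds.
print(normalize_email({"email": " Alice@Example.com "}))  # alice@example.com

# Production-shaped data: the assumption breaks.
for record in [{"email": None}, {"name": "Bob"}]:
    try:
        normalize_email(record)
    except (AttributeError, KeyError) as exc:
        print(f"failed on {record!r}: {exc!r}")
```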
Infrastructure
You develop on macOS. Production runs Linux. File paths work differently. Case sensitivity works differently. System libraries work differently.
Locally, services talk to each other on localhost with sub-millisecond latency. In production, they're separated by networks, load balancers, firewalls, and geographic distance.
Your laptop has 64GB of RAM. Production containers have 2GB. Code that caches aggressively works fine locally and crashes production with out-of-memory errors.
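Caching is a common place this assumption hides. A Python sketch; the cache sizes are illustrative:

```python
from functools import lru_cache

# Unbounded cache: every distinct argument stays in memory forever.
# Fine on a 64GB laptop with a handful of test inputs; in a 2GB container
# serving millions of distinct keys, it grows until the process is killed.
@lru_cache(maxsize=None)
def render_profile(user_id: int) -> str:
    return f"<profile for {user_id}>"  # stand-in for expensive work

# Bounded cache: the memory assumption becomes an explicit, tunable number.
@lru_cache(maxsize=10_000)
def render_profile_bounded(user_id: int) -> str:
    return f"<profile for {user_id}>"
```

Bounding the cache doesn't remove the memory assumption; it turns it into a number you can set per environment instead of a surprise.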
Timing and Concurrency
One developer clicking buttons doesn't reveal race conditions. A thousand concurrent users do.
Mocked services respond in milliseconds. Real APIs across real networks take hundreds of milliseconds—or time out entirely.
On a single machine, clocks are perfectly synchronized. Across distributed systems, time can skew. Code that assumes timestamps always increase might break when clock drift causes time to occasionally move backward.
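In code, the safer version of this assumption is to measure durations with a monotonic clock rather than wall-clock time. A Python sketch, with do_work standing in for real work:

```python
import time

def do_work() -> None:
    time.sleep(0.01)  # stand-in for real work

# Fragile: wall-clock time can jump backward (NTP corrections, clock skew),
# so this 'elapsed' can come out negative or simply wrong.
start = time.time()
do_work()
elapsed_wall = time.time() - start

# Safer: a monotonic clock never moves backward, so it is the right tool
# for measuring durations and enforcing timeouts.
start = time.monotonic()
do_work()
elapsed_mono = time.monotonic() - start
```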
Versions
You installed the latest library version last week. Production deployed months ago with an older version. Breaking changes in minor versions cause mysterious failures.
Python 3.11 on your machine. Python 3.9 in production. Most code works identically. Some doesn't.
Classic Scenarios
The case-sensitivity trap. Developer on macOS saves UserProfile.js, imports it as userprofile.js. macOS is case-insensitive by default—this works. Linux is case-sensitive—the import fails.
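The same trap extends beyond imports to any file access. A Python sketch of the underlying file-system behavior, using a hypothetical UserProfile.json:

```python
from pathlib import Path

# The file on disk is named "UserProfile.json".
Path("UserProfile.json").write_text("{}")

# On macOS (case-insensitive by default) this read succeeds;
# on Linux (case-sensitive) it raises FileNotFoundError.
try:
    data = Path("userprofile.json").read_text()
    print("opened despite the case mismatch (case-insensitive file system)")
except FileNotFoundError:
    print("case mismatch caught (case-sensitive file system)")
```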
The accumulated latency. Code makes 10 sequential API calls. Each takes 200ms locally (mocked). Each takes 500ms in production (real network). Two seconds in development, five seconds in production, three-second timeout exceeded.
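One way to surface this assumption is to give the whole sequence an explicit time budget instead of hoping each call stays fast. A Python sketch with hypothetical service names and latencies:

```python
import time

def call_service(name: str, latency_s: float) -> None:
    time.sleep(latency_s)  # stand-in for a real network call

# Ten sequential calls. At a mocked 200ms each this fits a 3s budget;
# at a realistic 500ms each it blows past it.
DEADLINE_S = 3.0
PER_CALL_LATENCY_S = 0.5  # swap in 0.2 to simulate the local mocks

start = time.monotonic()
for i in range(10):
    if time.monotonic() - start > DEADLINE_S:
        raise TimeoutError(f"budget exhausted after {i} calls")
    call_service(f"service-{i}", PER_CALL_LATENCY_S)
```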
The race condition at scale. Code updates a counter without locking. With one user, requests never overlap. With a thousand concurrent users, requests constantly overlap. Updates get lost.
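A minimal Python sketch of the lost update and the lock that makes the assumption explicit:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment() -> None:
    global counter
    counter += 1  # read-modify-write: two requests can read the same value, and one update is lost

def safe_increment() -> None:
    global counter
    with lock:    # serializes the read-modify-write
        counter += 1

# One user never overlaps with themselves; concurrent "users" constantly do.
threads = [threading.Thread(target=lambda: [safe_increment() for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 8000 with the lock; can come up short with unsafe_increment
```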
The resource leak. Code opens file handles and relies on garbage collection. Low request rate in development—fine. High request rate in production exhausts handles before GC runs.
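In Python, the fix is to tie the handle's lifetime to a scope instead of to garbage collection; a minimal sketch:

```python
# Leaky: the file handle stays open until garbage collection gets around to it.
# Fine at a low request rate; at a high one, handles pile up faster than GC runs.
def read_config_leaky(path: str) -> str:
    return open(path).read()

# Explicit: the handle is closed as soon as the block exits, regardless of load.
def read_config(path: str) -> str:
    with open(path) as f:
        return f.read()
```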
The phantom dependency. Code imports a library that's installed globally on your machine but not declared in dependencies. Works locally, fails in production where it doesn't exist.
The hardcoded localhost. Code calls http://localhost:8080. Locally, the backend runs there. In production, localhost is the container itself—not the actual backend.
Making Assumptions Visible
The goal isn't to eliminate all assumptions—that's impossible. The goal is to make them visible and consistent across environments.
Containerization
Docker is the most effective tool. If your code runs in a container locally, it runs in the same container in production. Same OS, same libraries, same file system. The container makes your assumptions explicit.
Infrastructure as Code
Define your environments in version-controlled configuration. Terraform, Kubernetes manifests, CloudFormation—the tool matters less than the principle. Environments should be reproducible, not artisanal.
Configuration Through Environment
Keep code identical across environments. Let environment variables handle the differences: database URLs, API keys, service endpoints. The code doesn't change—only the configuration it reads.
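A sketch of the pattern in Python; the variable names (DATABASE_URL, PAYMENTS_API_URL) are illustrative, and the required ones fail fast at startup if they're missing:

```python
import os

# All environment-specific values come from the environment; the code is
# identical everywhere.
DATABASE_URL = os.environ["DATABASE_URL"]          # required: raises KeyError if absent
PAYMENTS_API_URL = os.environ["PAYMENTS_API_URL"]  # required
REQUEST_TIMEOUT_S = float(os.environ.get("REQUEST_TIMEOUT_S", "5"))  # optional, with a default

def payments_endpoint(path: str) -> str:
    # No hardcoded localhost: the same code points at a mock locally
    # and the real service in production.
    return f"{PAYMENTS_API_URL.rstrip('/')}/{path.lstrip('/')}"
```

Missing required configuration then fails loudly at startup instead of quietly pointing at the wrong service.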
Production-Like Data
Develop and test against sanitized copies of real data. Clean test data hides the edge cases that production data reveals.
Staging That Mirrors Production
A staging environment that actually resembles production catches environment-specific issues before they reach users. This means similar infrastructure, similar data volumes, similar network topology.
Test at Scale
Load testing reveals what single-user testing hides: race conditions, resource exhaustion, timeout accumulation. If you only test with one concurrent user, you only know your code works for one concurrent user.
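Even a toy load driver surfaces more than single-user clicking. This Python sketch is not a substitute for a real load-testing tool, and the handler and numbers are placeholders, but it shows the idea:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i: int) -> float:
    start = time.monotonic()
    time.sleep(0.01)          # stand-in for the code path under test
    return time.monotonic() - start

# One concurrent user tells you almost nothing; a few hundred start to surface
# contention, pool exhaustion, and timeout accumulation.
with ThreadPoolExecutor(max_workers=200) as pool:
    latencies = sorted(pool.map(handle_request, range(2000)))

print(f"p50={latencies[len(latencies) // 2] * 1000:.0f}ms  "
      f"p99={latencies[int(len(latencies) * 0.99)] * 1000:.0f}ms")
```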
The Production-First Mindset
The traditional approach: make production like development. The modern approach: make development like production from the start.
Develop in containers. Don't install dependencies globally—containerize your development environment.
Test with realistic data. Don't test only with clean, small, predictable datasets.
Build observability in. Rather than trying to reproduce production issues locally, instrument your code so you can diagnose issues in place. Structured logging, distributed tracing, metrics.
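A sketch of structured logging using Python's standard logging module; the logger name and context fields are illustrative:

```python
import json
import logging
import sys

# Minimal structured (JSON) logging: each event carries machine-readable
# context, so production issues can be diagnosed in place instead of
# reproduced locally.
class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            **getattr(record, "context", {}),
        }
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("payment submitted",
            extra={"context": {"order_id": "o-123", "region": "eu-west-1"}})
```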
Use feature flags. Test new behavior in production with safeguards. Canary deployments let you validate with real traffic while maintaining the ability to roll back.
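A minimal sketch of a flag read from the environment in Python; the flag and pricing functions are hypothetical, and real flag systems add targeting and gradual rollout on top of the same shape:

```python
import os

# The flag lives outside the code, so behavior can change without a redeploy.
NEW_PRICING_ENABLED = os.environ.get("FEATURE_NEW_PRICING", "false").lower() == "true"

def legacy_pricing(item: dict) -> float:
    return item["base_price"]                                    # the proven path

def new_pricing(item: dict) -> float:
    return item["base_price"] * (1 - item.get("discount", 0.0))  # the new behavior under test

def price_for(item: dict) -> float:
    # Flipping the flag off is the rollback; no redeploy needed.
    return new_pricing(item) if NEW_PRICING_ENABLED else legacy_pricing(item)
```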
The Diagnostic Inversion
Here's the useful reframe: "works on my machine" is actually diagnostic information.
If code works in one environment and fails in another, you've just narrowed the problem. Something is different between those environments. Find the difference, find the bug.
Is it the data? The network latency? The concurrency level? The library version? The file system? The configuration?
The environment difference that causes the problem is also the key to solving it.
The Real Cost
Environment-specific bugs are expensive in non-obvious ways:
Wasted debugging time. Hours spent trying to reproduce something locally that can only happen in production.
Deployment anxiety. You can't be confident that tested code will actually work.
Slow incident response. Production issues that can't be reproduced take longer to fix.
Eroded trust. Users experience failures that "shouldn't" happen because testing passed.
Investing in environment parity pays for itself. Every hour spent making development match production saves multiple hours of debugging environment-specific bugs.
"Works on my machine" should be a red flag, not a shrug. Each occurrence is an opportunity: find the invisible assumption, make it visible, and ensure it holds everywhere your code runs.