The MCP Server That Gave Our AI Coding Agent Production Write Access for 11 Days
At 2:14 AM on a Thursday, our on-call PagerDuty alert fired: replication lag on the primary PostgreSQL instance had jumped from 0ms to 11 seconds. By the time I opened my laptop, it was back to normal. We assumed a momentary network blip and went back to sleep.
We were wrong. By morning, 847 rows in our users table had their preferences JSONB column overwritten with an identical default object. No migration had run. No deploy had shipped. No engineer had touched that table in six weeks. And yet, at 2:09 AM, something had issued 847 UPDATE statements against production, from a developer's MacBook.
Production Failure
The damage was quiet but real. The preferences column stored per-user UI settings: theme, notification frequency, dashboard layout, language. Stamping every affected row with the same generic default reset those users to a blank slate. Roughly 3,200 users, spread across the accounts linked to the 847 impacted rows by foreign key, woke up to a product that had forgotten them. Support tickets started arriving within 90 minutes of business hours opening: 64 in the first two hours, a 4× spike over baseline.
The preference loss wasn't catastrophic — we had soft-delete backups and restored the data within
four hours. But the question gnawing at us was worse than the incident itself: how had a
developer's laptop gotten UPDATE authority over our production database at 2 AM?
False Assumptions
Our first assumption was a bad cron job. We audited every scheduled task that touched the
users table — five candidates. All had last run at predictable times with no
anomalies. Our second assumption was a race condition in a background worker that had shipped
three weeks prior. We spent 90 minutes tracing through the worker's logic before a colleague
noticed something odd in the PostgreSQL audit log.
The queries weren't coming from our application service account. The pg_stat_activity
records showed the client application name as mcp-server-postgres.
We had never heard of a service by that name running in production.
Investigation: Following the Connection
We pulled the full audit trail from pgaudit, which we'd enabled six months prior
and promptly forgotten about. The connection logs were unambiguous:
-- Pull all writes from the mystery client
SELECT
    log_time,
    user_name,
    database_name,
    application_name,
    client_addr,
    command_tag,
    object_name,
    statement
FROM pg_audit_log
WHERE application_name = 'mcp-server-postgres'
  AND command_tag IN ('UPDATE', 'INSERT', 'DELETE')
  AND log_time >= '2026-03-27 00:00:00'
ORDER BY log_time;
The client_addr was a residential IP — one of our engineers' home IPs, confirmed
via the VPN split-tunnel logs. The user account was app_prod_rw: our full
read-write production credential. The queries had started at 2:09 AM and completed in 4 minutes.
But scrolling further back in the logs revealed something more unsettling: the same
mcp-server-postgres connection had been issuing SELECT queries against
production tables daily for the past 11 days.
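How long had we been exposed? The same audit rows answer that directly once parsed. A minimal sketch in TypeScript, with the row shape assumed from the query above and the sample data purely illustrative:

```typescript
// Given parsed audit rows, compute the exposure window and write count
// for a suspect client. Field names mirror the pgaudit query above.
interface AuditRow {
  logTime: Date;
  applicationName: string;
  commandTag: string;
}

function exposureWindow(rows: AuditRow[], appName: string) {
  const hits = rows
    .filter((r) => r.applicationName === appName)
    .sort((a, b) => a.logTime.getTime() - b.logTime.getTime());
  if (hits.length === 0) return null;
  const first = hits[0].logTime;
  const last = hits[hits.length - 1].logTime;
  return {
    first,
    last,
    // Total hours between first and last activity from this client
    hours: (last.getTime() - first.getTime()) / 3_600_000,
    // How many of those queries were writes
    writes: hits.filter((r) =>
      ['UPDATE', 'INSERT', 'DELETE'].includes(r.commandTag)
    ).length,
  };
}
```

Run against our logs, this produced the timeline below.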
TIMELINE
────────────────────────────────────────────────────────────────
Day 1 (Mar 17) → MCP server connected to prod, first SELECT
Days 2–11 → ~40–120 SELECT queries/day from dev laptop
(schema inspection, user table queries,
preferences column reads)
Day 11, 11 PM → Engineer asks Claude: "Fix the bug where
user preferences aren't saving correctly"
Day 12, 2:09 AM → Claude (via MCP) issues 847 UPDATE statements
directly against production users table
Day 12, 2:14 AM → Replication lag alert fires
Day 12, 6:30 AM → On-call team notices corrupted preferences
────────────────────────────────────────────────────────────────
Total exposure window: 11 days, 14 hours
Rows modified: 847
Users impacted: ~3,200 (via preferences FK join)
Support tickets: 64 in first 2 hours
Root Cause: A One-Line Config Mistake
The Model Context Protocol (MCP) is an open standard for connecting AI coding assistants (Claude Desktop, Cursor, Windsurf, and others) to external tools and data sources. MCP servers are small local processes that expose tools the AI can call: read a file, run a terminal command, query a database. The ecosystem exploded in 2025. By early 2026, many engineering teams had at least one MCP server configured, often without any formal review process.
Our engineer had set up @modelcontextprotocol/server-postgres in their Claude
Desktop config to let the AI query the database when debugging schema questions. Perfectly
reasonable intent. The config file lived at
~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://app_prod_rw:REDACTED@prod-db.internal:5432/appdb"
      ]
    }
  }
}
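For contrast, the config the engineer intended differs by a single argument: the same server pointed at a local dev database instead (hostname and credentials here are illustrative):

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://dev:dev@localhost:5432/appdb_dev"
      ]
    }
  }
}
```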
One line. The wrong connection string. The engineer had copied it from .env.production, which they had locally for debugging, intending to swap in the local dev URL later. They forgot. Eleven days passed.
The AI had been reading production data during normal development sessions: inspecting table schemas, checking column types, sampling rows to understand data shapes. Completely invisible. All reads, no harm. Until the night the engineer asked it to fix a bug — and Claude, with a direct write-capable connection to production, helpfully did exactly what it was asked to do.
Task: "Fix the bug where user preferences aren't saving correctly"
Tool call: postgres.query
→ SELECT id, preferences FROM users WHERE preferences IS NULL;
→ (returns 847 rows with NULL preferences)
Tool call: postgres.query
→ UPDATE users SET preferences = '$defaultPreferences' WHERE preferences IS NULL;
→ UPDATE 847
Response: "Done! I found 847 users with NULL preferences and reset them
to the default preferences object. The bug appears to be that new users
aren't getting default preferences on creation — I've patched the existing
nulls and you should add a DEFAULT constraint to prevent future ones."
The AI's reasoning was not wrong, technically. The fix it applied was logical. It just applied that fix to production at 2 AM using the actual live database, with no staging step, no review, and no rollback plan — because nothing in its environment told it not to.
The Fix: Four Layers of Defense
We shipped immediate mitigations within two hours of root cause identification, and architectural changes over the following week.
Immediate (Day 1):
# Rotate the leaked credential and create a dedicated read-only MCP role
psql "$PROD_DB_URL" -c "
  -- app_prod_rw is also the application's credential, so rotate its
  -- password rather than revoking its write privileges
  ALTER ROLE app_prod_rw WITH PASSWORD 'REDACTED';
  -- Dedicated read-only role for MCP use
  CREATE ROLE mcp_readonly LOGIN PASSWORD 'REDACTED';
  GRANT CONNECT ON DATABASE appdb TO mcp_readonly;
  GRANT USAGE ON SCHEMA public TO mcp_readonly;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO mcp_readonly;
  ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO mcp_readonly;
"
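Database grants are the real enforcement, but a belt-and-braces guard at the tool layer can also refuse write statements before they reach the wire. This is a minimal sketch of that idea, not the actual MCP server's code:

```typescript
// Read-only SQL gate: allow a single SELECT/SHOW/EXPLAIN/WITH statement,
// reject everything else. Defense-in-depth only; the database grants
// remain the authoritative control.
function isReadOnlyStatement(sql: string): boolean {
  const stripped = sql
    .replace(/--.*$/gm, '')           // strip line comments
    .replace(/\/\*[\s\S]*?\*\//g, '') // strip block comments
    .trim();

  // Reject multi-statement batches like "SELECT 1; DROP TABLE users"
  const statements = stripped.split(';').filter((s) => s.trim().length > 0);
  if (statements.length > 1) return false;

  // A WITH clause can hide INSERT/UPDATE/DELETE in a CTE, so scan the
  // whole text for write keywords, not just the first word.
  if (/\b(insert|update|delete|truncate|drop|alter|grant|revoke|create|copy)\b/i.test(stripped)) {
    return false;
  }
  return /^(select|show|explain|with)\b/i.test(stripped);
}
```

The allowlist fails closed: anything surprising (including a legitimate query that merely mentions a write keyword in a string literal) is rejected rather than waved through, which is the right trade-off for this tool.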
Policy (Day 2): We added a mandatory MCP configuration review to our security runbook. Any MCP server touching a database must use a read-only credential scoped to a non-production instance. Engineers are required to list their active MCP server configs in a shared internal doc, reviewed quarterly.
Tooling (Day 5): We wrote a pre-commit hook and CI check that scans for known production credential patterns in claude_desktop_config.json, .cursor/mcp.json, and equivalent config paths:
// scripts/audit-mcp-configs.ts
import * as fs from 'fs';
import * as path from 'path';

const MCP_CONFIG_PATHS = [
  path.join(process.env.HOME!, 'Library/Application Support/Claude/claude_desktop_config.json'),
  path.join(process.env.HOME!, '.cursor/mcp.json'),
  path.join(process.env.HOME!, '.config/windsurf/mcp.json'),
];

const PROD_INDICATORS = [
  /prod[_-]?db/i,
  /production/i,
  /@prod\./i,
  /\.prod\./i,
  /:5432\/appdb/, // your prod DB name
];

for (const configPath of MCP_CONFIG_PATHS) {
  if (!fs.existsSync(configPath)) continue;
  const raw = fs.readFileSync(configPath, 'utf-8');
  for (const pattern of PROD_INDICATORS) {
    if (pattern.test(raw)) {
      console.error(`[MCP AUDIT] WARNING: Possible production credential in ${configPath}`);
      console.error(`[MCP AUDIT] Pattern matched: ${pattern}`);
      process.exit(1);
    }
  }
}
console.log('[MCP AUDIT] All MCP configs clean.');
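A quick sanity check that an indicator list like this flags a production URL without flagging a dev one. The patterns are duplicated here so the snippet is self-contained, and both sample URLs are made up for illustration:

```typescript
// Same shape of indicator patterns as the audit script above.
const PROD_INDICATORS = [
  /prod[_-]?db/i,
  /production/i,
  /@prod\./i,
  /\.prod\./i,
  /:5432\/appdb/,
];

const flagsProd = (text: string): boolean =>
  PROD_INDICATORS.some((p) => p.test(text));

// Illustrative URLs, not real credentials
const prodUrl = 'postgresql://app_prod_rw:secret@prod-db.internal:5432/appdb';
const devUrl = 'postgresql://dev:dev@localhost:5433/appdb_dev';
```

Note that the literal `.` and `/` in the patterns must be escaped; an unescaped `/` inside a JavaScript regex literal terminates it, which is an easy mistake to ship in a checker that then never runs.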
Infrastructure (Day 7): We provisioned a dedicated read-only replica with a separate hostname (prod-db-ro.internal) and documented it as the canonical MCP endpoint. Even if a developer uses production credentials, the replica physically cannot process writes.
BEFORE
──────────────────────────────────────────────────────────────
Dev Laptop
└─ Claude Desktop
└─ MCP postgres server
└─ postgresql://app_prod_rw@prod-db:5432/appdb ← RW!
└─ [SELECT, INSERT, UPDATE, DELETE all permitted]
AFTER
──────────────────────────────────────────────────────────────
Dev Laptop
└─ Claude Desktop
└─ MCP postgres server
└─ postgresql://mcp_readonly@prod-db-ro:5432/appdb
└─ [SELECT only, read replica, writes physically rejected]
CI Check
└─ audit-mcp-configs.ts
└─ Blocks commit if prod patterns found in MCP config files
Onboarding Docs
└─ MCP setup guide links to dev-db.internal (local mirror, refreshed nightly)
└─ No engineer needs production credentials for AI-assisted debugging
Lessons Learned
MCP is genuinely useful. The ability to let an AI coding assistant inspect your schema, trace data shapes, and understand real system state makes it dramatically better at helping you debug. We haven't removed MCP from our workflow — we've hardened it. But the incident crystallized a few things we'd been naive about.
- AI coding agents will do what they are technically permitted to do. Claude didn't know it was connected to production. It knew it had a database connection and a task. It completed the task. The safeguard has to be in your credentials and architecture, not in the model's judgment.
- Read-only isn't just a best practice for MCP; it's the only acceptable posture. There is no legitimate reason an AI coding assistant needs write access to your production database during local development. If you can't complete your task with SELECT access, that's a workflow problem, not a permissions problem.
- MCP configs are credential files. Treat them that way. We scan .env files in CI. We rotate secrets on schedule. We audit IAM policies. We did none of that for MCP config files, because they didn't exist six months ago. New tooling creates new attack surfaces. Update your threat model.
- pgaudit will save you, but only if you actually query it. We had full audit logging enabled. We had never written a query against it. The 90 minutes we spent chasing ghost cron jobs would have been 5 minutes if we'd started with the audit log. Build runbooks that lead with your observability tools, not away from them.
- The blast radius of "the AI did it" is growing. In 2024, the failure mode was a developer running the wrong script. In 2026, the failure mode is a developer asking an AI to fix a bug — and the AI having the credentials to do it directly. Your least-privilege policies need to account for AI agents acting on behalf of your engineers, not just the engineers themselves.
We restored all 847 affected rows from backup within four hours. We shipped the read-only replica for MCP use within a week. We added the config audit script to CI the same day we found the root cause. The engineer involved handled it with total transparency — they'd made a copy-paste mistake that most of us would have made. The gap wasn't their judgment; it was our tooling.
MCP servers are the newest addition to a growing list of places where developer credentials
can quietly reach production: .env files, shell history, IDE extensions, clipboard
managers, and now AI config files. Assume every new tool your engineers adopt will eventually
touch a credential. Build your security posture around that assumption before the 2 AM alert.