02 · AUTOMATION · SaaS Company · 2022

Automating the Ops Team Away

6 people spending 30+ hours a week on manual data work. There was a better way.

STACK

Python · Celery · Redis · PostgreSQL · NodeJS · Docker · AWS

01 — THE PROBLEM

The operations team at this SaaS company had a problem that felt normal to them because it had always been that way: six people spending 30–40 hours a week pulling data from five different vendor APIs, reformatting it into spreadsheets, sending reports, flagging anomalies, and hunting down mismatched records. It was skilled work being done by skilled people — but it was entirely automatable. The real cost wasn't the salary hours. It was the errors. Manual processes at scale mean manual errors at scale.


02 — THE APPROACH

I started by shadowing each person through their workflow for a full week. No code, no architecture — just documenting every click, every copy-paste, every "I always check X before doing Y" rule that existed only in someone's head. This alone produced a 14-page process document that the team had never had before.

From that document, I identified 4 distinct workflows that together covered 85% of the team's time. I scoped each as an independent service rather than as part of a monolith. This was important: operations workflows have a habit of evolving independently, and coupling them would have been the fastest way to create technical debt as requirements changed.

Each workflow was built as a Celery task queue backed by Redis, with configurable schedules and a simple admin UI (built in React) that let non-engineers adjust parameters, pause runs, and inspect failures. The failure handling was deliberately over-engineered: every task failure sent a Slack alert with full context, and partial failures were designed to be retryable without re-running completed steps.
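
To make the "retryable without re-running completed steps" idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the production code: `RunState`, `run_workflow`, and the `alert` callback are hypothetical names, the checkpoint lives in memory rather than in a database, and the Celery and Slack wiring is left out. In a real deployment the runner would presumably sit inside a Celery task and `alert` would post to Slack.

```python
from dataclasses import dataclass, field
from typing import Callable

# A step is a (name, function) pair; the function receives the results
# of earlier steps and returns its own result.
Step = tuple[str, Callable[[dict], object]]


@dataclass
class RunState:
    """Checkpoint for one workflow run (hypothetical; kept in memory here,
    but it would need to be persisted to survive a worker restart)."""
    run_id: str
    completed: list[str] = field(default_factory=list)
    results: dict[str, object] = field(default_factory=dict)


def run_workflow(steps: list[Step], state: RunState,
                 alert: Callable[[dict], None]) -> bool:
    """Run steps in order, skipping any already recorded as completed.

    Returns True when every step has finished. On the first failure it
    sends an alert with full context and returns False, leaving the
    checkpoint intact so a retry resumes at the failed step.
    """
    for name, func in steps:
        if name in state.completed:
            continue  # finished in an earlier attempt; do not re-run
        try:
            state.results[name] = func(state.results)
        except Exception as exc:
            alert({
                "run_id": state.run_id,
                "failed_step": name,
                "error": repr(exc),
                "completed_steps": list(state.completed),
            })
            return False
        state.completed.append(name)
    return True
```

Calling `run_workflow` a second time with the same `RunState` picks up at the step that failed. That only holds if each step is safe to repeat on its own, which is an assumption this sketch does not enforce.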


03 — THE OUTCOME

Delivered in 11 weeks across 4 phases. Each workflow went live independently with a 2-week parallel run where both the manual process and the automated one ran simultaneously — output was compared daily before the team signed off on the automation.
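
The daily comparison during a parallel run can be sketched as a keyed diff between the two outputs. This is a simplified illustration, not the tooling used on the project: `compare_outputs`, `ParallelRunDiff`, and the `record_id` key are hypothetical, and it assumes both outputs can be loaded as lists of dictionaries with a unique key per record.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelRunDiff:
    """Differences between the manual and automated outputs for one day."""
    missing_from_automated: list[str] = field(default_factory=list)
    extra_in_automated: list[str] = field(default_factory=list)
    mismatched: dict[str, tuple[dict, dict]] = field(default_factory=dict)

    @property
    def clean(self) -> bool:
        # True only when both outputs contain the same records with equal values.
        return not (self.missing_from_automated
                    or self.extra_in_automated
                    or self.mismatched)


def compare_outputs(manual: list[dict], automated: list[dict],
                    key: str = "record_id") -> ParallelRunDiff:
    """Compare two record sets by key and report every disagreement."""
    manual_by_key = {row[key]: row for row in manual}
    automated_by_key = {row[key]: row for row in automated}

    diff = ParallelRunDiff()
    diff.missing_from_automated = sorted(
        manual_by_key.keys() - automated_by_key.keys())
    diff.extra_in_automated = sorted(
        automated_by_key.keys() - manual_by_key.keys())

    # For records present on both sides, keep both versions of any mismatch
    # so a reviewer can see which side is wrong.
    for k in sorted(manual_by_key.keys() & automated_by_key.keys()):
        if manual_by_key[k] != automated_by_key[k]:
            diff.mismatched[k] = (manual_by_key[k], automated_by_key[k])
    return diff
```

A mismatch does not automatically mean the automation is wrong. Since manual error was the original problem, some differences would be expected to trace back to the manual side, which is why the sketch keeps both versions of each mismatched record.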

Manual ops hours/month: 480+ → 4
Data processing errors: −94%
Report delivery time: 2 days → 12 minutes
Team redeployed to: Product & growth work

THE TAKEAWAY

Automation projects succeed or fail in the discovery phase. If you skip the "shadow the actual humans" step and jump straight to architecture, you automate the wrong thing and spend twice as long fixing it. The goal isn't to remove people — it's to remove the work that prevents people from doing the work that matters.
