The Mikado Method: Safe Refactoring for Complex Infrastructure

Every SRE has faced this situation: you need to upgrade a critical dependency, refactor a sprawling Terraform codebase, or migrate services to a new architecture. You start making changes, and within hours you are drowning in cascading failures. The code does not compile, tests are failing everywhere, and you have not committed anything in days.
This is the quicksand of complex systems. The Mikado Method is your escape route.
What is the Mikado Method?
The Mikado Method is a structured approach to making large, risky changes in complex codebases. Named after the pick-up sticks game, it helps you identify dependencies between changes and tackle them in the right order.
The core principle is simple: try, fail, learn, revert, repeat.
Instead of pushing through when you hit obstacles, you deliberately revert your changes, document what you learned, and address prerequisites first.
Why DevOps Engineers Need This
Infrastructure code has unique challenges:
- Terraform modules with dozens of interdependencies
- Kubernetes manifests where one change cascades through deployments
- CI/CD pipelines with complex conditional logic
- Configuration management code touching multiple environments
Traditional "just fix it" approaches fail spectacularly here because infrastructure changes are hard to test locally and easy to break globally.
The Mikado Process
Here is the step-by-step process:
1. Define Your Goal
Write down exactly what you want to achieve. Be specific:
Goal: Upgrade Kubernetes cluster from 1.28 to 1.30
2. Timebox Your Attempt
Set a timer for 10 to 15 minutes. Try to achieve your goal.
3. When You Fail (And You Will)
This is the crucial part. When you hit an obstacle:
- Revert everything. Use git checkout . or git stash. No exceptions.
- Document the obstacle as a subgoal.
- Connect it to your main goal on paper or a diagram.
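If paper is not at hand, the same bookkeeping fits in a plain mapping from each goal to the prerequisites discovered so far. The following is only a sketch: the `record_obstacle` helper and the example obstacle are hypothetical names invented for illustration, not part of any tool.

```python
# Mikado graph: each goal maps to the set of subgoals that must be done first.
graph: dict[str, set[str]] = {}


def record_obstacle(graph: dict[str, set[str]], goal: str, obstacle: str) -> None:
    """Note that `goal` is blocked until `obstacle` has been resolved."""
    graph.setdefault(goal, set()).add(obstacle)
    # A freshly discovered subgoal starts out with no known prerequisites.
    graph.setdefault(obstacle, set())


main_goal = "Upgrade Kubernetes cluster from 1.28 to 1.30"
# Hypothetical obstacle hit during a timeboxed attempt.
record_obstacle(graph, main_goal, "Replace removed API versions in manifests")
```

Each failed attempt adds one or more edges, so the graph grows only from what you actually ran into rather than from up-front guessing.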
4. Work the Leaves First
Your dependency graph will look like a tree. Always work on the "leaves" first, the subgoals that have no dependencies themselves.
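Finding the leaves can be done by eye on a small diagram, but the rule is simple enough to write down. Below is a minimal sketch, assuming the graph is stored as a dictionary of goal to prerequisites; `find_leaves` is a hypothetical helper, and the goal names are borrowed from the Terraform example later in this article.

```python
def find_leaves(graph: dict[str, set[str]], done: set[str]) -> list[str]:
    """Return unfinished goals whose prerequisites have all been completed."""
    return sorted(
        goal
        for goal, prereqs in graph.items()
        if goal not in done and prereqs <= done
    )


main_goal = "Upgrade AWS provider to 5.x"
graph = {
    main_goal: {
        "Fix deprecated aws_s3_bucket arguments",
        "Update IAM policy document syntax",
        "Handle removed data sources",
    },
    "Fix deprecated aws_s3_bucket arguments": set(),
    "Update IAM policy document syntax": set(),
    "Handle removed data sources": set(),
}

workable_now = find_leaves(graph, done=set())
```

With nothing done yet, `workable_now` holds the three subgoals and not the main goal. Once all three are in `done`, the main goal itself becomes the only leaf.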
5. Commit After Each Success
Every completed subgoal gets its own commit. This means:
- You can stop anytime with working code
- You can create incremental PRs
- You have clear rollback points
Real-World Example: Terraform Provider Upgrade
Let us walk through upgrading the AWS Terraform provider from 4.x to 5.x.
Attempt 1: Direct Upgrade
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
Result: 47 errors. Timer ran out. Revert.
Subgoals identified:
- Fix deprecated aws_s3_bucket arguments
- Update IAM policy document syntax
- Handle removed data sources
Attempt 2: Fix S3 Bucket Arguments
Focus on just the S3 changes. Timer: 10 minutes.
# Before
resource "aws_s3_bucket" "logs" {
  bucket = "my-logs"
  acl    = "private"
}

# After
resource "aws_s3_bucket" "logs" {
  bucket = "my-logs"
}

resource "aws_s3_bucket_acl" "logs" {
  bucket = aws_s3_bucket.logs.id
  acl    = "private"
}
Result: Success on 3 of 12 buckets. Commit. Continue with remaining buckets.
This process continues until all subgoals are complete, making the final provider upgrade trivial.
Mikado for Kubernetes Migrations
The method works just as well for Kubernetes migrations:
Main Goal: Migrate from Deployment to Argo Rollouts
├── Subgoal: Install Argo Rollouts controller
│   └── Subgoal: Add Helm repo and values
├── Subgoal: Update CI pipeline to build Rollout manifests
├── Subgoal: Create canary AnalysisTemplate
│   └── Subgoal: Set up Prometheus metrics query
└── Subgoal: Convert first non-critical service
    └── Subgoal: Add traffic splitting annotations
Each subgoal gets its own PR, its own tests, its own deployment window.
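To turn a tree like this into a PR sequence, a topological sort gives one valid order in which every prerequisite lands before the goal that needs it. The sketch below uses Python's standard-library `graphlib`; the dictionary is just the tree above retyped by hand, and the variable names are illustrative.

```python
from graphlib import TopologicalSorter

main_goal = "Migrate from Deployment to Argo Rollouts"

# Each goal maps to the subgoals that must be finished before it.
graph = {
    main_goal: [
        "Install Argo Rollouts controller",
        "Update CI pipeline to build Rollout manifests",
        "Create canary AnalysisTemplate",
        "Convert first non-critical service",
    ],
    "Install Argo Rollouts controller": ["Add Helm repo and values"],
    "Create canary AnalysisTemplate": ["Set up Prometheus metrics query"],
    "Convert first non-critical service": ["Add traffic splitting annotations"],
}

# static_order() yields prerequisites before the goals that depend on them.
pr_order = list(TopologicalSorter(graph).static_order())

for position, goal in enumerate(pr_order, start=1):
    print(f"PR {position}: {goal}")
```

Several orders are valid for the same graph, so treat the output as one workable schedule rather than the only one. The main goal always comes last.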
Tools to Support the Method
While pen and paper work fine, these tools help:
- Miro or Excalidraw for visual dependency graphs
- GitHub Issues with task lists for tracking subgoals
- Git worktrees for parallel attempts without stashing
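If you track subgoals in a GitHub issue, the graph can be rendered as a nested task list using the `- [ ]` and `- [x]` checkbox syntax. This is a rough sketch under the assumption that the graph is a dictionary of goal to prerequisites; `to_task_list` is a hypothetical helper, and the goals come from the Argo Rollouts example above.

```python
def to_task_list(graph: dict[str, list[str]], goal: str, done: set[str], depth: int = 0) -> list[str]:
    """Render `goal` and its prerequisites as nested Markdown task-list lines."""
    mark = "x" if goal in done else " "
    lines = [f"{'  ' * depth}- [{mark}] {goal}"]
    for prereq in graph.get(goal, []):
        lines.extend(to_task_list(graph, prereq, done, depth + 1))
    return lines


graph = {
    "Migrate from Deployment to Argo Rollouts": ["Install Argo Rollouts controller"],
    "Install Argo Rollouts controller": ["Add Helm repo and values"],
}
done = {"Add Helm repo and values"}

issue_body = "\n".join(
    to_task_list(graph, "Migrate from Deployment to Argo Rollouts", done)
)
print(issue_body)
# - [ ] Migrate from Deployment to Argo Rollouts
#   - [ ] Install Argo Rollouts controller
#     - [x] Add Helm repo and values
```

Pasting `issue_body` into an issue description gives a checklist whose indentation mirrors the dependency graph.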
Quick tip: Use a dedicated branch for each attempt:
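git checkout -b mikado/upgrade-k8s-1.30-attempt-3
# ... try things ...
git add -A && git commit -m "WIP: attempt 3, do not merge" # park the failed attempt on its branch
git checkout main # back to the last known-good state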
Common Mistakes to Avoid
Not reverting: The sunk cost fallacy is strong. You spent 2 hours on those changes. Revert anyway.
Timeboxes too long: 30+ minute timeboxes lead to complex changes that are painful to revert.
Skipping documentation: Write down every obstacle. Your future self needs that dependency graph.
Working on non-leaves: Always start with subgoals that have no prerequisites.
When to Use Mikado
The Mikado Method shines when you are:
- Upgrading major dependencies
- Refactoring shared infrastructure modules
- Migrating between technologies (e.g., Docker Compose to Kubernetes)
- Making breaking changes to APIs consumed by multiple services
It is overkill for:
- Simple bug fixes
- Adding new, isolated features
- Changes with clear, linear steps
Quick Reference
┌─────────────────────────────────────────┐
│        MIKADO METHOD CHEATSHEET         │
├─────────────────────────────────────────┤
│ 1. Write down main goal                 │
│ 2. Set 10-minute timer                  │
│ 3. Attempt the goal                     │
│ 4. Failed? REVERT, note subgoal         │
│ 5. Succeeded? COMMIT, check off goal    │
│ 6. Repeat from leaves until done        │
└─────────────────────────────────────────┘
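The cheatsheet is effectively a small recursive loop. The simulation below is a sketch for building intuition, not a tool: `mikado`, `simulated_attempt`, and `HIDDEN_PREREQS` are hypothetical names, and the "attempt" is faked with a lookup table standing in for a human working against a timer. It assumes the prerequisites contain no cycles.

```python
from typing import Callable, Optional

# attempt(goal, done) returns the obstacles hit; an empty list means success.
Attempt = Callable[[str, list], list]


def mikado(goal: str, attempt: Attempt, done: Optional[list] = None) -> list:
    """Run the try/revert/commit loop on `goal`; return goals in commit order."""
    done = [] if done is None else done
    obstacles = attempt(goal, done)
    while obstacles:
        # Failed: REVERT (nothing is kept) and work each noted subgoal first.
        for subgoal in obstacles:
            if subgoal not in done:
                mikado(subgoal, attempt, done)
        obstacles = attempt(goal, done)
    done.append(goal)  # Succeeded: COMMIT and check off the goal.
    return done


# Stand-in for reality: prerequisites you only discover by trying.
HIDDEN_PREREQS = {
    "Upgrade AWS provider to 5.x": [
        "Fix deprecated aws_s3_bucket arguments",
        "Update IAM policy document syntax",
        "Handle removed data sources",
    ],
}


def simulated_attempt(goal: str, done: list) -> list:
    """Fail with every prerequisite of `goal` that is not committed yet."""
    return [p for p in HIDDEN_PREREQS.get(goal, []) if p not in done]


commit_order = mikado("Upgrade AWS provider to 5.x", simulated_attempt)
```

In this run the three subgoals are committed first and the provider upgrade last, which mirrors the Terraform walkthrough earlier in the article.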
Key Takeaways
- Embrace failure as learning. Each failed attempt reveals hidden dependencies.
- Revert religiously. Your ability to undo is your superpower.
- Commit incrementally. Every subgoal commit is progress you cannot lose.
- Visualize the graph. Seeing dependencies prevents working on the wrong things.
The Mikado Method transforms terrifying migrations into predictable, boring procedures. And in SRE, boring is exactly what we want.
