02.03.2026

The Mikado Method: Safe Refactoring for Complex Infrastructure


Every SRE has faced this situation: you need to upgrade a critical dependency, refactor a sprawling Terraform codebase, or migrate services to a new architecture. You start making changes, and within hours you are drowning in cascading failures. The code does not compile, tests are failing everywhere, and you have not committed anything in days.

This is the quicksand of complex systems. The Mikado Method is your escape route.

What is the Mikado Method?

The Mikado Method is a structured approach to making large, risky changes in complex codebases. Named after the pick-up sticks game, it helps you identify dependencies between changes and tackle them in the right order.

The core principle is simple: try, fail, learn, revert, repeat.

Instead of pushing through when you hit obstacles, you deliberately revert your changes, document what you learned, and address prerequisites first.

Why DevOps Engineers Need This

Infrastructure code has unique challenges:

  • Terraform modules with dozens of interdependencies
  • Kubernetes manifests where one change cascades through deployments
  • CI/CD pipelines with complex conditional logic
  • Configuration management code touching multiple environments

Traditional "just fix it" approaches fail spectacularly here because infrastructure changes are hard to test locally and easy to break globally.

The Mikado Process

Here is the step-by-step process:

1. Define Your Goal

Write down exactly what you want to achieve. Be specific:

Goal: Upgrade Kubernetes cluster from 1.28 to 1.30

2. Timebox Your Attempt

Set a timer for 10 to 15 minutes. Try to achieve your goal.

3. When You Fail (And You Will)

This is the crucial part. When you hit an obstacle:

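  1. Revert everything. Discard changes to tracked files with git checkout . (or git restore .) and delete any new files the attempt created with git clean -fd; git stash -u also works if you want to keep the failed attempt for reference. No exceptions.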
  2. Document the obstacle as a subgoal.
  3. Connect it to your main goal on paper or a diagram.
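If you would rather have a diagram you can regenerate than one on paper, one option is to record each obstacle as a (goal, prerequisite) pair and render the pairs as Graphviz DOT. This is a minimal sketch: to_dot is a hypothetical helper, not part of any tool named in this article, and the example edges reuse the AWS provider subgoals from the Terraform walkthrough below.

```python
from collections.abc import Collection


def _quote(name: str) -> str:
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges: list[tuple[str, str]], done: Collection[str] = ()) -> str:
    """Render (goal, prerequisite) pairs as a Graphviz digraph.

    Arrows point from a goal to the prerequisite it is blocked on.
    Nodes listed in done are filled green so progress is visible.
    """
    lines = ["digraph mikado {"]
    # Declare every node once, sorted so the output is stable between runs.
    for node in sorted({n for edge in edges for n in edge}):
        style = " [style=filled, fillcolor=lightgreen]" if node in done else ""
        lines.append(f"  {_quote(node)}{style};")
    for goal, prereq in edges:
        lines.append(f"  {_quote(goal)} -> {_quote(prereq)};")
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("Upgrade AWS provider to 5.x", "Fix deprecated aws_s3_bucket arguments"),
    ("Upgrade AWS provider to 5.x", "Update IAM policy document syntax"),
    ("Upgrade AWS provider to 5.x", "Handle removed data sources"),
]
print(to_dot(edges))
```

Saving the output as mikado.dot and running dot -Tsvg mikado.dot -o mikado.svg gives you an image you can regenerate after every attempt.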

4. Work the Leaves First

Your dependency graph will usually look like a tree (or a more general graph when two subgoals share a prerequisite). Always work on the "leaves" first: the subgoals that have no unfinished prerequisites of their own.
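Finding the leaves is a small graph query: a subgoal is ready when all of its own prerequisites are done. A minimal sketch, assuming the graph is kept as a plain dict from each goal to the prerequisites discovered for it (a hypothetical format, not any tool's API). Because it works on a dict rather than a strict tree, a prerequisite shared by two goals is handled as well. The example subgoals are hypothetical, chosen to fit the 1.28 to 1.30 upgrade goal above.

```python
def ready_subgoals(prereqs: dict[str, list[str]], done: set[str]) -> list[str]:
    """Return unfinished nodes whose prerequisites are all done (the leaves).

    Nodes that appear only as prerequisites have no entry in prereqs,
    which means nothing is known to block them yet.
    """
    nodes = set(prereqs) | {p for deps in prereqs.values() for p in deps}
    return sorted(
        node
        for node in nodes
        if node not in done and all(dep in done for dep in prereqs.get(node, []))
    )


# Hypothetical graph for the Kubernetes upgrade goal.
graph = {
    "Upgrade cluster to 1.30": ["Upgrade cluster to 1.29"],
    "Upgrade cluster to 1.29": ["Replace removed API versions", "Upgrade ingress controller"],
}
print(ready_subgoals(graph, done=set()))
# ['Replace removed API versions', 'Upgrade ingress controller']
print(ready_subgoals(graph, done={"Replace removed API versions", "Upgrade ingress controller"}))
# ['Upgrade cluster to 1.29']
```

As leaves are completed, their parents become leaves in turn, which is why the main goal is always the last thing you touch.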

5. Commit After Each Success

Every completed subgoal gets its own commit. This means:

  • You can stop anytime with working code
  • You can create incremental PRs
  • You have clear rollback points
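Steps 2 to 5 together form a loop, which the sketch below simulates so the control flow is explicit. attempt is a hypothetical callback standing in for one timeboxed try: it returns an empty list when the change works (commit) or the obstacles it ran into (revert and record them). The hidden dependencies in the example are invented for illustration. The sketch assumes the graph stays acyclic and that an attempt succeeds once its prerequisites are in place; a cycle would leave no ready node and raise StopIteration.

```python
from typing import Callable


def mikado(goal: str, attempt: Callable[[str], list[str]]) -> list[str]:
    """Run the Mikado loop and return nodes in the order they were committed.

    attempt(node) returns [] when the change works (commit it) or a list of
    newly discovered prerequisites when it does not (revert, note them).
    """
    prereqs: dict[str, list[str]] = {goal: []}
    done: list[str] = []
    while goal not in done:
        # Pick any unfinished node whose known prerequisites are all done.
        node = next(
            n for n in prereqs
            if n not in done and all(p in done for p in prereqs[n])
        )
        obstacles = attempt(node)
        if obstacles:
            # Failure: the code is reverted; only the new knowledge is kept.
            for obstacle in obstacles:
                prereqs.setdefault(obstacle, [])
                if obstacle not in prereqs[node]:
                    prereqs[node].append(obstacle)
        else:
            done.append(node)  # Success: commit and check the node off.
    return done


# Invented "real world": what each change secretly depends on.
hidden = {
    "Upgrade AWS provider to 5.x": [
        "Fix deprecated aws_s3_bucket arguments",
        "Handle removed data sources",
    ],
    "Fix deprecated aws_s3_bucket arguments": ["Move bucket ACLs to aws_s3_bucket_acl"],
}
committed: set[str] = set()


def simulated_attempt(node: str) -> list[str]:
    """Fail with the hidden prerequisites that are not committed yet."""
    missing = [p for p in hidden.get(node, []) if p not in committed]
    if not missing:
        committed.add(node)
    return missing


order = mikado("Upgrade AWS provider to 5.x", simulated_attempt)
print(order)
```

Each entry in the returned list corresponds to one commit, so the list doubles as the sequence of small PRs described above.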

Real-World Example: Terraform Provider Upgrade

Let us walk through upgrading the AWS Terraform provider from 4.x to 5.x.

Attempt 1: Direct Upgrade

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

Result: 47 errors. Timer ran out. Revert.

Subgoals identified:

  • Fix deprecated aws_s3_bucket arguments
  • Update IAM policy document syntax
  • Handle removed data sources

Attempt 2: Fix S3 Bucket Arguments

Focus on just the S3 changes. Timer: 10 minutes.

# Before
resource "aws_s3_bucket" "logs" {
  bucket = "my-logs"
  acl    = "private"
}

# After
resource "aws_s3_bucket" "logs" {
  bucket = "my-logs"
}

resource "aws_s3_bucket_acl" "logs" {
  bucket = aws_s3_bucket.logs.id
  acl    = "private"
}

Result: Success on 3 of 12 buckets. Commit. Continue with remaining buckets.
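Tracking progress like "3 of 12 buckets" is easier with a quick scan for aws_s3_bucket blocks that still set acl inline. The sketch below is hypothetical and deliberately rough: it finds resource headers with a regular expression and counts braces instead of using a real HCL parser, so unusual formatting (for example, unbalanced braces inside heredocs) can fool it. Treat the result as a to-do list, not as proof that the upgrade is ready.

```python
import re
from pathlib import Path

# Matches the header of an aws_s3_bucket resource, but not aws_s3_bucket_acl etc.
BUCKET_HEADER = re.compile(r'resource\s+"aws_s3_bucket"\s+"([^"]+)"\s*\{')
# An "acl = ..." assignment at the start of a line inside the block.
INLINE_ACL = re.compile(r"^\s*acl\s*=", re.MULTILINE)


def buckets_with_inline_acl(hcl: str) -> list[str]:
    """Return names of aws_s3_bucket resources that still set acl inline."""
    names = []
    for match in BUCKET_HEADER.finditer(hcl):
        # Walk forward from the opening brace to its matching closing brace.
        depth, i = 1, match.end()
        while i < len(hcl) and depth:
            if hcl[i] == "{":
                depth += 1
            elif hcl[i] == "}":
                depth -= 1
            i += 1
        body = hcl[match.end():i - 1]
        if INLINE_ACL.search(body):
            names.append(match.group(1))
    return names


def scan(root: str) -> dict[str, list[str]]:
    """Map each .tf file under root to the buckets it still needs to fix."""
    report = {}
    for path in sorted(Path(root).rglob("*.tf")):
        names = buckets_with_inline_acl(path.read_text(encoding="utf-8"))
        if names:
            report[str(path)] = names
    return report
```

Calling scan(".") from the root of the Terraform code returns the remaining buckets per file; when it comes back empty, run terraform plan to confirm before checking the subgoal off.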

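Because aws_s3_bucket_acl already exists in the 4.x provider, each fix like this can be committed while you are still on the old version. This process continues until all subgoals are complete, at which point the final provider upgrade is just the version bump.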

Mikado for Kubernetes Migrations

The method works exceptionally well for Kubernetes migrations, such as moving workloads from plain Deployments to Argo Rollouts:

Main Goal: Migrate from Deployment to Argo Rollouts
├── Subgoal: Install Argo Rollouts controller
│   └── Subgoal: Add Helm repo and values
├── Subgoal: Update CI pipeline to build Rollout manifests
├── Subgoal: Create canary AnalysisTemplate
│   └── Subgoal: Set up Prometheus metrics query
└── Subgoal: Convert first non-critical service
    └── Subgoal: Add traffic splitting annotations

Each subgoal gets its own PR, its own tests, its own deployment window.
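Reading a tree like this leaves-first is a post-order traversal: every subgoal is listed before the goal that depends on it. A minimal sketch, assuming the tree is typed in by hand as nested (name, children) tuples (a hypothetical representation, not something Argo Rollouts provides), turns the diagram into a candidate PR sequence:

```python
# Each node is (name, children); children are prerequisites of their parent.
Tree = tuple[str, list["Tree"]]


def leaves_first(node: Tree) -> list[str]:
    """Post-order walk: prerequisites come before the goal that needs them."""
    name, children = node
    order: list[str] = []
    for child in children:
        order.extend(leaves_first(child))
    order.append(name)
    return order


rollout_migration: Tree = (
    "Migrate from Deployment to Argo Rollouts",
    [
        ("Install Argo Rollouts controller", [("Add Helm repo and values", [])]),
        ("Update CI pipeline to build Rollout manifests", []),
        ("Create canary AnalysisTemplate", [("Set up Prometheus metrics query", [])]),
        ("Convert first non-critical service", [("Add traffic splitting annotations", [])]),
    ],
)

for step, name in enumerate(leaves_first(rollout_migration), start=1):
    print(f"PR {step}: {name}")
```

Post-order is only one valid ordering: sibling branches do not depend on each other, so they can just as well be worked in parallel by different people.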

Tools to Support the Method

While pen and paper work fine, these tools help:

  • Miro or Excalidraw for visual dependency graphs
  • GitHub Issues with task lists for tracking subgoals
  • Git worktrees for parallel attempts without stashing
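For the GitHub Issues option, a nested task list mirrors the dependency tree well, since GitHub renders "- [ ]" and "- [x]" items as checkboxes. A minimal sketch, assuming the tree is kept as nested (name, children) tuples; task_list is a hypothetical helper, and the example reuses the AWS provider subgoals from earlier in this article:

```python
def task_list(node: tuple[str, list], done: set[str], depth: int = 0) -> str:
    """Render a (name, children) tree as a nested GitHub task list."""
    name, children = node
    box = "x" if name in done else " "
    # Two spaces per level nest each item under its parent's "- " marker.
    lines = [f"{'  ' * depth}- [{box}] {name}"]
    for child in children:
        lines.append(task_list(child, done, depth + 1))
    return "\n".join(lines)


goal = ("Upgrade AWS provider to 5.x", [
    ("Fix deprecated aws_s3_bucket arguments", []),
    ("Update IAM policy document syntax", []),
    ("Handle removed data sources", []),
])
print(task_list(goal, done={"Handle removed data sources"}))
```

Pasting the output into the issue description gives a checklist that shows which leaves are still open.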

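Quick tip: Use a dedicated branch for each attempt, and commit the attempt there before switching away, because uncommitted changes follow you when you check out another branch:

git checkout -b mikado/upgrade-k8s-1.30-attempt-3
# ... try things ...
git add -A && git commit -m "Mikado attempt 3 (failed)"  # park the attempt on its own branch
git checkout main  # back to the last known-good state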

Common Mistakes to Avoid

Not reverting: The sunk cost fallacy is strong. You spent 2 hours on those changes. Revert anyway.

Timeboxes too long: 30+ minute timeboxes lead to complex changes that are painful to revert.

Skipping documentation: Write down every obstacle. Your future self needs that dependency graph.

Working on non-leaves: Always start with subgoals that have no prerequisites.

When to Use Mikado

The Mikado Method shines when:

  • You are upgrading major dependencies
  • Refactoring shared infrastructure modules
  • Migrating between technologies (e.g., Docker Compose to Kubernetes)
  • Making breaking changes to APIs consumed by multiple services

It is overkill for:

  • Simple bug fixes
  • Adding new, isolated features
  • Changes with clear, linear steps

Quick Reference

┌─────────────────────────────────────────┐
│         MIKADO METHOD CHEATSHEET        │
├─────────────────────────────────────────┤
│ 1. Write down main goal                 │
│ 2. Set 10-minute timer                  │
│ 3. Attempt the goal                     │
│ 4. Failed? REVERT, note subgoal         │
│ 5. Succeeded? COMMIT, check off goal    │
│ 6. Repeat from leaves until done        │
└─────────────────────────────────────────┘

Key Takeaways

  • Embrace failure as learning. Each failed attempt reveals hidden dependencies.
  • Revert religiously. Your ability to undo is your superpower.
  • Commit incrementally. Every subgoal commit is progress you cannot lose.
  • Visualize the graph. Seeing dependencies prevents working on the wrong things.

The Mikado Method transforms terrifying migrations into predictable, boring procedures. And in SRE, boring is exactly what we want.


Akmatori helps SRE teams automate complex infrastructure changes with AI-powered agents that understand your systems. Learn how we make large-scale refactoring safer.
