2026-01-23 · AI & Agents

Agent Shepherding Playbook

Date: 2026-01-23
Context: Managing multiple autonomous Claude Code agents across iTerm sessions

Overview

A multi-agent coordination pattern in which one "shepherd" agent monitors and guides 4-8 autonomous Claude Code agents, each working on a different task in parallel.

Tools

Primary tool: itermctl - Custom CLI for programmatic iTerm2 control

Shepherding Process

1. Status Check (Every 5-10 minutes)

# Quick overview
itermctl status-all

# Detailed check - last 15 lines of each session
itermctl capture-all | jq -r '.captures[] |
  "\n=== \(.session) ===\n" +
  (.content | split("\n") | .[-15:] | join("\n"))'
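The 5-10 minute cadence can be scripted rather than remembered. The following is a minimal sketch, assuming `itermctl status-all` prints a human-readable summary to stdout; the `ITERMCTL` and `INTERVAL` variables and both function names are hypothetical helpers, not part of itermctl.

```shell
# Assumption: itermctl is on PATH. ITERMCTL can be overridden for dry runs.
ITERMCTL="${ITERMCTL:-itermctl}"
# Seconds between checks; 300 is the low end of the 5-10 minute cadence.
INTERVAL="${INTERVAL:-300}"

# Run one status check, stamped with the current time.
poll_once() {
  printf '=== check at %s ===\n' "$(date '+%H:%M:%S')"
  "$ITERMCTL" status-all
}

# Repeat the check forever; stop with Ctrl-C.
poll_loop() {
  while :; do
    poll_once
    sleep "$INTERVAL"
  done
}
```

Running `poll_loop` in a spare terminal keeps the overview fresh without tightening the cadence to the point of micromanaging.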

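Look for: idle prompts, tasks that have been running for 5+ minutes, error output, and agents waiting on an answer. Each state is handled in the next section.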

2. Agent States & Actions

Idle Agent (Waiting for input)

# Give them a new task or close if complete
itermctl send "0:2" "Great work! This task is complete. You can close."

# Or redirect to new work
itermctl send "0:2" "Now work on: [new task description]"

Long-Running Task (5+ minutes)

# Check if stuck
itermctl capture "0:1" | tail -30

# If stuck, interrupt gently
itermctl send "0:1" "That's taking a while. Can you check if it's progressing?
If stuck, try a different approach or ask for help."
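One way to decide whether a long-running task is actually stuck is to compare two captures taken a few minutes apart. This is a rough sketch, assuming `itermctl capture` writes the session text to stdout (as the `tail -30` pipeline above suggests); `snapshot_sum` and `is_stalled` are hypothetical helper names.

```shell
# Checksum the last 30 lines of captured text read from stdin.
snapshot_sum() {
  tail -n 30 | cksum | awk '{ print $1 }'
}

# Succeeds when two snapshot checksums are identical, i.e. the visible
# output did not change between the two captures.
is_stalled() {
  [ "$1" = "$2" ]
}

# Intended use (not executed here):
#   before=$(itermctl capture "0:1" | snapshot_sum)
#   sleep 300
#   after=$(itermctl capture "0:1" | snapshot_sum)
#   is_stalled "$before" "$after" && echo "0:1 looks stuck"
```

A session that shows a spinner or elapsed-time counter will change on every capture, so this check may miss a stall there; treat it as a hint, not a verdict.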

Error State

# Provide debugging guidance
itermctl send "0:3" "I see an error with [X]. Try these steps:
1. Check [Y]
2. If that fails, try [Z]
3. Let me know what you find"

Asking Questions

# Answer decisively, provide context
itermctl send "0:4" "Yes, use approach A because [reason].
For the config, set it to [value]. Proceed when ready."
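The four states above can be roughly triaged from captured text before deciding what to send. The sketch below is only a heuristic: the match patterns (error keywords, a trailing `?`, a bare `>` prompt) are assumptions about what the session output looks like, and `classify_state` is a hypothetical helper.

```shell
# Read captured session text on stdin and print one of:
# error, question, idle, working.
classify_state() {
  tail_text=$(tail -n 15)
  # Last non-blank line of the tail.
  last_line=$(printf '%s\n' "$tail_text" | awk 'NF { l = $0 } END { print l }')
  if printf '%s\n' "$tail_text" | grep -Eiq 'error|traceback|failed'; then
    echo error
  elif printf '%s\n' "$last_line" | grep -q '?[[:space:]]*$'; then
    echo question
  elif printf '%s\n' "$last_line" | grep -q '^[[:space:]]*>[[:space:]]*$'; then
    # Assumption: an idle agent shows a bare ">" prompt.
    echo idle
  else
    echo working
  fi
}

# Intended use (not executed here):
#   itermctl capture "0:2" | classify_state
```

The error check runs first on purpose: an agent that hit an error and then asked a question usually needs debugging guidance more than a quick answer.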

3. Coordination Patterns

Parallel Work - Independent tasks

When to use: Tasks don't depend on each other

Sequential Work - Dependencies

When to use: Output of one feeds into another

Collaborative Work - Shared resource

4. Common Issues

Agent Stuck in Loop

itermctl send "0:2" "I notice you're repeating the same action.
Let's try a different approach: [alternative strategy]"
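Loops often show up as the same line repeated many times in the captured output. A small sketch of that check follows; the threshold of 3 repeats is an assumption, and `max_repeat` and `looks_looped` are hypothetical helpers.

```shell
# Print the highest number of times any non-blank line occurs on stdin.
max_repeat() {
  awk 'NF { c[$0]++; if (c[$0] > m) m = c[$0] } END { print m + 0 }'
}

# Succeeds when some line repeats at least $1 times (default 3).
looks_looped() {
  [ "$(max_repeat)" -ge "${1:-3}" ]
}

# Intended use (not executed here):
#   itermctl capture "0:2" | tail -30 | looks_looped && echo "0:2 may be looping"
```

Progress bars and log prefixes can repeat legitimately, so it is worth reading the capture before sending the redirect message.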

Port/Resource Conflict

itermctl send "0:3" "Use port 4010 instead - port 4000 is taken by production"
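Port conflicts can be avoided up front by giving each agent its own port when the task is assigned. This is a sketch under the assumption that ports from 4010 upward are free; `BASE_PORT` and `port_for_agent` are hypothetical names.

```shell
# Assumption: 4000 is reserved for production, so agents start at 4010.
BASE_PORT="${BASE_PORT:-4010}"

# Print the port reserved for agent number $1 (0-based).
port_for_agent() {
  echo $(( BASE_PORT + $1 ))
}

# Intended use (not executed here):
#   itermctl send "0:3" "Use port $(port_for_agent 3) for your dev server"
```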

Unclear Requirements

# Don't let agents spin - provide clarity
itermctl send "0:1" "The goal is [X]. Approach: [Y].
Any questions before starting?"

Background Task Hanging

# Check task output, consider timeout
itermctl capture "0:4" | grep -E "background|running"
# If > 10min with no progress, suggest canceling

5. Session Organization

Naming Convention:

When to Close:

When to Keep Open:

6. Communication Style

With Agents (via itermctl send):

With User:

7. Metrics to Track

Example Coordination Session

# Morning: Spin up 5 agents
Session 0:1: "Build photo gallery for 2,221 Apple Photos"
Session 0:2: "Create 3 YouTube Shorts with viral effects"
Session 0:3: "Integrate OpenRouter API for quote ranking"
Session 0:4: "Generate 9 music tracks with HeartMuLa"
Session 1:0: "Analyze 17 shader videos by color vibrancy"

# 10 min check: All progressing
# 20 min: 1:0 complete, 0:2 asking about video length
itermctl send "0:2" "30-60 seconds, vertical format, hooks in first 2s"

# 30 min: 0:4 complete (music), 0:3 blocked on API keys
itermctl send "0:3" "API key is in Infisical. Run:
infisical secrets get OPENROUTER_API_KEY --projectId=... --plain"

# 45 min: 0:1 and 0:2 complete, 0:3 unblocked and finishing
# Close completed sessions, 0:3 wrapping up

# Result: 5 features shipped in 45 minutes

Tips for Success

  1. Trust the agents - They're autonomous, don't micromanage
  2. Clear initial instructions - Good start = good finish
  3. Check regularly - 5-10 min cadence prevents issues
  4. Decisive guidance - When they ask, answer quickly
  5. Parallel > Sequential - Max throughput with independence
  6. Know when to stop - Diminishing returns after 60-90 min
  7. Document patterns - Build playbooks for common scenarios

Anti-Patterns

❌ Checking every 30 seconds - Let agents work
❌ Vague instructions - "Make it better" → agents spin
❌ Too many agents - >8 becomes overhead
❌ All sequential - Loses parallelism benefit
❌ No coordination - Agents conflict on shared resources
❌ Ignoring questions - Agents get stuck waiting

Success Criteria

✅ Multiple features shipped in parallel
✅ Agents mostly autonomous (< 3 interventions per agent)
✅ Clean handoffs between dependent work
✅ User gets regular progress updates
✅ All agents complete within planned timeframe
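The "< 3 interventions per agent" criterion is easy to check if every `itermctl send` is also logged. The sketch below assumes a hypothetical log with one session id per line, one line per intervention; `sessions_over_limit` is not part of itermctl.

```shell
# Read session ids from stdin (one per intervention) and print, sorted,
# each session that reached the limit ($1, default 3).
sessions_over_limit() {
  awk -v limit="${1:-3}" '
    NF { n[$1]++ }
    END { for (s in n) if (n[s] >= limit) print s }
  ' | sort
}

# Intended use (not executed here), with a hypothetical log file:
#   echo "0:3" >> interventions.log   # after each itermctl send to 0:3
#   sessions_over_limit < interventions.log
```

Sessions that show up in the output are candidates for clearer initial instructions next time.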


Key Insight: The shepherd's job isn't to do the work - it's to keep work flowing by providing clarity, unblocking issues, and coordinating across agents.