2026-01-23 · AI & Agents

Agent Shepherding Playbook

Date: 2026-01-23
Context: Managing multiple autonomous Claude Code agents across iTerm sessions

Overview

A multi-agent coordination pattern in which one "shepherd" agent monitors and guides 4-8 autonomous Claude Code agents, each working on a different task in parallel.

Tools

Primary tool: itermctl - Custom CLI for programmatic iTerm2 control

itermctl status-all - Get processing state of all sessions
itermctl capture [session] - Get terminal contents
itermctl capture-all - Get all session contents at once
itermctl send [session] [message] - Send plain English instructions

Shepherding Process

1. Status Check (Every 5-10 minutes)

# Quick overview
itermctl status-all

# Detailed check - last 15 lines of each session
itermctl capture-all | jq -r '.captures[] |
  "\n=== \(.session) ===\n" +
  (.content | split("\n") | .[-15:] | join("\n"))'

Look for:

✅ is_processing: false + no user input = Waiting for direction
⚠️ Long-running tasks (check timestamp in status line)
❌ Error messages in recent output
🤔 Agents asking questions or stuck in loops
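
The text-based checks in this list (error messages, open questions) can be roughed out as a filter over captured output. This is a minimal sketch, assuming the capture arrives as plain text on stdin. The name classify_tail and the regex patterns are illustrative guesses, not part of itermctl, and the is_processing flag still has to come from itermctl status-all.

```shell
# classify_tail: read captured terminal text on stdin and print one label.
# The patterns below are assumptions for this sketch; tune them per project.
classify_tail() (
  tail_text=$(tail -n 15)
  if printf '%s\n' "$tail_text" | grep -Eiq 'error|traceback|failed|exception'; then
    # Something in the last 15 lines looks like a failure.
    echo "error"
  elif printf '%s\n' "$tail_text" | grep -Eq '\?[[:space:]]*$'; then
    # A recent line ends with a question mark: the agent may be asking.
    echo "question"
  else
    echo "ok"
  fi
)
```

If itermctl capture emits plain text, something like `itermctl capture "0:3" | classify_tail` would print a single label for that session. Treat the result as a hint for where to look first, not as a verdict.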

2. Agent States & Actions

Idle Agent (Waiting for input)

# Give them a new task or close if complete
itermctl send "0:2" "Great work! This task is complete. You can close."

# Or redirect to new work
itermctl send "0:2" "Now work on: [new task description]"

Long-Running Task (5+ minutes)

# Check if stuck
itermctl capture "0:1" | tail -n 30

# If stuck, interrupt gently
itermctl send "0:1" "That's taking a while. Can you check if it's progressing?
If stuck, try a different approach or ask for help."

Error State

# Provide debugging guidance
itermctl send "0:3" "I see an error with [X]. Try these steps:
1. Check [Y]
2. If that fails, try [Z]
3. Let me know what you find"

Asking Questions

# Answer decisively, provide context
itermctl send "0:4" "Yes, use approach A because [reason].
For the config, set it to [value]. Proceed when ready."

3. Coordination Patterns

Parallel Work - Independent tasks

Photo gallery (0:1)
Video rendering (0:2)
Music generation (0:3)
Voice processing (0:4)

When to use: Tasks don't depend on each other

Sequential Work - Dependencies

Agent 1: Build API → Agent 2: Test API → Agent 3: Deploy API

When to use: Output of one feeds into another

Collaborative Work - Shared resource

Multiple agents working on different features in same codebase
Coordinate via: "Wait for Agent X to finish before editing file Y"
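
For the shared-codebase case, the shepherd needs to know which agent currently "owns" a file before telling another to wait. One way to track that is a small claim registry on disk. This is a sketch under assumptions: CLAIM_DIR, claim_key, claim_file, and release_file are hypothetical names, and nothing here is provided by itermctl.

```shell
# Hypothetical claim registry: one lock directory per shared file.
# CLAIM_DIR and all three function names are assumptions for this sketch.
CLAIM_DIR="${CLAIM_DIR:-/tmp/shepherd-claims}"

# Turn a path into a flat directory name (src/api.ts -> src_api.ts).
claim_key() {
  printf '%s' "$1" | tr '/' '_'
}

# claim_file SESSION PATH: succeed if SESSION now holds PATH.
# mkdir is atomic, so two agents cannot both win the same claim.
claim_file() (
  key=$(claim_key "$2")
  [ -n "$key" ] || exit 1
  mkdir -p "$CLAIM_DIR" || exit 1
  if mkdir "$CLAIM_DIR/$key" 2>/dev/null; then
    printf '%s\n' "$1" > "$CLAIM_DIR/$key/owner"
  else
    echo "$2 is held by $(cat "$CLAIM_DIR/$key/owner" 2>/dev/null)" >&2
    exit 1
  fi
)

# release_file PATH: drop the claim once the holder has finished.
release_file() (
  key=$(claim_key "$1")
  [ -n "$key" ] || exit 1
  rm -rf "$CLAIM_DIR/$key"
)
```

When claim_file fails, the shepherd would send the waiting agent the "Wait for Agent X to finish before editing file Y" instruction via itermctl send, then retry after release_file.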

4. Common Issues

Agent Stuck in Loop

itermctl send "0:2" "I notice you're repeating the same action.
Let's try a different approach: [alternative strategy]"
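
Spotting a loop by eye gets harder with more sessions. A rough heuristic is to count how often the same line repeats in a capture. This is a sketch only: max_repeat and looks_looped are hypothetical helpers, and the default threshold of 3 is an arbitrary assumption.

```shell
# max_repeat: print the highest number of times any single non-empty line
# appears in the text on stdin (0 when there is no text).
max_repeat() {
  grep -v '^[[:space:]]*$' | sort | uniq -c | sort -rn |
    awk 'NR == 1 { print $1 } END { if (NR == 0) print 0 }'
}

# looks_looped [THRESHOLD]: succeed when some line repeats at least
# THRESHOLD times (default 3, an arbitrary starting point).
looks_looped() {
  [ "$(max_repeat)" -ge "${1:-3}" ]
}
```

Progress bars and status lines repeat legitimately, so a positive result is a prompt to read the capture, not proof of a loop.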

Port/Resource Conflict

itermctl send "0:3" "Use port 4010 instead - port 4000 is taken by production"

Unclear Requirements

# Don't let agents spin - provide clarity
itermctl send "0:1" "The goal is [X]. Approach: [Y].
Any questions before starting?"

Background Task Hanging

# Check task output, consider timeout
itermctl capture "0:4" | grep -E "background|running"
# If > 10min with no progress, suggest canceling
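
"No progress" can be approximated by checking whether a session's captured output has changed since the last check. This is a minimal sketch, assuming captures are plain text. STATE_DIR and seconds_unchanged are hypothetical names, and the timestamp is passed in as an argument so the caller decides how to obtain it.

```shell
# STATE_DIR and seconds_unchanged are hypothetical names for this sketch.
STATE_DIR="${STATE_DIR:-/tmp/shepherd-state}"

# seconds_unchanged SESSION NOW
# Reads the latest capture on stdin, compares its checksum with the one
# stored for SESSION, and prints how many seconds the output has stayed
# identical. NOW is a Unix timestamp, e.g. "$(date +%s)".
seconds_unchanged() (
  mkdir -p "$STATE_DIR" || exit 1
  file="$STATE_DIR/$(printf '%s' "$1" | tr ':/' '__')"
  sum=$(cksum | awk '{ print $1 }')
  old_sum=""
  since=$2
  if [ -f "$file" ]; then
    read -r old_sum since < "$file"
  fi
  # Output changed (or first sighting): restart the clock.
  if [ "$sum" != "$old_sum" ]; then
    since=$2
  fi
  printf '%s %s\n' "$sum" "$since" > "$file"
  echo $(( $2 - $since ))
)
```

Piping `itermctl capture "0:4"` into `seconds_unchanged "0:4" "$(date +%s)"` on each check and comparing the result against 600 would match the 10-minute rule. A capture that includes a ticking timer changes on every check and would never register as stalled.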

5. Session Organization

Naming Convention:

0:0 - Shepherd (this session)
0:1 - 0:4 - Active work sessions
1:0+ - Completed/reference sessions
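
If scripts need to act on this convention, it can be encoded as a small lookup. This is a sketch only: session_role is a hypothetical helper, and it hard-codes exactly the three ranges listed above.

```shell
# session_role ID: map a session ID to its role under the naming convention.
session_role() {
  case "$1" in
    0:0)      echo "shepherd" ;;
    0:[1-4])  echo "active" ;;
    [1-9]*:*) echo "reference" ;;
    *)        echo "unknown" ;;
  esac
}
```

A shepherd script could use the "unknown" result to flag sessions that fall outside the convention.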

When to Close:

Task fully complete
Agent is idle with no follow-up work
Success confirmed with the user

When to Keep Open:

May need follow-up work
Valuable context for future tasks
Currently processing

6. Communication Style

With Agents (via itermctl send):

Clear, directive language
Provide context and rationale
One message = one focused instruction
Don't micromanage - trust them to execute

With User:

Summarize overall progress
Flag blockers or decisions needed
Show agent outputs when relevant
Update at every major milestone

7. Metrics to Track

Active agents: 4-8 (sweet spot)
Check frequency: Every 5-10 minutes
Average task duration: 10-30 minutes
Completion rate: Track tasks finished vs started
Coordination overhead: < 20% of shepherd's time
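
Two of these metrics are simple ratios and can be computed from counts the shepherd already keeps. This is a sketch under assumptions: pct and overhead_ok are hypothetical helpers, and the inputs (task counts, minutes spent coordinating) are tracked by hand or by whatever logging you already have.

```shell
# pct PART WHOLE: print PART as a whole-number percentage of WHOLE.
pct() {
  awk -v p="$1" -v w="$2" 'BEGIN {
    if (w + 0 == 0) { print "n/a" } else { printf "%.0f\n", 100 * p / w }
  }'
}

# overhead_ok COORD_MIN TOTAL_MIN: succeed when coordination time is
# under the 20% budget from the list above.
overhead_ok() {
  awk -v c="$1" -v t="$2" 'BEGIN { exit !(t > 0 && c / t < 0.20) }'
}
```

For example, `pct 4 5` reports the completion rate when 4 of 5 started tasks finished, and `overhead_ok 8 45` checks whether 8 minutes of coordination in a 45-minute session stays within budget.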

Example Coordination Session

# Morning: Spin up 5 agents
Session 0:1: "Build photo gallery for 2,221 Apple Photos"
Session 0:2: "Create 3 YouTube Shorts with viral effects"
Session 0:3: "Integrate OpenRouter API for quote ranking"
Session 0:4: "Generate 9 music tracks with HeartMuLa"
Session 1:0: "Analyze 17 shader videos by color vibrancy"

# 10 min check: All progressing
# 20 min: 1:0 complete, 0:2 asking about video length
itermctl send "0:2" "30-60 seconds, vertical format, hooks in first 2s"

# 30 min: 0:4 complete (music), 0:3 blocked on API keys
itermctl send "0:3" "API key is in Infisical. Run:
infisical secrets get OPENROUTER_API_KEY --projectId=... --plain"

# 45 min: 0:1 and 0:2 complete, 0:3 unblocked and finishing
# Close completed sessions, 0:3 wrapping up

# Result: 5 features shipped in 45 minutes

Tips for Success

Trust the agents - They're autonomous, don't micromanage
Clear initial instructions - Good start = good finish
Check regularly - 5-10 min cadence prevents issues
Decisive guidance - When they ask, answer quickly
Parallel > Sequential - Max throughput with independence
Know when to stop - Diminishing returns after 60-90 min
Document patterns - Build playbooks for common scenarios
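
The 5-10 minute cadence is easy to drift from when done by hand. A small loop can run the status check on a fixed interval. This is a minimal sketch: poll_status and the ITERMCTL override variable are assumptions, and the only itermctl subcommand it calls is status-all.

```shell
# Command used to talk to iTerm; override it (e.g. ITERMCTL=echo) to dry-run.
ITERMCTL="${ITERMCTL:-itermctl}"

# poll_status INTERVAL_SECONDS ROUNDS: run a status check, wait, and repeat.
# Defaults: every 300 seconds (5 minutes), 6 rounds.
poll_status() (
  interval=${1:-300}
  rounds=${2:-6}
  i=1
  while [ "$i" -le "$rounds" ]; do
    echo "--- check $i of $rounds ---"
    "$ITERMCTL" status-all || echo "status check failed" >&2
    # Skip the wait after the final round.
    [ "$i" -lt "$rounds" ] && sleep "$interval"
    i=$((i + 1))
  done
)
```

`poll_status 600 9` would check every 10 minutes across a 90-minute session, which lines up with the point where returns start to diminish.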

Anti-Patterns

❌ Checking every 30 seconds - Let agents work
❌ Vague instructions - "Make it better" → agents spin
❌ Too many agents - >8 becomes overhead
❌ All sequential - Loses parallelism benefit
❌ No coordination - Agents conflict on shared resources
❌ Ignoring questions - Agents get stuck waiting

Success Criteria

✅ Multiple features shipped in parallel
✅ Agents mostly autonomous (< 3 interventions per agent)
✅ Clean handoffs between dependent work
✅ User gets regular progress updates
✅ All agents complete within planned timeframe

Key Insight: The shepherd's job isn't to do the work - it's to keep work flowing by providing clarity, unblocking issues, and coordinating across agents.