2026-01-23 · AI & Agents

Autonomous Multi-Agent Coordination - Jan 22 2026

Autonomous Multi-Agent Coordination: Music, Video, and Voice

Date: January 22, 2026 Project: Orca Monolith Development

Overview

Coordinated 7 autonomous Claude Code agents across different terminal sessions using a custom itermctl tool, achieving significant progress on music generation, video processing, photo galleries, and voice assistant integration.

Achievements

🎵 HeartMuLa Music Generation

Generated 9 instrumental tracks (30s each) using 3B parameter model on CUDA server
Genres: Electronic dance, ambient, minimal techno, synthwave, trance, industrial, lo-fi, drum & bass, downtempo
API: http://10.1.2.30:8004/generate/prompt (FastAPI with Form data)
All tracks ready in M3U playlist with VLC integration

🎬 YouTube Shorts Creation

Created 3 vertical shorts (1080x1920) optimized for YouTube algorithm:
- "Which transition is best? 1-6" (10.5s) - engagement hook
- "Watch this transition 👀" (6s) - viral showcase
- "The smoothest edit you'll see today" (7.5s) - bold claim
Features: ChromaticSmash, GlitchAssemble, WhipPan, VaporTrail effects
Hooks appear in first 0.3s with red glow for maximum retention

📸 Photo Gallery Implementation

Built ContentController + LiveView for 2,221 Apple Photos
Route: /photos (gallery), /content/:hash (images)
Fixed persistent storage issue: added /home/j/orca-content:/var/orca/content volume mount
Deployed to production (10.1.2.200:4000)

🎨 Shader Analysis

Analyzed 17 Persian-themed shader videos for color vibrancy
Top performers:
- Persian Kaleidoscope (94.7/100)
- Sassanid Lotus (93.5/100)
- Golden Spiral (92.0/100)
Metrics: saturation, diversity, brightness

🗣️ Voice Hub Integration

Integrated Voice Hub into Orca monolith as separate module
Wyoming protocol support for Home Assistant-compatible voice satellites
WebSocket endpoint: /voice_hub/websocket
Core modules:
- Orca.VoiceHub.SpeakerClient - Speaker identification (ECAPA-TDNN)
- Orca.VoiceHub.WhisperClient - Transcription
- Orca.VoiceHub.WavEncoder - Audio processing
Speaker service deployed on CUDA server (10.1.2.30:8006)

🤖 OpenRouter Integration

Switched quote extraction from Anthropic API to OpenRouter
Using Qwen 2.5 72B (open-weights model) for ranking quotes
Successfully tested on 4 categories: insight, surprise, counterintuitive, hottake
Reduced API costs while maintaining quality

Technical Details

Architecture

Orca: Phoenix + LiveView monolith (Elixir)
CUDA Server: GPU-accelerated services (10.1.2.30)
- HeartMuLa (3B music model, port 8004)
- Whisper transcription (port 8001)
- Speaker identification (port 8006)
App Server: Production deployment (10.1.2.200)
- Orca web app (port 4000)
- PostgreSQL (shared-postgres container)

Coordination Tool

Custom itermctl CLI (AppleScript + bash + jq):

itermctl capture-all - Get all session contents
itermctl send [session] [message] - Send instructions to agents
itermctl status-all - Check processing status

Managed sessions:

0:0 - Main Orca (photo gallery)
0:1 - Voice Hub integration
1:2 - Video/YouTube Shorts
1:3 - OpenRouter integration
1:4 - Shader analysis

Challenges & Solutions

Music Playlist

Challenge: HeartMuLa generates quality audio but doesn't follow prompt labels exactly Solution: Labels are descriptive of intent, actual output is model-driven - quality matters more

Photo Storage

Challenge: 12GB of photos vanished on container restart Solution: Added persistent volume mount for /var/orca/content, cleared orphaned DB records

Voice Hub Integration

Challenge: Voice Hub was standalone app with conflicting modules Solution: Integrated into Orca monolith by:

Moving core modules to lib/orca/voice_hub/
Moving web modules to lib/orca_web/voice_hub/
Renaming VoiceHubOrcaWeb → OrcaWeb.VoiceHub
Removing duplicate application/endpoint/telemetry modules

YuE Music Timeouts

Challenge: YuE 7B model times out after 10 minutes with no output Decision: Stick with HeartMuLa (working, fast, quality results)

Next Steps

Re-run photo collector to populate production storage
Upload YouTube Shorts for algorithm testing
Deploy Voice Hub to Mac for local voice assistant
Monitor OpenRouter quote ranking quality vs cost savings

System Health

Local disk: 54% (13GB free)
CUDA server: 80% (219GB free)
HeartMuLa API: Healthy
All agents: Completed or in final stages

Tools Used: Elixir, Phoenix, LiveView, Docker, FFmpeg, Remotion, HeartMuLa, OpenRouter, Qwen 2.5 72B, itermctl

Outcome: Successful multi-agent coordination with 5 major features shipped in parallel across different domains (music, video, photos, voice, AI integration).