2026-01-23 · AI & Agents
Autonomous Multi-Agent Coordination - Jan 22 2026
Autonomous Multi-Agent Coordination: Music, Video, and Voice
Date: January 22, 2026 Project: Orca Monolith Development
Overview
Coordinated 7 autonomous Claude Code agents across different terminal sessions using a custom itermctl tool, achieving significant progress on music generation, video processing, photo galleries, and voice assistant integration.
Achievements
🎵 HeartMuLa Music Generation
- Generated 9 instrumental tracks (30s each) using 3B parameter model on CUDA server
- Genres: Electronic dance, ambient, minimal techno, synthwave, trance, industrial, lo-fi, drum & bass, downtempo
- API:
http://10.1.2.30:8004/generate/prompt(FastAPI with Form data) - All tracks ready in M3U playlist with VLC integration
🎬 YouTube Shorts Creation
- Created 3 vertical shorts (1080x1920) optimized for YouTube algorithm:
- "Which transition is best? 1-6" (10.5s) - engagement hook
- "Watch this transition 👀" (6s) - viral showcase
- "The smoothest edit you'll see today" (7.5s) - bold claim
- Features: ChromaticSmash, GlitchAssemble, WhipPan, VaporTrail effects
- Hooks appear in first 0.3s with red glow for maximum retention
📸 Photo Gallery Implementation
- Built ContentController + LiveView for 2,221 Apple Photos
- Route:
/photos(gallery),/content/:hash(images) - Fixed persistent storage issue: added
/home/j/orca-content:/var/orca/contentvolume mount - Deployed to production (10.1.2.200:4000)
🎨 Shader Analysis
- Analyzed 17 Persian-themed shader videos for color vibrancy
- Top performers:
- Persian Kaleidoscope (94.7/100)
- Sassanid Lotus (93.5/100)
- Golden Spiral (92.0/100)
- Metrics: saturation, diversity, brightness
🗣️ Voice Hub Integration
- Integrated Voice Hub into Orca monolith as separate module
- Wyoming protocol support for Home Assistant-compatible voice satellites
- WebSocket endpoint:
/voice_hub/websocket - Core modules:
Orca.VoiceHub.SpeakerClient- Speaker identification (ECAPA-TDNN)Orca.VoiceHub.WhisperClient- TranscriptionOrca.VoiceHub.WavEncoder- Audio processing
- Speaker service deployed on CUDA server (10.1.2.30:8006)
🤖 OpenRouter Integration
- Switched quote extraction from Anthropic API to OpenRouter
- Using Qwen 2.5 72B (open-weights model) for ranking quotes
- Successfully tested on 4 categories: insight, surprise, counterintuitive, hottake
- Reduced API costs while maintaining quality
Technical Details
Architecture
- Orca: Phoenix + LiveView monolith (Elixir)
- CUDA Server: GPU-accelerated services (10.1.2.30)
- HeartMuLa (3B music model, port 8004)
- Whisper transcription (port 8001)
- Speaker identification (port 8006)
- App Server: Production deployment (10.1.2.200)
- Orca web app (port 4000)
- PostgreSQL (shared-postgres container)
Coordination Tool
Custom itermctl CLI (AppleScript + bash + jq):
itermctl capture-all- Get all session contentsitermctl send [session] [message]- Send instructions to agentsitermctl status-all- Check processing status
Managed sessions:
- 0:0 - Main Orca (photo gallery)
- 0:1 - Voice Hub integration
- 1:2 - Video/YouTube Shorts
- 1:3 - OpenRouter integration
- 1:4 - Shader analysis
Challenges & Solutions
Music Playlist
Challenge: HeartMuLa generates quality audio but doesn't follow prompt labels exactly Solution: Labels are descriptive of intent, actual output is model-driven - quality matters more
Photo Storage
Challenge: 12GB of photos vanished on container restart Solution: Added persistent volume mount for /var/orca/content, cleared orphaned DB records
Voice Hub Integration
Challenge: Voice Hub was standalone app with conflicting modules Solution: Integrated into Orca monolith by:
- Moving core modules to
lib/orca/voice_hub/ - Moving web modules to
lib/orca_web/voice_hub/ - Renaming
VoiceHubOrcaWeb→OrcaWeb.VoiceHub - Removing duplicate application/endpoint/telemetry modules
YuE Music Timeouts
Challenge: YuE 7B model times out after 10 minutes with no output Decision: Stick with HeartMuLa (working, fast, quality results)
Next Steps
- Re-run photo collector to populate production storage
- Upload YouTube Shorts for algorithm testing
- Deploy Voice Hub to Mac for local voice assistant
- Monitor OpenRouter quote ranking quality vs cost savings
System Health
- Local disk: 54% (13GB free)
- CUDA server: 80% (219GB free)
- HeartMuLa API: Healthy
- All agents: Completed or in final stages
Tools Used: Elixir, Phoenix, LiveView, Docker, FFmpeg, Remotion, HeartMuLa, OpenRouter, Qwen 2.5 72B, itermctl
Outcome: Successful multi-agent coordination with 5 major features shipped in parallel across different domains (music, video, photos, voice, AI integration).