2026-01-27 · AI & Agents
Building Coordinated Intelligence: Daily Extraction with Proportional Allocation
The Problem
Knowledge extraction systems face a fundamental tension: extract too aggressively and you overwhelm downstream systems; extract too conservatively and insights arrive too slowly. Manual tuning per extractor is brittle: data sources grow at different rates, and hardcoded limits quickly become obsolete.
The Solution: Proportional Allocation
We implemented a daily extraction worker that solves this elegantly:
- Scan Phase: Query all extractors to estimate available unprocessed data
- Allocation Phase: Distribute a daily budget (300 entities) proportionally across sources
- Execution Phase: Each extractor runs with its calculated limit
# Step 1: See what's available (each scan result is a {name, count, module} tuple)
available = Enum.map(@extractors, &scan_extractor/1)

# Step 2: Calculate each extractor's fair share of the daily budget
total_available =
  available
  |> Enum.map(fn {_name, count, _module} -> count end)
  |> Enum.sum()

allocations =
  Enum.map(available, fn {name, count, module} ->
    allocated = max(1, round(count / total_available * daily_limit))
    {name, allocated, module}
  end)

# Step 3: Extract proportionally
Enum.each(allocations, fn {name, limit, module} ->
  run_single_extractor(module, name, limit)
end)
This ensures:
- Fast-growing sources (like iMessage) get proportionally more entities extracted
- Slow sources still get at least 1 entity per day
- Total load stays predictable for downstream processing
- No manual tuning required
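To make the math concrete, here is a small self-contained sketch of the allocation step. The source names and counts are illustrative, not the real extractor list:

```elixir
daily_limit = 300

# Hypothetical scan results: {name, available_count}
available = [
  {"imessage", 4_000},
  {"email", 950},
  {"notes", 50},
  {"bookmarks", 0}
]

total_available =
  available
  |> Enum.map(fn {_name, count} -> count end)
  |> Enum.sum()

allocations =
  Enum.map(available, fn {name, count} ->
    # Every source gets at least 1 entity, even when its share rounds to 0
    {name, max(1, round(count / total_available * daily_limit))}
  end)

IO.inspect(allocations)
# [{"imessage", 240}, {"email", 57}, {"notes", 3}, {"bookmarks", 1}]
```

Note that the `max(1, ...)` floor plus rounding means the allocated total can land slightly above the nominal budget (301 here); that small overshoot is the price of guaranteeing every source makes daily progress.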
Concurrent Agent Execution
We also established remote gateway infrastructure for concurrent Claude agent execution:
Architecture:
- ClawdBot gateway on cuda.local (dual RTX 3090s)
- LiteLLM proxy routing to local llamacpp/GLM-4.7
- OAuth-style device pairing for secure remote access
- 4 concurrent main agents + 8 concurrent subagents
Why This Matters:
- Free GPU inference (no API costs)
- Multiple autonomous agents working in parallel
- Each agent can spawn subagents for complex tasks
- All coordinated through a single WebSocket gateway
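The concurrency caps above map naturally onto Elixir's bounded parallelism. A minimal sketch of how a gateway could hold main agents to 4 in flight, where `run_agent` and the task list are hypothetical stand-ins for the real dispatch logic:

```elixir
# Hypothetical stand-in for dispatching a task to an agent over the gateway
run_agent = fn task ->
  # ... send the task over the WebSocket gateway and await the result ...
  {:done, task}
end

tasks = [:triage, :extract, :summarize, :index, :review]

# At most 4 agents run concurrently; the fifth task waits for a free slot
results =
  tasks
  |> Task.async_stream(run_agent, max_concurrency: 4, timeout: :infinity)
  |> Enum.map(fn {:ok, result} -> result end)

IO.inspect(results)
# [done: :triage, done: :extract, done: :summarize, done: :index, done: :review]
```

`Task.async_stream/3` preserves input order by default, so results come back in the order tasks were submitted even though they execute in parallel.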
Systems Thinking
The key insight: coordination beats optimization.
Rather than perfectly tuning each extractor, we created a system where extractors coordinate through a shared budget. The allocation algorithm automatically adapts as data sources grow or shrink.
Rather than running sequential agent tasks, we built infrastructure for parallel execution. The gateway handles coordination while agents work autonomously.
Both solutions share the same principle: define boundaries, distribute resources fairly, let components self-organize.
Technical Stack
- Oban: Background job processing with cron scheduling
- ClawdBot: Multi-agent orchestration with remote gateway
- LiteLLM: Model routing and API compatibility layer
- llamacpp: Local GPU inference (dual RTX 3090s)
- Elixir: Concurrent, fault-tolerant coordination layer
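Oban's cron plugin is what drives the daily schedule. A minimal config sketch, where the app and worker module names are assumptions for illustration:

```elixir
# In config/config.exs; MyApp.Workers.DailyExtraction is a hypothetical module name
config :my_app, Oban,
  repo: MyApp.Repo,
  plugins: [
    # Fire the proportional-allocation worker every day at 3:00 AM
    {Oban.Plugins.Cron, crontab: [{"0 3 * * *", MyApp.Workers.DailyExtraction}]}
  ],
  queues: [extraction: 1]
```

A single-concurrency `extraction` queue keeps the daily run serialized, so the budget is spent by exactly one worker at a time.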
Results
- Daily extraction runs at 3am, processing 300 entities proportionally
- Systematic scanner (scripts/scan_and_extract.exs) for visibility into available data
- Remote gateway enables parallel agent execution from any device
- Zero API costs for LLM inference via local GPU
- All extractors get fair resource allocation automatically
What's Next
- Cross-check concurrency settings across cuda → litellm → llamacpp
- Monitor extraction patterns to validate proportional allocation
- Explore emergent behaviors from parallel agent coordination
- Scale daily limit based on downstream processing capacity
Built on v0.1.0-pre-integration - the tag before everything got wild.