2026-01-27 · AI & Agents
Building Coordinated Intelligence: Daily Extraction with Proportional Allocation
The Problem
Knowledge extraction systems face a fundamental tension: extract too aggressively and you overwhelm downstream systems; extract too conservatively and insights arrive too slowly. Manual tuning per extractor is brittle: data sources grow at different rates, and hardcoded limits quickly become obsolete.
The Solution: Proportional Allocation
We implemented a daily extraction worker that solves this elegantly:
- Scan Phase: Query all extractors to estimate available unprocessed data
- Allocation Phase: Distribute a daily budget (300 entities) proportionally across sources
- Execution Phase: Each extractor runs with its calculated limit
# Step 1: See what's available (each scan result is a {name, count, module} tuple)
available = Enum.map(@extractors, &scan_extractor/1)

# Step 2: Calculate each extractor's fair share of the daily budget
total_available =
  available
  |> Enum.map(fn {_name, count, _module} -> count end)
  |> Enum.sum()

allocations =
  Enum.map(available, fn {name, count, module} ->
    allocated = max(1, round(count / total_available * daily_limit))
    {name, allocated, module}
  end)

# Step 3: Extract proportionally
Enum.each(allocations, fn {name, limit, module} ->
  run_single_extractor(module, name, limit)
end)
This ensures:
- Fast-growing sources (like iMessage) get proportionally more entities extracted
- Slow sources still get at least 1 entity per day
- Total load stays predictable for downstream processing
- No manual tuning required
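To make the math concrete, here is a small self-contained sketch of the allocation step. The source names and counts are illustrative, not the real extractor list:

```elixir
daily_limit = 300

# Hypothetical scan results: {name, available_count}
available = [
  {"imessage", 4_000},
  {"email", 950},
  {"notes", 50},
  {"bookmarks", 0}
]

total_available =
  available
  |> Enum.map(fn {_name, count} -> count end)
  |> Enum.sum()

allocations =
  Enum.map(available, fn {name, count} ->
    # Every source gets at least 1 entity, even when its share rounds to 0
    {name, max(1, round(count / total_available * daily_limit))}
  end)

IO.inspect(allocations)
# [{"imessage", 240}, {"email", 57}, {"notes", 3}, {"bookmarks", 1}]
```

Note that the `max(1, ...)` floor plus rounding means the allocated total can land slightly above the nominal budget (301 here); that small overshoot is the price of guaranteeing every source makes daily progress.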
Concurrent Agent Execution
We also established remote gateway infrastructure for concurrent Claude agent execution:
Architecture:
- ClawdBot gateway on cuda.local (dual RTX 3090s)
- LiteLLM proxy routing to local llamacpp/GLM-4.7
- OAuth-style device pairing for secure remote access
- 4 concurrent main agents + 8 concurrent subagents
Why This Matters:
- Free GPU inference (no API costs)
- Multiple autonomous agents working in parallel
- Each agent can spawn subagents for complex tasks
- All coordinated through a single WebSocket gateway
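The concurrency caps above map naturally onto Elixir's bounded parallelism. A minimal sketch of how a gateway could hold main agents to 4 in flight, where `run_agent` and the task list are hypothetical stand-ins for the real dispatch logic:

```elixir
# Hypothetical stand-in for dispatching a task to an agent over the gateway
run_agent = fn task ->
  # ... send the task over the WebSocket gateway and await the result ...
  {:done, task}
end

tasks = [:triage, :extract, :summarize, :index, :review]

# At most 4 agents run concurrently; the fifth task waits for a free slot
results =
  tasks
  |> Task.async_stream(run_agent, max_concurrency: 4, timeout: :infinity)
  |> Enum.map(fn {:ok, result} -> result end)

IO.inspect(results)
# [done: :triage, done: :extract, done: :summarize, done: :index, done: :review]
```

`Task.async_stream/3` preserves input order by default, so results come back in the order tasks were submitted even though they execute in parallel.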
Systems Thinking
The key insight: coordination beats optimization.
Rather than perfectly tuning each extractor, we created a system where extractors coordinate through a shared budget. The allocation algorithm automatically adapts as data sources grow or shrink.
Rather than running sequential agent tasks, we built infrastructure for parallel execution. The gateway handles coordination while agents work autonomously.
Both solutions share the same principle: define boundaries, distribute resources fairly, let components self-organize.
Technical Stack
- Oban: Background job processing with cron scheduling
- ClawdBot: Multi-agent orchestration with remote gateway
- LiteLLM: Model routing and API compatibility layer
- llamacpp: Local GPU inference (dual RTX 3090s)
- Elixir: Concurrent, fault-tolerant coordination layer
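Oban's cron plugin is what drives the daily schedule. A minimal config sketch, where the app and worker module names are assumptions for illustration:

```elixir
# In config/config.exs; MyApp.Workers.DailyExtraction is a hypothetical module name
config :my_app, Oban,
  repo: MyApp.Repo,
  plugins: [
    # Fire the proportional-allocation worker every day at 3:00 AM
    {Oban.Plugins.Cron, crontab: [{"0 3 * * *", MyApp.Workers.DailyExtraction}]}
  ],
  queues: [extraction: 1]
```

A single-concurrency `extraction` queue keeps the daily run serialized, so the budget is spent by exactly one worker at a time.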
Results
- Daily extraction runs at 3am, processing 300 entities proportionally
- Systematic scanner (scripts/scan_and_extract.exs) for visibility into available data
- Remote gateway enables parallel agent execution from any device
- Zero API costs for LLM inference via local GPU
- All extractors get fair resource allocation automatically
What's Next
- Cross-check concurrency settings across cuda → litellm → llamacpp
- Monitor extraction patterns to validate proportional allocation
- Explore emergent behaviors from parallel agent coordination
- Scale daily limit based on downstream processing capacity
Built on v0.1.0-pre-integration - the tag before everything got wild.