OctoStore for agent orchestration

OctoStore is the runtime coordination layer for agents with shared skills. It gives multi-agent systems one owner per job, visible ownership metadata, and automatic recovery when workers disappear.

Core message: Skills provide capabilities. The marketplace distributes them. OctoStore coordinates execution.

The problem

Once multiple agents can see the same queue and run the same skill, duplicate work becomes the default failure mode.

  • two agents pick up the same task
  • nobody can tell who owns a job right now
  • retries collide with active work
  • a crashed worker leaves work looking busy forever
  • teams overcompensate with heavyweight orchestration

OctoStore solves that with simple shared work ownership: each job has at most one live owner at a time, and any worker can see who that owner is.

The stack story

Skills

Reusable capabilities for what agents can do: triage issues, review PRs, generate docs, run analytics, deploy systems.

Marketplace

Discovery and distribution for those capabilities: install, share, version, and reuse skills across teams.

OctoStore

Runtime coordination for who owns the work right now: claim, heartbeat, inspect ownership, release, recover on failure.

The product sentence

OctoStore gives distributed workers and agent swarms one owner per job, visible ownership metadata, and automatic recovery when a worker disappears.

This is the right abstraction level. Users do not want a lecture on leases and lock semantics. They want to know who owns the job, whether that ownership is still live, and how the system recovers if a worker dies.

The example to use everywhere

Five agents can all run the same skill on the same GitHub issue. Only one should own the job.

  1. Each agent tries to claim the same stable job name.
  2. One claim succeeds.
  3. The others can inspect the owner metadata.
  4. The owner heartbeats while working.
  5. If the owner disappears, the claim expires.
  6. Another worker can take over.
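The six steps above can be simulated in a single process. This is a minimal in-memory sketch, not OctoStore's API: the `ClaimTable` class, its method names, and the job name are hypothetical, and the current time is passed in explicitly so that expiry is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Claim:
    owner: str
    metadata: dict
    expires_at: float


class ClaimTable:
    """Hypothetical in-memory stand-in for a shared claim store."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._claims: Dict[str, Claim] = {}

    def _live(self, job: str, now: float) -> Optional[Claim]:
        # A claim only counts while its expiry is still in the future.
        claim = self._claims.get(job)
        if claim is not None and claim.expires_at > now:
            return claim
        return None

    def claim(self, job: str, owner: str, metadata: dict, now: float) -> bool:
        if self._live(job, now) is not None:
            return False  # someone else holds a live claim
        self._claims[job] = Claim(owner, metadata, now + self.ttl)
        return True

    def heartbeat(self, job: str, owner: str, now: float) -> bool:
        claim = self._live(job, now)
        if claim is None or claim.owner != owner:
            return False
        claim.expires_at = now + self.ttl  # push the expiry forward
        return True

    def owner(self, job: str, now: float) -> Optional[Claim]:
        return self._live(job, now)


JOB = "github/example-org/example-repo/issues/42/triage"  # stable job name
table = ClaimTable(ttl=120)
agents = [f"agent-{i}" for i in range(1, 6)]

# Steps 1-2: all five agents try to claim; only the first attempt succeeds.
results = {a: table.claim(JOB, a, {"skill": "triage"}, now=0) for a in agents}
winners = [a for a, ok in results.items() if ok]

# Step 3: the others can inspect who owns the job.
current = table.owner(JOB, now=1)

# Step 4: the owner heartbeats at t=100, moving expiry to t=220.
extended = table.heartbeat(JOB, "agent-1", now=100)

# Step 5: the owner goes silent; by t=221 the claim has expired.
expired = table.owner(JOB, now=221) is None

# Step 6: another worker takes over.
takeover = table.claim(JOB, "agent-2", {"skill": "triage"}, now=221)
```

The sketch has no real concurrency. In a shared store, the check-then-write inside `claim` would need to be a single atomic operation so that two agents cannot both succeed.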

Recommended mental model

Expose job semantics, not lock semantics.

  • claim job
  • heartbeat claim
  • inspect current owner
  • release job
  • recover automatically on failure

The underlying implementation can still be built on sessions, leases, and fencing tokens.
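For readers who do want to look underneath, fencing tokens are what keep takeover safe when an old owner is slow rather than dead. The sketch below shows the general technique, not OctoStore's implementation: `TokenIssuer` and `FencedResource` are hypothetical names. Each successful claim gets a strictly larger token, and the downstream resource rejects any write that carries a smaller token than one it has already seen.

```python
import itertools
from typing import List


class TokenIssuer:
    """Hands out strictly increasing tokens, one per successful claim."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_token(self) -> int:
        return next(self._counter)


class FencedResource:
    """A downstream resource that refuses writes from stale owners."""

    def __init__(self) -> None:
        self.highest_seen = 0
        self.writes: List[str] = []

    def write(self, token: int, payload: str) -> bool:
        if token < self.highest_seen:
            return False  # a newer owner has already written
        self.highest_seen = token
        self.writes.append(payload)
        return True


issuer = TokenIssuer()
resource = FencedResource()

old_token = issuer.next_token()  # first owner claims the job
new_token = issuer.next_token()  # claim expired; second owner takes over

accepted_new = resource.write(new_token, "comment from new owner")
accepted_old = resource.write(old_token, "late comment from old owner")
```

None of this needs to surface in the job-level interface; it only matters to the component that performs the final write.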

API shape

The existing API already supports this pattern. One practical wrapper layer would look like:

claim_job(job_id, owner, metadata, ttl=120)
heartbeat_job(job_id, lease_id)
release_job(job_id, lease_id)
get_job_owner(job_id)
list_claimed_jobs(prefix=None)

That keeps the implementation grounded in OctoStore’s lock semantics while presenting a clearer job-oriented interface.
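One way those five calls could sit on top of lock primitives is sketched below. Everything underneath the wrapper is an assumption: `InMemoryLocks` and its `acquire`, `renew`, `release`, `get`, and `keys` methods are hypothetical stand-ins rather than OctoStore's client API, and the return values (a lease ID string, `None`, or a boolean) are illustrative choices. The in-memory backend exists only to make the sketch runnable.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Lease:
    lease_id: str
    value: dict
    ttl: float
    expires_at: float


class InMemoryLocks:
    """Hypothetical lock backend, used here in place of a real client."""

    def __init__(self) -> None:
        self._locks: Dict[str, Lease] = {}

    def _live(self, name: str) -> Optional[Lease]:
        lease = self._locks.get(name)
        if lease is not None and lease.expires_at > time.monotonic():
            return lease
        return None

    def acquire(self, name: str, value: dict, ttl: float) -> Optional[str]:
        if self._live(name) is not None:
            return None
        lease = Lease(uuid.uuid4().hex, value, ttl, time.monotonic() + ttl)
        self._locks[name] = lease
        return lease.lease_id

    def renew(self, name: str, lease_id: str) -> bool:
        lease = self._live(name)
        if lease is None or lease.lease_id != lease_id:
            return False
        lease.expires_at = time.monotonic() + lease.ttl
        return True

    def release(self, name: str, lease_id: str) -> bool:
        lease = self._live(name)
        if lease is None or lease.lease_id != lease_id:
            return False
        del self._locks[name]
        return True

    def get(self, name: str) -> Optional[dict]:
        lease = self._live(name)
        return lease.value if lease is not None else None

    def keys(self) -> List[str]:
        return [n for n in self._locks if self._live(n) is not None]


_backend = InMemoryLocks()


def claim_job(job_id, owner, metadata, ttl=120):
    """Return a lease ID if the claim succeeds, else None."""
    return _backend.acquire(job_id, {"owner": owner, "metadata": metadata}, ttl)


def heartbeat_job(job_id, lease_id):
    """Extend the claim; False means the lease is gone or not ours."""
    return _backend.renew(job_id, lease_id)


def release_job(job_id, lease_id):
    return _backend.release(job_id, lease_id)


def get_job_owner(job_id):
    """Return the owner record for a live claim, else None."""
    return _backend.get(job_id)


def list_claimed_jobs(prefix=None):
    names = _backend.keys()
    return sorted(n for n in names if prefix is None or n.startswith(prefix))
```

A caller would hold on to the lease ID returned by `claim_job` and pass it to `heartbeat_job` and `release_job`, so a worker whose claim has expired cannot extend or release a claim that now belongs to someone else.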

What OctoStore is not

  • not a giant workflow engine
  • not a scheduler replacement
  • not a DAG authoring system
  • not a generic “platform for everything”

It is the runtime coordination layer.